HackerRank HackerRant - Mean, Median, and Mode in Python
HackerRank is an excellent website for writing code against challenge prompts, preparing for coding interviews, searching for jobs, and seeing how the community has approached solutions over time. The author wanted to dive into the Python-focused solutions and is in no way affiliated with HackerRank itself.
The Challenge: Mean, Median, Mode
Print three lines of output in the following order:
- Print the mean on a new line, to a scale of 1 decimal place.
- Print the median on a new line, to a scale of 1 decimal place.
- Print the mode on a new line; if more than one such value exists, print the numerically smallest one.
Sample Input
    10
    64630 11735 14216 99233 14470 4978 73429 38120 51135 67060
Sample Output
    43900.6
    44627.5
    4978
The top-voted Python 3 solution came out to be:
Python 3 - Dont reinvent the wheel ;)
    import numpy as np
    from scipy import stats

    size = int(input())
    numbers = list(map(int, input().split()))
    print(np.mean(numbers))
    print(np.median(numbers))
    print(int(stats.mode(numbers)[0]))
To those who were introduced to Python through data science courses and tools, this may look like the obvious solution. It is only a good fit, though, if the project already includes the SciPy package.
Wait, Why Could This Be Bad Practice?
The scipy and numpy packages are third-party libraries, and they would have to be added to a Pipfile before a project could use them. This adds complexity by piling onto the software supply chain. [fn1]
Installing scipy (which includes installing numpy as a dependency) results in:
- Downloading ~45 MB worth of files (more than 3,000 files)
- Introducing potential for vulnerabilities in a project
Just this year, numpy had an Arbitrary Code Execution (ACE) vulnerability raised over numpy.load unpickling data by default, a default that has since been changed. The pickle module is known for this risk, and the Python docs carry a big red warning about it. [fn2]
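To make the pickle risk concrete, here is a minimal and deliberately harmless sketch using only the standard library. The class name Sneaky is made up for illustration, and the callable it names is the harmless built-in len; a hostile payload could name something far more dangerous.

```python
import pickle


class Sneaky:
    """Hypothetical class that tells pickle how it should be rebuilt."""

    def __reduce__(self):
        # On load, pickle will call len("surprise") instead of
        # reconstructing a Sneaky instance.
        return (len, ("surprise",))


payload = pickle.dumps(Sneaky())

# Loading the bytes runs the callable chosen by whoever built the payload.
result = pickle.loads(payload)
print(result)  # prints 8, the return value of len("surprise")
```

The point is that loading pickled data can call functions, which is why unpickling untrusted input by default was treated as a vulnerability.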
Using these third-party packages is overkill for a project that doesn't already contain them, unless you'd really like to be on the lookout for long GitHub issue conversations and Common Vulnerabilities and Exposures (CVE) database entries (such as CVE-2019-6446 in this case), trying to decipher how big a problem something is, or whether it is a problem at all.
Using Standard Libraries
How can we solve this problem with standard libraries that come with Python?
    # With standard lib imports only
    from statistics import mean, median

    def basicstats(numbers):
        print(round(mean(numbers), 1))
        print(median(numbers))
        print(max(sorted(numbers), key=numbers.count))

    input()  # Don't need array length, so ignore input
    numbers = list(map(float, input().split()))
    basicstats(numbers)
Detailed Code Breakdown
    from statistics import mean, median

- statistics has been included with Python 3 since Python 3.4 (released in 2014).
- We only want mean and median from this library, so we are explicitly importing each rather than importing the entire library.
- Why aren't we using statistics.mode? This is because, on Python versions before 3.8, mode will error out in cases where: "...if there is not exactly one most common value, StatisticsError is raised." From Python 3.8 onward, mode returns the first mode encountered instead, which is not necessarily the smallest.
- This is a problem, due to the last requirement of the challenge for mode output: "...if more than one such [mode] value exists, print the numerically smallest one."
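As a hedged aside, newer Python versions offer another standard-library route. The sketch below assumes Python 3.8 or later, where statistics.multimode is available, and uses a made-up list with a tie to show why mode alone does not meet the requirement.

```python
from statistics import mode, multimode  # multimode requires Python 3.8+

numbers = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 are tied for most common

# On Python 3.8+, mode() returns the first mode encountered, not the smallest.
first_seen = mode(numbers)

# multimode() returns every tied value, so min() picks the smallest one.
smallest_mode = min(multimode(numbers))

print(first_seen, smallest_mode)
```

On older interpreters the import of multimode fails, so the max() approach walked through in this article remains the more portable option.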
    input()  # Don't need array length, so ignore input
    numbers = list(map(float, input().split()))
- We do nothing with the first input(), which is meant to be a count of the numbers supplied on the second line. It is dropped because it is not needed to produce the mean, median, and mode output.
For numbers, let's start from the innermost parentheses and move outward:
input().split(): Breaks apart the single-string input into a list of strings, as split() defaults to whitespace as the sep delimiter: "If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns []."
map(float, input().split()): Here, map() is being used to convert the resulting list of strings into float values.
list(map(...)): We need to convert the map object into a list because map() returns an iterator, which means we can only iterate over its elements once. We use the values several times (for the mean, the median, and the mode), so we store them in a list. If all we wanted was the median, for example, we could skip the conversion, because we would not need the values again after the median is returned.
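The inside-out steps above can be sketched on their own. The input line here is made up (with deliberately messy whitespace) so the snippet runs without reading from standard input.

```python
raw = "  64630 11735\t14216 \n"  # made-up input line with messy whitespace

# split() with no arguments collapses runs of whitespace and trims the ends.
tokens = raw.split()

# map() is lazy: it hands back an iterator, not a list.
lazy = map(float, tokens)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing is left the second time around

print(tokens)
print(first_pass)
print(second_pass)
```

The empty second pass is the reason the solution stores the values in a list before computing three different statistics from them.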
NOTE: Instead of list(map(...)), we could use a list comprehension [fn5] like so:
    numbers = [float(number) for number in input().split()]
After we have our list of floats, basicstats(numbers) is called, running the following:
    def basicstats(numbers):
        print(round(mean(numbers), 1))
        print(median(numbers))
        print(max(sorted(numbers), key=numbers.count))
For print(round(mean(numbers), 1)), let's start from the innermost parentheses and move outward to see what we are printing:
mean(numbers): Simply returns the mean without a third-party package!
round(mean(numbers), 1): Rounds the resulting float to one digit after the decimal point (per requirements).
print(median(numbers)): Simply prints the median without a third-party package!
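As a quick check, the mean and median steps can be run against the sample input from the challenge. The variable names sample_mean and sample_median are only for illustration.

```python
from statistics import mean, median

# Sample input from the challenge above
numbers = [64630.0, 11735.0, 14216.0, 99233.0, 14470.0,
           4978.0, 73429.0, 38120.0, 51135.0, 67060.0]

sample_mean = round(mean(numbers), 1)
sample_median = median(numbers)  # even count: average of the two middle values

print(sample_mean)    # 43900.6
print(sample_median)  # 44627.5
```

Both values match the first two lines of the expected sample output.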
print(max(sorted(numbers), key=numbers.count)): How is this providing the mode?
sorted(numbers): First, we need the list sorted, as we are meant to return only the lowest-value mode if there is more than one. This is needed for max(...) to properly return the lowest value we want.
max(sorted(numbers), key=numbers.count): Providing key=numbers.count as an arg ensures we get the value with the highest count within the list.
max() only returns a single value, so in the event of a draw it returns the first of the tied values it encounters, which is the lowest (due to us using sorted()).
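A small sketch of that tie-breaking behavior, using a made-up list in which two values are tied for most common:

```python
numbers = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 both appear twice

# max() keeps the first item it sees with the highest key.
unsorted_pick = max(numbers, key=numbers.count)
sorted_pick = max(sorted(numbers), key=numbers.count)

print(unsorted_pick)  # first tied value in input order
print(sorted_pick)    # smallest tied value
```

One trade-off worth knowing: numbers.count scans the whole list for every element, so this approach does quadratic work. That is fine for challenge-sized input, but it may be slow on very large lists.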
Optional Approach to Retrieving Mode: Using Counter
Instead of max(), we could alternately use Counter() from collections, which is argued to be a better approach to this problem. [fn9] Counter() was added to the collections module way back with Python 2.7.0 (released in 2010):
    # With standard lib imports only
    from statistics import mean, median
    from collections import Counter

    def basicstats(numbers):
        print(round(mean(numbers), 1))
        print(median(numbers))
        # Optional approach to 'mode'
        print(Counter(sorted(numbers)).most_common(1)[0][0])

    input()  # Don't need array length, so ignore input
    numbers = list(map(float, input().split()))
    basicstats(numbers)
For Counter(sorted(numbers)).most_common(1)[0][0], working from the inside out:
sorted(numbers): Needed for the later call of most_common() to return the lowest mode.
Counter(...): Creates a dictionary-like object holding the count of each element in the list.
Counter(...).most_common(1): Returns a list of tuples. Using 1 as an arg means it returns only one (value, count) tuple, for the first value that appears the most often.
Counter(...).most_common(1)[0][0]: The first [0] selects the tuple in the 0 index position of the list, and the second [0] selects the 0 index value of that tuple, which is the mode itself.
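Here is the same chain unpacked one step at a time on a made-up list with a tie. This sketch assumes Python 3.7 or later, where Counter keeps insertion order and most_common() keeps tied elements in the order they were first encountered. The intermediate names are only for illustration.

```python
from collections import Counter

numbers = [3.0, 1.0, 3.0, 1.0, 2.0]  # 1.0 and 3.0 both appear twice

counts = Counter(sorted(numbers))  # counts keyed in ascending order
top = counts.most_common(1)        # list holding one (value, count) tuple
pair = top[0]                      # the tuple itself
smallest_mode = pair[0]            # the value inside the tuple

print(counts)
print(top)
print(smallest_mode)
```

Unlike key=numbers.count, Counter tallies every element in a single pass, which is likely part of why it is argued to be the better approach.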
There are many ways to come to a solution, and depending on the situation, some are better than others. If packages like scipy and/or numpy are already included within a project, it certainly makes sense to use them.
Even so, it is a great idea to check whether built-in or standard libraries can solve a problem before looking into third-party solutions. This helps you:
- Learn what Python is capable of out-of-the-box
- Make your code more portable for use in other projects without installing additional resources
- Reduce the security complexity of the software supply chain [fn1] by avoiding unnecessary inclusion of third-party packages
Was this helpful? Have thoughts to add? Please add to the conversation on dev.to!