## The Challenge: Mean, Median, Mode

From 10 Days of Statistics Day 0: Mean, Median, and Mode:

Output Format: Print lines of output in the following order:

- Print the mean on a new line, to a scale of 1 decimal place.
- Print the median on a new line, to a scale of 1 decimal place.
- Print the mode on a new line; if more than one such value exists, print the numerically smallest one.

Sample Input: `10` on the first line, then `64630 11735 14216 99233 14470 4978 73429 38120 51135 67060` on the second.

Sample Output: `43900.6`, `44627.5`, and `4978`, each on its own line.

The top-voted Python 3 solution came out to be:

"Python 3 - Dont reinvent the wheel ;)": `import numpy as np; from scipy import stats; size = int(input()); numbers = list(map(int, input().split())); print(np.mean(numbers)); print(np.median(numbers)); print(int(stats.mode(numbers)[0]))`

To those who were introduced to Python through data science courses and tools, this may seem like exactly the solution one is looking for. That is *only* the case, though, if the project already includes the SciPy package.

### Wait, Why Could This Be Bad Practice?

The **scipy** and **numpy** packages are third-party libraries, and they would have to be added to a `requirements.txt`, `setup.py`, or `Pipfile` in order to make use of them in a project. This adds complexity by piling onto the software supply chain.[1]

Installing **scipy** (which includes installing **numpy** as a dependency) results in:

- Downloading ~45 MB worth of files (more than 3,000 files)
- Introducing potential for vulnerabilities in a project

Just this year, **numpy** had an Arbitrary Code Execution (ACE) vulnerability raised around how it was unpickling by default with `numpy.load`, which has since changed. The **pickle** module is known for this vulnerability risk, and has a big red warning about it in the Python docs.[2]
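To make the pickle warning concrete, here is a minimal and harmless sketch using only the standard library. The `Payload` class is a hypothetical name invented for this illustration and is not taken from the numpy report. It shows that unpickling can call a function chosen by whoever produced the bytes:

```python
import pickle


class Payload:
    """Hypothetical stand-in for data crafted by someone untrusted."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attack would pick something far less friendly.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The loading side never mentions eval, yet loading the bytes runs it.
result = pickle.loads(blob)
print(result)  # 42
```

This is why loading pickled data from an untrusted source is treated as equivalent to running untrusted code.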

Using these third-party packages is overkill for a project that doesn't already contain them. Otherwise, you sign up to watch long GitHub Issue conversations and *Common Vulnerabilities and Exposures (CVE)* database entries (such as CVE-2019-6446 in this case), trying to decipher how big a problem something is, or whether it is a problem at all.

## Using Standard Libraries

How can we solve this problem with standard libraries that come with Python?

```
# With standard lib imports only
from statistics import mean, median


def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    print(max(sorted(numbers), key=numbers.count))


input()  # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
basicstats(numbers)
```
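As a quick sanity check, the same three expressions can be run against the sample input without typing anything at a prompt. In this sketch, `basic_stats_values` is a hypothetical helper (not part of the original solution) that returns the values instead of printing them. It also surfaces one detail worth knowing: because the input is parsed with `float`, the mode comes back as `4978.0` rather than `4978`. Whether that matters depends on how strictly the output is compared, which is an assumption to verify against the challenge itself.

```python
from statistics import mean, median


def basic_stats_values(numbers):
    """Hypothetical helper: same expressions as basicstats(), but returned."""
    return (
        round(mean(numbers), 1),
        median(numbers),
        max(sorted(numbers), key=numbers.count),
    )


sample = "64630 11735 14216 99233 14470 4978 73429 38120 51135 67060"

as_floats = [float(token) for token in sample.split()]
as_ints = [int(token) for token in sample.split()]

print(basic_stats_values(as_floats))  # (43900.6, 44627.5, 4978.0)
print(basic_stats_values(as_ints))    # (43900.6, 44627.5, 4978)
```

Parsing with `int` is one way to get `4978` printed when the input is known to contain only whole numbers.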

### Detailed Code Breakdown

```
from statistics import mean, median
```

- `statistics` has been included with Python 3 since Python 3.4 (released in 2014).
- We only want `mean` and `median` from this library, so we are explicitly importing each rather than importing the entire library.
- Why aren't we using `mode` from `statistics`? Before Python 3.8, `mode` errors out in cases where: *"...if there is not exactly **one** most common value, `StatisticsError` is raised."*[3] Since Python 3.8, it returns the first mode encountered instead, which is not necessarily the smallest.
  - Either behavior is a problem, due to the last requirement of the challenge for **mode** output: *"...if more than one such [mode] value exists, print the numerically smallest one."*
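For readers on Python 3.8 or newer, the `statistics` module also offers `multimode`, which returns every most-common value. The sketch below is an alternative to the article's approach, not a replacement for it; `smallest_mode` is a hypothetical helper name, and the sample data is made up for illustration:

```python
from statistics import mode, multimode  # multimode needs Python 3.8+

data = [5, 3, 5, 3, 9]  # 5 and 3 are tied for most common


def smallest_mode(numbers):
    """Return the numerically smallest of the most common values."""
    return min(multimode(numbers))


print(multimode(data))      # [5, 3]
print(smallest_mode(data))  # 3
print(mode(data))           # 5 on Python 3.8+: first mode seen, not the smallest
```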

```
input() # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
```

- We do nothing with the first `input()`, which is meant to be a count of the numbers supplied on the second line. It is dropped because it is not needed in order to produce the mean, median, and mode output.
- For `numbers`, let's start from the innermost parentheses and move outward:
  - `input().split()` breaks apart the single-string input into a list of strings, as `split()` defaults to whitespace as the *sep* delimiter: *"If **sep** is not specified or is `None`, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a `None` separator returns `[]`."*[4]
  - `map(float, input().split())`: Here, `map()` is being used to convert the resulting list of strings into **float** values.
  - `list(map(...))`: The reason we need to convert the *map* back into a *list* is that `map()` returns an *iterator*. This means we can only iterate over its elements *once*. If all we wanted was the median, for example, we wouldn't need to convert the *map* to a *list*, because we may not care about the values anymore after the median is returned.
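The two behaviors described above, whitespace splitting and the one-shot nature of `map()`, can be seen in a small sketch. The input string here is made up for illustration:

```python
line = "  64630   11735\t14216  "  # messy spacing on purpose

# split() with no arguments collapses runs of whitespace and
# ignores leading/trailing whitespace.
tokens = line.split()
print(tokens)  # ['64630', '11735', '14216']

# map() returns a lazy iterator, not a list.
lazy = map(float, tokens)

first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left to consume

print(first_pass)   # [64630.0, 11735.0, 14216.0]
print(second_pass)  # []
```

Because `basicstats()` walks over the numbers several times (mean, median, sort, and count), it needs the list rather than the bare iterator.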

NOTE: Instead of `list(map(...))`, we could use a list comprehension[5] like so: `numbers = [float(number) for number in input().split()]`

This is argued as a better approach on StackOverflow,[6] and if you are up for an interesting side note of history, you can read about how `map()` was nearly removed from Python 3 at one point.[7]

After we have our list of floats, `basicstats(numbers)` is called, running the following:

```
def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    print(max(sorted(numbers), key=numbers.count))
```

- `print(round(mean(numbers), 1))`: Again, let's start from the innermost parentheses and move outward to see what we are printing:
  - `mean(numbers)`: Simply returns the **mean** without a third-party package!
  - `round(mean(numbers), 1)` rounds the resulting float to one digit after the decimal point (per requirements).
- `print(median(numbers))`: Simply returns the **median** without a third-party package!
- `print(max(sorted(numbers), key=numbers.count))`: How is this providing the **mode**?
  - `sorted(numbers)`: First, we need the list sorted, as we are only meant to return the lowest-value mode if there is more than one. This is needed for `max(...)` to properly return the lowest value we want.
  - `max(sorted(numbers), key=numbers.count)`: Providing `key=numbers.count` as an arg ensures we get the value with the highest count within the list. `max()` only returns a single value, so it will return the first value it meets with that count, which is the lowest in the event of a draw (due to us using `sorted(numbers)`).
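The tie-breaking role of `sorted()` is easier to see side by side. This is a small sketch with made-up data in which two values are tied for most common:

```python
data = [5, 3, 5, 3, 9]  # 5 and 3 both appear twice

# Unsorted: max() keeps the first value that reaches the top count.
first_seen = max(data, key=data.count)

# Sorted: the smallest tied value is now the first one max() meets.
smallest = max(sorted(data), key=data.count)

print(first_seen)  # 5
print(smallest)    # 3
```

One design note: `data.count` scans the whole list for every element, so this approach does quadratic work. That is fine for challenge-sized input, but worth keeping in mind for long lists.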

### Optional Approach to Retrieving Mode: Using `Counter()`

Instead of `max()`, we could alternately use `Counter()`[8] from `collections`, which is argued to be a better approach to this problem.[9] **Counter()** was added to the **collections** module way back with Python 2.7.0 (released in 2010):

```
# With standard lib imports only
from statistics import mean, median
from collections import Counter


def basicstats(numbers):
    print(round(mean(numbers), 1))
    print(median(numbers))
    # Optional approach to 'mode'
    print(Counter(sorted(numbers)).most_common(1)[0][0])


input()  # Don't need array length, so ignore input
numbers = list(map(float, input().split()))
basicstats(numbers)
```

`Counter(sorted(numbers)).most_common(1)[0][0]`, working from the inside out:

- `sorted(numbers)` is needed for the later call of `most_common()` to return the *lowest* mode.
- `Counter(...)`: Creates a dictionary with count values of all elements in the list.
- `Counter(...).most_common(1)`: Returns a *list* of *tuples*. Using `1` as an arg means it returns only one *tuple*, being the *first* value that appears the most often.
- `Counter(...).most_common(1)[0][0]`: The first `[0]` selects the *tuple* at index `0` of the *list*, and the second `[0]` selects the value at index `0` of that *tuple*.
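Unpacking the chained expression one step at a time shows what each piece returns. The data below is made up for illustration, with two values tied for most common:

```python
from collections import Counter

data = [5, 3, 5, 3, 9]  # 5 and 3 both appear twice

counts = Counter(sorted(data))  # counts are stored in sorted insertion order
top = counts.most_common(1)     # list holding one (value, count) tuple
pair = top[0]                   # the tuple itself
value = pair[0]                 # the value inside the tuple

print(counts)  # Counter({3: 2, 5: 2, 9: 1})
print(top)     # [(3, 2)]
print(value)   # 3
```

The tie-break relies on `most_common()` ordering equal counts by first appearance, which the Python documentation describes for Python 3.7 and later; on older versions that ordering should be treated as an assumption.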

## Conclusion

There are many ways to come to a solution, and depending on the situation, some are better than others. If packages like **scipy** and/or **numpy** are already included within a project, it certainly makes sense to use them.

Still, it is a great idea to check whether built-in or standard libraries can solve a problem before reaching for third-party solutions. This helps you:

- Learn what Python is capable of out-of-the-box
- Make your code more portable for use in other projects without installing additional resources
- Reduce the security complexity of the software supply chain[1] by avoiding unnecessary inclusion of third-party packages

## Footnotes

[1] Software Supply Chain: Fewer, Better Suppliers. Written by Shannon Lietz @ DevSecOps, 2016.

[5] Comprehending Python's Comprehensions. Written by Dan Bader @ dbader.org.

[7] The Fate of `reduce()` in Python 3. Written by Guido van Rossum, 2005. *NOTE: He's the creator, and previous BDFL, of Python. The article includes thoughts on **map()**, **filter()**, and **lambda**.*

[9] StackOverflow: Python - Find The Item with Maximum Occurrences in A List.