How to Use Excel's MEDIAN Function in Pandas
Excel's MEDIAN function calculates the median of a range of values. This guide explains how to replicate Excel's MEDIAN functionality in Python using pandas.
Implementing the Median function in Pandas#
To mimic the MEDIAN function from Excel in pandas, there are several approaches depending on your specific needs. Here are some common implementations:
Median value in each row#
In Excel, to find the median of the values in cells A1, B1, and C1, you'd use =MEDIAN(A1, B1, C1).
In pandas, finding the median of values across columns for a given row can be done similarly. Below is how you'd compute the median of 'Col1', 'Col2', and 'Col3' for each row:
# Calculate the median of Col1, Col2, Col3
df['Median'] = df[['Col1', 'Col2', 'Col3']].median(axis=1)
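To make the row-wise calculation concrete, here is a minimal runnable sketch. The sample values are hypothetical and were chosen only for illustration; the column names match the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the snippet above
df = pd.DataFrame({
    "Col1": [1, 4, 7],
    "Col2": [2, 5, None],  # None becomes NaN, like a blank cell
    "Col3": [9, 6, 8],
})

# axis=1 computes one median per row, across the selected columns
df["Median"] = df[["Col1", "Col2", "Col3"]].median(axis=1)

print(df["Median"].tolist())  # [2.0, 5.0, 7.5]
```

The last row has only two values once the missing one is skipped, so its median is the average of 7 and 8.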
Entire column median#
In Excel, to find the median of an entire column, you'd use =MEDIAN(A:A).
In pandas, you can use the median method on the desired column to get a similar result:
# Calculate the median of the entire column
col_median = df['Column1'].median()
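If you need the median of several columns at once, calling `median` on the DataFrame returns one value per column. The sketch below assumes a hypothetical table with two numeric columns and one text column; `numeric_only=True` tells pandas to leave the text column out.

```python
import pandas as pd

# Hypothetical sample data with one text column
df = pd.DataFrame({
    "Column1": [10, 20, 30, 40],
    "Column2": [1.5, 2.5, 3.5, 100.0],
    "Label": ["a", "b", "c", "d"],
})

# One median per numeric column; the text column is skipped
col_medians = df.median(numeric_only=True)

print(col_medians["Column1"])  # 25.0
print(col_medians["Column2"])  # 3.0
```

Note that the outlier 100.0 in Column2 does not pull the median upward, which is often the reason to prefer the median over the mean.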
Entire dataframe median#
In Excel, to find the median of an entire table, you'd use =MEDIAN(A1:D10).
In pandas, calling the median method directly on a dataframe returns one median per column. To get a single median for the whole table, flatten the values first and then take the median:
# Flatten the DataFrame and get the median value
df_values = df.values.flatten().tolist()
median_value = pd.Series(df_values).median()
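Excel's MEDIAN ignores text and empty cells inside a range. One possible way to get the same behavior for a mixed table is to keep only the numeric columns and use NumPy's `nanmedian`. This is a sketch on hypothetical data, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample table with a blank cell and a text column
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, None, 6],
    "Name": ["x", "y", "z"],
})

# Keep only numeric columns, convert to a float array, and ignore NaN
numeric = df.select_dtypes(include="number")
median_value = float(np.nanmedian(numeric.to_numpy(dtype=float)))

print(median_value)  # 3.0
```

The five remaining values are 1, 2, 3, 4, and 6, so the median is 3.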
Common mistakes when using MEDIAN in Python#
These are common mistakes (and solutions) that you might run into while calculating medians in Python.
Finding the Median of Strings#
Attempting to calculate the median of a column that contains string values typically raises a TypeError, for example: `TypeError: could not convert string to float`. The exact message varies between pandas versions. You must convert these strings to a numeric type before computing the median.
For example, if your values are formatted like $1.99, $2.99, etc., convert them to floats before calculating the median:
# Convert $ values to float; regex=False treats '$' as a literal character, not a regex anchor
df['Column'] = df['Column'].str.replace('$', '', regex=False).astype(float)
# Calculate the median
median = df['Column'].median()
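Real-world columns often contain entries that `astype(float)` cannot parse, such as thousands separators or placeholders like "N/A". A more forgiving sketch, assuming hypothetical sample values, uses `pd.to_numeric` with `errors="coerce"` so that unparseable entries become `NaN` and are skipped by `median`.

```python
import pandas as pd

# Hypothetical price column: currency symbols, a thousands separator, one bad entry
df = pd.DataFrame({"Column": ["$1.99", "$2.99", "$1,000.00", "N/A"]})

# Strip the literal '$' and ',' characters (regex=False avoids regex interpretation)
cleaned = (
    df["Column"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)

# errors="coerce" turns entries that still cannot be parsed into NaN
df["Column"] = pd.to_numeric(cleaned, errors="coerce")

# NaN is skipped by default, so the median is taken over the three parsed prices
median = df["Column"].median()
print(median)
```

Here the three parsed values are 1.99, 2.99, and 1000.00, so the median is the middle value, 2.99. Be aware that coercing silently drops bad entries; checking `df["Column"].isna().sum()` afterwards shows how many were lost.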
Ignoring Missing Values#
Mishandling `NaN` values can lead to inaccurate medians. By default, pandas skips `NaN` values, similar to how Excel's MEDIAN function disregards blank cells. You can instead treat `NaN` values as 0 before calculating the median, or return `NaN` whenever any value is missing.
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'Column': [1, 2, 3, None]})
# Calculate median with NaN values
df['Column'].median(skipna=False) # Result: NaN
# Calculate median without NaN values
df['Column'].median(skipna=True) # Result: 2.0
# Calculate median with NaN values treated as 0
df['Column'].fillna(0).median() # Result: 1.5
Understanding the Median Formula in Excel#
The MEDIAN function in Excel takes one or more arguments (numbers, cell references, or ranges) and returns the median of all the numeric values they contain.
=MEDIAN(number1, [number2], ...)
MEDIAN Excel Syntax
Parameter | Description | Data Type |
---|---|---|
number1 | The first number you want to include in the median. | number |
number2 | (Optional) Additional numbers you want to include in the median. | number |
... | (Optional) Add up to 255 numbers you want to include in the median. | number |
Examples
Formula | Description | Result |
---|---|---|
=MEDIAN(1, 2, 3) | Calculate the median of values 1, 2, and 3. | 2 |
=MEDIAN(A1:A10) | Calculate the median of values from A1 to A10. | Median of A1 to A10 |
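When the number of values is even, both Excel's MEDIAN and pandas return the average of the two middle values. A short sketch with hypothetical inputs shows the odd and even cases side by side:

```python
import pandas as pd

# Odd count: the middle value is returned, like =MEDIAN(1, 2, 3)
odd_median = pd.Series([1, 2, 3]).median()

# Even count: the two middle values (2 and 3) are averaged, like =MEDIAN(1, 2, 3, 4)
even_median = pd.Series([1, 2, 3, 4]).median()

# Input order does not matter; values are sorted internally
unsorted_median = pd.Series([4, 1, 3, 2]).median()

print(odd_median)       # 2.0
print(even_median)      # 2.5
print(unsorted_median)  # 2.5
```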