Transformations
Sort
How to Use Excel's Sort in Pandas
Excel's sorting functionality is pivotal for organizing data in a DataFrame. From arranging sales data by dates to sorting customer information alphabetically, SORT is fundamental in data analysis and reporting.
This page explains how to use Excel's sorting features in Python using pandas.
Implementing Sort in Pandas#
To implement Excel's sorting functionality in Python, you'll need to use the sort_values function in pandas. Here are some common use cases.
Using pandas sort_values#
The sort_values method in pandas is the equivalent of Excel's basic column sorting. It allows you to sort a DataFrame based on one or more columns.
In Excel, you might click a column header and choose 'Sort A to Z'. In pandas, you can use the following code.
Notice in the code below that we use the `inplace` parameter to modify the original DataFrame. We also use the `ignore_index` parameter to reset the index after sorting.
# Sort in ascending order
df.sort_values(by='Column1', ascending=True, inplace=True, ignore_index=True)
Sort Multiple Columns#
Sorting by multiple columns is similar to sorting by a single column. The only difference is that you need to pass a list of columns to the by parameter.
# Sort Column1 in ascending order, then Column2 in descending order
df.sort_values(by=['Column1', 'Column2'], ascending=[True, False], inplace=True, ignore_index=True)
Sort by Date#
Similar to Excel's date sorting, pandas requires the date column in a datetime format for accurate sorting.
# Convert Column1 to datetime format, then sort
df['Column1'] = pd.to_datetime(df['Column1'])
df.sort_values(by='Column2', ascending=True, inplace=True, ignore_index=True)
Sort by Specific Categorical Order#
Sorting by a specific categorical order is a bit more complex. In pandas, this involves setting a categorical data type with an order, and then sorting.
# Set the order of the Day
df['Day'] = pd.Categorical(df['Day'], categories=['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun'], ordered=True)
# Sort by Day of week
df.sort_values(by='Day', ascending=True, inplace=True, ignore_index=True)
Sort NaN Values#
Just like Excel, by default, pandas sorts NaN values last. If you want to sort NaN values first, you can use the na_position parameter.
# Sort in ascending order with NaN values at the beginning
df.sort_values(by='Column1', ascending=True, na_position='first', inplace=True, ignore_index=True)
Sort Series#
Sorting a Series (a single column in a DataFrame) is similar to sorting a full DataFrame. The method is the same but applied to a Series object.
series.sort_values(ascending=True, inplace=True, ignore_index=True)
Common mistakes when using Sort in Python#
While implementing the SORT function in pandas, these are common mistakes that can lead to incorrect results or errors. Here's how to avoid them.
Ignoring Data Types#
Sorting mixed data types can lead to unexpected behavior. Ensure columns have consistent data types before sorting.
In Excel, sorting handles different data types implicitly, but in pandas, it's crucial to be explicit.
# Convert Column1 to string, then sort
df['Column1'] = df['Column1'].astype(str)
# Sort by Column1
df.sort_values(by='Column1', ascending=True, inplace=True, ignore_index=True)
Forgetting to sort inplace#
By default, pandas' sort_values returns a new DataFrame. If you want to modify the original DataFrame, you need to set the inplace parameter to True or assign the result to a variable.
# Incorrect: Sort without inplace
df.sort_values(by='Column1', ascending=True, ignore_index=True)
# Correct: Sort with inplace
df.sort_values(by='Column1', ascending=True, inplace=True, ignore_index=True)
# Correct: Sort with inplace and assign to variable
df = df.sort_values(by='Column1', ascending=True, ignore_index=True)
NaN Values#
NaN values can affect sorting order. In pandas, you have the option to place NaN values at the beginning or end of the sorted series.
Excel's sorting treats blank cells differently than pandas' NaN values.
# Sort in ascending order with NaN values at the end
df.sort_values(by='Column1', na_position='last')
Don't re-invent the wheel. Use Excel formulas in Python.
Install MitoDon't want to re-implement Excel's functionality in Python?
Edit a spreadsheet.
Generate Python.
Mito is the easiest way to write Excel formulas in Python. Every edit you make in the Mito spreadsheet is automatically converted to Python code.
View all 100+ transformations →