Ready to write Python code 4x faster?
Merging dataframes allows you to combine multiple data sources into one. In the data world, enriching data often includes merging two datasets together. These datasets can be similar or different. Pandas is a great way to do this in a few lines of code. Below you’ll find easy “copy and paste -able” code to merge dataframes in seconds.
The code you probably want
Merge two datasets with an lookup join (similar to a VLOOKUP in Excel):
import pandas as pd
fav_color_df = pd.DataFrame({
"name": ["Jeff", "Bill", "Peter"],
"favorite_color": ["blue", "red", "yellow"]
})
fav_food_df = pd.DataFrame({
"name": ["Peter", "Jeff", "Bill", "Maggie"],
"favorite_food": ["apple", "banana", "orange", "tangerine"]
})
# First, drop duplicates so that we get the first entry for the lookup dataframe
fav_food_df.drop_duplicates(subset=["name"], inplace=True)
# Then, do the lookup merge
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='left')
Other Common Use Cases
Merge with an inner join.
An inner join will only keep the entries where the key exists in both datasets.
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='inner')
Merge with a left join
A left join will only keep the entries where the key exists in the first dataframe.
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='left')
Merge with a right join
A left join will only keep the entries where the key exists in the second dataframe.
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='right')
Merge with an outer join
Includes all rows from both dataframes.
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='outer')
Merge with an inner join
Includes rows that are present in both dataframes.
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='inner')
What is this blog post
At Mito, we’ve spent years working with users to automate their data work in Python. As such, we’ve got a great sense of what the most common use cases are for common pandas functions.
This blog post is part of a series of tutorials - designed to be the best pandas documentation for you most of the time. There’s no BS marketing language at the top of this blog post, and we rank (easy to copy!) code snippets depending on how likely you are to be encountering them.
If you’re looking to save time writing Python code, we recommend checking out Mito - we built it with the same set of learnings that helped us build the best pandas tutorials ever!
With Mito, you can add new columns and write formulas just like you would in Excel or Google sheets - and we write the Python code for you. See our documentation here.
More Like This
Automating Spreadsheets with Python 101
How to tell the difference between a good and bad Python automation target.
10 Mistakes To Look Out For When Transitioning from Excel To Python
10 Common Mistakes for new programmers transitioning from Excel to Python
Research shows Mito speeds up by 400%
We're always on the hunt for tools that improve our efficiency at work. Tools that let us accomplish more with less time, money, and resources.
3 Rules for Choosing Between SQL and Python
Analysts at the world's top banks are automating their manual Excel work so they can spend less time creating baseline reports, and more time building new analyses that push the company forward.