Quantcast
Channel: Planet Python
Viewing all articles
Browse latest Browse all 22871

Python for Beginners: Merge DataFrames in Python

$
0
0

Python provides us with the pandas dataframes to handle tabular data. In this article, we will discuss how we can merge two dataframes in python.

How to Merge Two DataFrames in Python?

Suppose that we have a dataframe that contains the names of some students, their roll number, and the class they have opted to study as shown below.

     Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl

Also, we have a dataframe that contains the roll number of the students and the marks obtained by them as shown below.

    Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A

Now, we have to merge the given dataframes such that the resultant dataframe contains the name, roll number, grade, class, and the corresponding marks for each student as shown below.

    Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
6    20     72     B      1      Tina
7    24     92     A      1       Amy

To obtain the output, we will use the merge() method defined in the pandas module. The syntax of the merge() method is as follows.

pd.merge(df1,df2, on) 

Here, 

  • df1  denotes the first dataframe.
  • The parameter df2 denotes the second dataframe that is to be merged.
  • The parameter ‘on’ is the column name of the dataframes that is used to compare the columns of the given dataframes. Rows from two dataframes are merged together if they have the same value in the column corresponding to the ‘on’ parameter.

The merge() method also takes several other parameters. You can have a look at them in this documentation.

To merge the given dataframes, we will invoke the merge() method first dataframe as the first input argument. Subsequently, we will pass the dataframe containing the marks as the second input argument to the merge() method. To the‘on’ parameter, we will pass the ‘Roll’ column name. In this way, the rows corresponding to the same roll numbers from the given dataframes will be merged and the resultant dataframe will be produced as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
6    20     72     B      1      Tina
7    24     92     A      1       Amy

If the second dataframe has rows that do not correspond to any row in the first dataframe, those rows are omitted from the result. Similarly, if the first dataframe has rows that don’t correspond to the second dataframe, those rows will be omitted from the result. You can observe this in the above example.

Merge DataFrames Using Outer Join in Python

To include the omitted rows, we can use the parameter ‘how’ in the merge() method. The ‘how’ parameter has the default value ‘inner’. Due to this, only those rows are included in the resultant dataframe that are present in both the input dataframes. To included the omitted dataframes, you can pass the value ‘outer’ to the ‘how’ parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A    1.0    Aditya
1     12   95.0     A    1.0     Chris
2     13   75.0     B    1.0       Sam
3     14   75.0     B    1.0      Joel
4     16   78.0     B    1.0  Samantha
5     15   55.0     C    1.0       Tom
6     19   75.0     B    NaN       NaN
7     20   72.0     B    1.0      Tina
8     24   92.0     A    1.0       Amy
9     25   95.0     A    NaN       NaN
10    17    NaN   NaN    1.0     Pablo
11    30    NaN   NaN    1.0    Justin
12    31    NaN   NaN    1.0      Karl

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.

Merge DataFrames Using Left and Right Join in Python

If you want to include omitted rows only from the first dataframe, you can pass the value ‘left’ to the parameter ‘how’.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="left",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A    1.0    Aditya
1    12     95     A    1.0     Chris
2    13     75     B    1.0       Sam
3    14     75     B    1.0      Joel
4    16     78     B    1.0  Samantha
5    15     55     C    1.0       Tom
6    19     75     B    NaN       NaN
7    20     72     B    1.0      Tina
8    24     92     A    1.0       Amy
9    25     95     A    NaN       NaN

Similarly, if you want to include omitted rows only from the second dataframe, you can pass the value ‘right’ to the ‘how’ parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="right",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A      1    Aditya
1     12   95.0     A      1     Chris
2     13   75.0     B      1       Sam
3     14   75.0     B      1      Joel
4     15   55.0     C      1       Tom
5     16   78.0     B      1  Samantha
6     17    NaN   NaN      1     Pablo
7     20   72.0     B      1      Tina
8     24   92.0     A      1       Amy
9     30    NaN   NaN      1    Justin
10    31    NaN   NaN      1      Karl

Conclusion

In this article, we have discussed how to merge two dataframes in python using the merge() method. To learn more about python programming, you can read this article on dictionary comprehension in python. You might also like this article on list comprehension in python.

The post Merge DataFrames in Python appeared first on PythonForBeginners.com.


Viewing all articles
Browse latest Browse all 22871

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>