Quantcast
Channel: Planet Python
Viewing all articles
Browse latest Browse all 22875

Python for Beginners: Concatenate, Merge, and Join Pandas DataFrames

$
0
0

Pandas dataframes are the primary tools to analyze tabular data in python. In this article, we will discuss different ways to concatenate, merge, and join pandas dataframes using the merge() function, the join() function, and the concat() function.

The Pandas merge() Function

The merge() function is used to merge pandas dataframes in Python. The merge happens in a similar manner to the join operations in database columns. The syntax for the merge() function is as follows.

pandas.merge(left_df, right_df, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, suffixes=('_x', '_y'),indicator=False)

Here, 

  • The left_df and right_df parameters take a dataframe each as their input argument. The position of the dataframe affects the output. Hence, we have explicitly named the positions of the dataframe. 
  • The “how” parameter is used to decide the join operation that will be performed on left_df and right_df. It has the default value “inner” which shows that the inner join operation will be performed. You can also use the literal “outer”, “left”, or “right” to perform full outer join, left join, or right join respectively on the dataframes. 
  • The on parameter is used to decide the column that is used as the key for the join operations. Here, the column name provided to the on parameter must be present in both left_df and right_df. If you are using different column names from both the dataframes as join keys, the on parameter is set to None by default.
  • The left_on parameter is used to specify the column name to be used as the join key from the left_df
  • The right_on parameter is used to specify the column name to be used as the join key from the right_df
  • If the input dataframe have any of the columns as indices, you can use the left_index and right_index parameter to use the indices as join keys.
  • The left_index parameter is set to True if you want to use the index of left_df as the join key. It has the default value of False.
  • The right_index parameter is set to True if you want to use the index of right_df as the join key. It has the default value of False.
  • The suffixes parameter comes into use when left_df and right_df have common column names. If left_df and right_df have common column names, _x is added to the respective column name in left_df and _y is added to the same column name in right_df. You can also specify suffixes manually using the suffixes parameter.
  • The indicator parameter is used to indicate if the join keys were present in the input dataframes. If the indicator is set to True, an additional column named “_merge” is added to the output dataframe.  For each row, if the join key were present in both left_df and right_df, the output is _merge column in both. If the join key were present only in the left_df or right_df, the value in the _merge column will be left_only or right_only respectively. 

After execution, the merge() function returns the output dataframe. 

Merge or Inner Join DataFrames Using the merge() Function

We can merge two dataframes using the merge() function. The merge functionally works as database join operation. The columns that are compared while joining the dataframes are passed to left_on and the right_on parameter.

After comparing the values in the left_on column in left dataframe and right_on column in the right dataframe, the rows are merged to produce the output dataframe.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
6    20     72     B      1      Tina
7    24     92     A      1       Amy

In the above example, we have performed inner join on pandas dataframes using the "Roll" attribute. From both the dataframes, the rows having the same value for the "Roll" attribute are merged together to form the rows of the output dataframe.

If the input dataframes have common columns, the suffix _x and _y are added to the column names of left and right pandas dataframes respectively after the join operation. You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_name.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll    Name_x  Marks Grade  Class    Name_y
0    11    Aditya     85     A      1    Aditya
1    12     Chris     95     A      1     Chris
2    13       Sam     75     B      1       Sam
3    14      Joel     75     B      1      Joel
4    16       Tom     78     B      1  Samantha
5    15  Samantha     55     C      1       Tom
6    20      Tina     72     B      1      Tina
7    24       Amy     92     A      1       Amy

In this example, both the input dataframes have the"Name” attribute. Hence, the Name column of the first dataframe gets _x suffix in the output dataframe. Similarly, the Name column of the second dataframe gets the suffix _y in the output dataframe.

Instead of the default suffix, we can also change the suffixes as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_name.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll Name_left  Marks Grade  Class Name_right
0    11    Aditya     85     A      1     Aditya
1    12     Chris     95     A      1      Chris
2    13       Sam     75     B      1        Sam
3    14      Joel     75     B      1       Joel
4    16       Tom     78     B      1   Samantha
5    15  Samantha     55     C      1        Tom
6    20      Tina     72     B      1       Tina
7    24       Amy     92     A      1        Amy

In the above example, we have used the _left suffix for the first dataframe and the _right suffix for the second dataframe. You can also change the suffixes using the "suffixes" parameter.

Left Join Pandas DataFrames Using the merge() Function

By default the merge() function performs inner join operation on the input dataframes. We can also merge dataframes using the left join operation. For this, we can pass the literal "left"to the how parameter.

When we perform left join operation on pandas dataframes, all the rows of the left dataframe are shown in the output dataframe. For the rows in the left dataframe that don’t have a corresponding row in the right dataframe, we get NaN values in the rows of the columns corresponding to the right dataframe.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="left",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A    1.0    Aditya
1    12     95     A    1.0     Chris
2    13     75     B    1.0       Sam
3    14     75     B    1.0      Joel
4    16     78     B    1.0  Samantha
5    15     55     C    1.0       Tom
6    19     75     B    NaN       NaN
7    20     72     B    1.0      Tina
8    24     92     A    1.0       Amy
9    25     95     A    NaN       NaN

In the above example, we don’t have rows having Roll 19 and Roll 25 in the right dataframe. But, they exist in the left dataframe. Hence, after a left join operation, we get rows having Roll 19 and 25 in the output dataframe. However, the columns from right dataframe have NaN values in the respective rows for Roll 19 and 25.

Right Join Pandas DataFrames Using the merge() Function

Similar to the left join operation, we can also perform right join operation on pandas dataframes using the merge() function. We can merge dataframes using right join operation by passing the literal "right"to the "how" parameter.

When we perform right join operation on pandas dataframes, all the rows of the right dataframe are shown in the output dataframe. For the rows in the right dataframe that don’t have a corresponding row in the left dataframe, we get NaN values in the rows of the columns corresponding to the left dataframe.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="right",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A      1    Aditya
1     12   95.0     A      1     Chris
2     13   75.0     B      1       Sam
3     14   75.0     B      1      Joel
4     15   55.0     C      1       Tom
5     16   78.0     B      1  Samantha
6     17    NaN   NaN      1     Pablo
7     20   72.0     B      1      Tina
8     24   92.0     A      1       Amy
9     30    NaN   NaN      1    Justin
10    31    NaN   NaN      1      Karl

In this example, we don’t have rows having Roll 17, 30 and 31 in the left dataframe. But, they exist in the right dataframe. Hence, after a right join operation, we get rows having Roll 17, 30 and 31 in the output dataframe. However, the columns from left dataframe have NaN values in the respective rows for Roll 17, 30 and 31.

Full Outer Join Pandas DataFrames Using the merge() Function

Inner join, left join, and right join operations on the pandas dataframes lead to loss in data. If you want to retain all the input data, you can perform full outer join on the pandas dataframes. For this, you need to pass the literal "outer" to the "how" parameter in the merge() function.

In the outer join operation, all the rows of the left dataframe are shown in the output dataframe. For the rows in the left dataframe that don’t have a corresponding row in the right dataframe, we get NaN values in the rows of the columns corresponding to the right dataframe.

Similarly, all the rows of the right dataframe are shown in the output dataframe. For the rows in the right dataframe that don’t have a corresponding row in the left dataframe, we get NaN values in the rows of the columns corresponding to the left dataframe.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_on="Roll", right_on="Roll",suffixes=("_left","_right"))
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
    Roll  Marks Grade  Class      Name
0     11   85.0     A    1.0    Aditya
1     12   95.0     A    1.0     Chris
2     13   75.0     B    1.0       Sam
3     14   75.0     B    1.0      Joel
4     16   78.0     B    1.0  Samantha
5     15   55.0     C    1.0       Tom
6     19   75.0     B    NaN       NaN
7     20   72.0     B    1.0      Tina
8     24   92.0     A    1.0       Amy
9     25   95.0     A    NaN       NaN
10    17    NaN   NaN    1.0     Pablo
11    30    NaN   NaN    1.0    Justin
12    31    NaN   NaN    1.0      Karl

In the above example, we don’t have rows having Roll 19 and Roll 25 in the right dataframe. But, they exist in the left dataframe. Hence, after the outer join operation, we get rows having Roll 19 and 25 in the output dataframe. However, the columns from right dataframe have NaN values in the respective rows for Roll 19 and 25.

Similarly, we don’t have rows having Roll 17, 30 and 31 in the left dataframe. But, they exist in the right dataframe. Hence, after a n outer join operation, we get rows having Roll 17, 30 and 31 in the output dataframe. However, the columns from left dataframe have NaN values in the respective rows for Roll 17, 30 and 31.

Merge DataFrames Using Indices as Join Keys

We can also merge the dataframes using indices as the join keys. For this, we need to set the left_index and right_index parameter to True based on the situation.

For instance, we can use the index of the left dataframe as the join key as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_index=True, right_on="Roll")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
    Class  Roll      Name
0       1    11    Aditya
1       1    12     Chris
2       1    13       Sam
3       1    14      Joel
4       1    15       Tom
5       1    16  Samantha
6       1    17     Pablo
7       1    20      Tina
8       1    24       Amy
9       1    30    Justin
10      1    31      Karl
Merged dataframe is:
   Marks Grade  Class  Roll      Name
0     85     A      1    11    Aditya
1     95     A      1    12     Chris
2     75     B      1    13       Sam
3     75     B      1    14      Joel
5     78     B      1    16  Samantha
4     55     C      1    15       Tom
7     72     B      1    20      Tina
8     92     A      1    24       Amy

In this example, the Roll column is used as an index in the left dataframe. Hence, we don’t need to pass the column name to the left_on parameter. Setting left_index=True does the work for us.

We can also use the index of right dataframe as the join key as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_on="Roll", right_index=True)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
   Roll  Marks Grade  Class      Name
0    11     85     A      1    Aditya
1    12     95     A      1     Chris
2    13     75     B      1       Sam
3    14     75     B      1      Joel
4    16     78     B      1  Samantha
5    15     55     C      1       Tom
7    20     72     B      1      Tina
8    24     92     A      1       Amy

In this example, the Roll column is used as an index in the right dataframe. Hence, we don’t need to pass the column name to the right_on parameter. Setting right_index=True does the work for us. For left dataframe, we have used the left_on parameter to specify the join key.

To use indices of both the dataframes, we can use the left_index and right_index parameters as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,left_index=True, right_index=True)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name
Roll                              
11       85     A      1    Aditya
12       95     A      1     Chris
13       75     B      1       Sam
14       75     B      1      Joel
16       78     B      1  Samantha
15       55     C      1       Tom
20       72     B      1      Tina
24       92     A      1       Amy

In this example, both the input dataframes have Roll column as their index. Hence, we have set left_index and right_index to True to specify the indices as join keys.

Merge Pandas DataFrames With Indicator

While merging dataframes, we can add some metadata in the output dataframes. For instance, we can specify if the join key was present in the left dataframe, the right dataframe or both the input dataframes.

For this, we can use the indicator parameter. When the indicator is set to True, the merge() function adds an additional column named _merge in the output dataframe.

For each row in the _merge column, if the join key were present in both the input dataframes, the output value in the _merge column in "both". If the join key is present only in the left dataframe, the output in the _merge column is "left_only". Similarly, if the join column is only present in the right dataframe, the value in the _merge column for the corresponding row will be "right_only". You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.merge(df1,df2,how="outer",left_index=True, right_index=True,indicator=True)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name      _merge
Roll                                          
11     85.0     A    1.0    Aditya        both
12     95.0     A    1.0     Chris        both
13     75.0     B    1.0       Sam        both
14     75.0     B    1.0      Joel        both
15     55.0     C    1.0       Tom        both
16     78.0     B    1.0  Samantha        both
17      NaN   NaN    1.0     Pablo  right_only
19     75.0     B    NaN       NaN   left_only
20     72.0     B    1.0      Tina        both
24     92.0     A    1.0       Amy        both
25     95.0     A    NaN       NaN   left_only
30      NaN   NaN    1.0    Justin  right_only
31      NaN   NaN    1.0      Karl  right_only

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.

The Pandas concat() Function

We can concatenate pandas dataframes horizontally or vertically using the concat() function. The syntax for the concat() function is as follows.

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,names=None)

Here, 

  • The objs parameter is a list or tuple of dataframes that need to be concatenated.
  • The axis parameter is used to decide whether the input dataframes are joined horizontally or vertically. If you want to concatenate the dataframes vertically, the axis is set to 0 which is its default value. To concatenate dataframes horizontally, the axis is set to 1.
  • The join parameter is used to decide how to handle indices on other indices. If we are concatenating the dataframes vertically, the join parameter decides what columns will be included in the output dataframe. If join is set to “inner”, the output dataframe contains only those columns that are present in all the input dataframes. If join is set to “outer”, which is its default value, all the columns from the input dataframes are included in the output dataframe.If we are concatenating the dataframes horizontally, the join operation is performed with indices of the input dataframes.
  • The ignore_index parameter is used to decide whether the output dataframe stores the indices from the input dataframes. By default, it is set to False due to which the indices of the input dataframes are preserved in the output dataframe. When ignore_index is set to True, the indices of the input dataframes are ignored.
  • The keys parameter is used to create an additional index level in the output dataframe. Generally, we use this parameter to identify the input dataframes to which the rows or columns in the output dataframe belong. This is done by specifying the names of the input dataframes as keys.
  • The names parameter is used to name the index columns that are created using the keys parameter.

After execution, the concat() function returns the concatenated dataframe.

Concatenate DataFrames Vertically Using the concat() Function

If we have two dataframes with similar data, we can concatenate them vertically using the concat() function.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Class  Roll    Name  Marks Grade
0      1    11  Aditya     85     A
1      1    12   Chris     95     A
2      1    14     Sam     75     B
3      1    16  Aditya     78     B
4      1    15   Harry     55     C
5      2     1    Joel     68     B
6      2    22     Tom     73     B
7      2    15    Golu     79     B
second dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
Merged dataframe is:
   Class  Roll        Name  Marks Grade
0      1    11      Aditya     85     A
1      1    12       Chris     95     A
2      1    14         Sam     75     B
3      1    16      Aditya     78     B
4      1    15       Harry     55     C
5      2     1        Joel     68     B
6      2    22         Tom     73     B
7      2    15        Golu     79     B
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D

In the above example, we have created the output dataframe from the input dataframes after merging all the rows. You can observe that the indices of the input dataframes have also been concatenated in the output dataframe.

To ignore the indices of the input dataframes and create new index, you can set the ignore_index parameter to True in the concat() function as shown in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],ignore_index=True)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Class  Roll    Name  Marks Grade
0      1    11  Aditya     85     A
1      1    12   Chris     95     A
2      1    14     Sam     75     B
3      1    16  Aditya     78     B
4      1    15   Harry     55     C
5      2     1    Joel     68     B
6      2    22     Tom     73     B
7      2    15    Golu     79     B
second dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
Merged dataframe is:
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D

Here, you can observe that all the rows have been assigned new indices to them.

Sometimes, the dataframes that are being concatenated horizontally many not have the same columns. In such a case, the columns in the output dataframe is the union of all the columns in the input dataframes. For instance, consider the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
Merged dataframe is:
   Roll  Marks Grade      Name
0    11     85     A       NaN
1    12     95     A       NaN
2    13     75     B       NaN
3    14     75     B       NaN
4    16     78     B       NaN
5    15     55     C       NaN
6    20     72     B       NaN
7    24     92     A       NaN
0    11     85     A    Aditya
1    12     95     A     Chris
2    13     75     B       Sam
3    14     75     B      Joel
4    16     78     B       Tom
5    15     55     C  Samantha
6    20     72     B      Tina
7    24     92     A       Amy

In the above example, the first dataframe contains the columns "Roll", "Marks", and"Grade". The second dataframe contains the columns "Roll", "Name", "Marks", and"Grade". Hence, the output dataframe contains the columns "Roll", "Name", "Marks", and"Grade".

The value in the “Name” column in the rows corresponding to the first dataframe will contain NaN values as the first dataframe doesn’t contain the "Name" column. Similarly, if the second dataframe would have had an extra column, the rows corresponding to the first dataframe will contain NaN values for the respective column.

Concat DataFrames Horizontally Using the concat() Function

We can also concatenate dataframes horizontally using the concat() function. For this, we need to use the parameter axis=1 in the concat() function.

When concatenate two dataframes horizontally using the concat() function, the rows of the input dataframes are merged using the index values of the dataframes. After execution of the concat() function, we get the output dataframe as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],axis=1)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
Merged dataframe is:
   Roll  Marks Grade  Roll      Name  Marks Grade
0    11     85     A    11    Aditya     85     A
1    12     95     A    12     Chris     95     A
2    13     75     B    13       Sam     75     B
3    14     75     B    14      Joel     75     B
4    16     78     B    16       Tom     78     B
5    15     55     C    15  Samantha     55     C
6    20     72     B    20      Tina     72     B
7    24     92     A    24       Amy     92     A

If any of the input dataframes contains extra rows or indices that are not present in the other dataframe, the output dataframe contains NaN values in the columns corresponding to the other dataframe in the respective rows. You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],axis=1)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
8    25      Sinu     95     A
Merged dataframe is:
   Roll  Marks Grade  Roll      Name  Marks Grade
0  11.0   85.0     A    11    Aditya     85     A
1  12.0   95.0     A    12     Chris     95     A
2  13.0   75.0     B    13       Sam     75     B
3  14.0   75.0     B    14      Joel     75     B
4  16.0   78.0     B    16       Tom     78     B
5  15.0   55.0     C    15  Samantha     55     C
6  20.0   72.0     B    20      Tina     72     B
7  24.0   92.0     A    24       Amy     92     A
8   NaN    NaN   NaN    25      Sinu     95     A

In this example, you can observe that the last row contains NaN values for columns corresponding to the first dataframe. This is due to the reason that the first dataframe contain 1 row lesser that the second dataframe.

Concatenate DataFrames Horizontally With Only Common Row Indices

If you don’t want NaN values to be present in the output dataframe, you can use the join="inner"parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],join="inner",axis=1)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
8    25      Sinu     95     A
Merged dataframe is:
   Roll  Marks Grade  Roll      Name  Marks Grade
0    11     85     A    11    Aditya     85     A
1    12     95     A    12     Chris     95     A
2    13     75     B    13       Sam     75     B
3    14     75     B    14      Joel     75     B
4    16     78     B    16       Tom     78     B
5    15     55     C    15  Samantha     55     C
6    20     72     B    20      Tina     72     B
7    24     92     A    24       Amy     92     A

Concatenate DataFrames Horizontally With Custom Indices

When we join the dataframes horizontally, the rows are matched using the default indices. If we specify the indices of the input dataframes, the rows of the dataframes are matched using index values as keys. You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],join="inner",axis=1)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
20       72     B
24       92     A
second dataframe is:
          Name  Marks Grade
Roll                       
11      Aditya     85     A
12       Chris     95     A
13         Sam     75     B
14        Joel     75     B
16         Tom     78     B
15    Samantha     55     C
20        Tina     72     B
24         Amy     92     A
25        Sinu     95     A
Merged dataframe is:
      Marks Grade      Name  Marks Grade
Roll                                    
11       85     A    Aditya     85     A
12       95     A     Chris     95     A
13       75     B       Sam     75     B
14       75     B      Joel     75     B
16       78     B       Tom     78     B
15       55     C  Samantha     55     C
20       72     B      Tina     72     B
24       92     A       Amy     92     A

Add Identifiers to the Rows After Concatenating DataFrames

When we concatenate two dataframes, the output dataframe doesn’t specify which input dataframe a particular row belongs to. To specify this, we can use the "keys" parameter. The "keys" parameter takes a list of strings as its input. After execution, it adds the keys as an extra level of index in the output dataframe. Each key specified in the "keys" parameter corresponds to a specific dataframe. You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],keys=["Dataframe1","dataframe2"])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Class  Roll    Name  Marks Grade
0      1    11  Aditya     85     A
1      1    12   Chris     95     A
2      1    14     Sam     75     B
3      1    16  Aditya     78     B
4      1    15   Harry     55     C
5      2     1    Joel     68     B
6      2    22     Tom     73     B
7      2    15    Golu     79     B
second dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
Merged dataframe is:
              Class  Roll        Name  Marks Grade
Dataframe1 0      1    11      Aditya     85     A
           1      1    12       Chris     95     A
           2      1    14         Sam     75     B
           3      1    16      Aditya     78     B
           4      1    15       Harry     55     C
           5      2     1        Joel     68     B
           6      2    22         Tom     73     B
           7      2    15        Golu     79     B
dataframe2 0      2    27       Harsh     55     C
           1      2    23       Clara     78     B
           2      3    33        Tina     82     A
           3      3    34         Amy     88     A
           4      3    15    Prashant     78     B
           5      3    27      Aditya     55     C
           6      3    23  Radheshyam     78     B
           7      3    11       Bobby     50     D

In the above example, you can observe that we have added “dataframe1” and “dataframe2” as extra level of index in the output dataframe. You can also give a name to the index column using the “names” parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)
df3=pd.concat([df1,df2],keys=["Dataframe1","dataframe2"],names=["Dataframe","index"])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Class  Roll    Name  Marks Grade
0      1    11  Aditya     85     A
1      1    12   Chris     95     A
2      1    14     Sam     75     B
3      1    16  Aditya     78     B
4      1    15   Harry     55     C
5      2     1    Joel     68     B
6      2    22     Tom     73     B
7      2    15    Golu     79     B
second dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
Merged dataframe is:
                  Class  Roll        Name  Marks Grade
Dataframe  index                                      
Dataframe1 0          1    11      Aditya     85     A
           1          1    12       Chris     95     A
           2          1    14         Sam     75     B
           3          1    16      Aditya     78     B
           4          1    15       Harry     55     C
           5          2     1        Joel     68     B
           6          2    22         Tom     73     B
           7          2    15        Golu     79     B
dataframe2 0          2    27       Harsh     55     C
           1          2    23       Clara     78     B
           2          3    33        Tina     82     A
           3          3    34         Amy     88     A
           4          3    15    Prashant     78     B
           5          3    27      Aditya     55     C
           6          3    23  Radheshyam     78     B
           7          3    11       Bobby     50     D

The Pandas join() Method

Instead of the merge() function, we can also use the join() method to perform joins on two dataframes. 

The syntax for the join() method is as follows.

df.join(other, on=None, how='left', lsuffix='', rsuffix='’)

Here,

  • df is our left dataframe.
  • The parameter “other” denotes the right dataframe to join or a list of dataframes if we want to join multiple dataframes.
  • The on parameter is used to specify the column that is used as the join key.
  • The “how” parameter is used to specify whether left, right, inner, or full outer join will be performed. By default, it has the value “left”.
  • When the left and right dataframes have common column names, the lsuffix and rsuffix parameters are used to specify suffixes of columns from input dataframes in the output dataframe.

The join() method works best when the join is performed with the indices of the dataframes as join keys.

Left Join Pandas DataFrames Using the Join() Method

By default, the join() method performs left join operation. When invoked on a dataframe, it takes another dataframe as its input argument. After execution, it returns the joined dataframe. The input dataframes are joined using the index columns as the join key.

Here, I would like to emphasize that the join() method works best with dataframes having custom index columns.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name
Roll                              
11       85     A    1.0    Aditya
12       95     A    1.0     Chris
13       75     B    1.0       Sam
14       75     B    1.0      Joel
16       78     B    1.0  Samantha
15       55     C    1.0       Tom
19       75     B    NaN       NaN
20       72     B    1.0      Tina
24       92     A    1.0       Amy
25       95     A    NaN       NaN

Right Join Pandas DataFrames Using the join() Method

We can perform right join operation on pandas dataframes using the join() method as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="right")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name
Roll                              
11     85.0     A      1    Aditya
12     95.0     A      1     Chris
13     75.0     B      1       Sam
14     75.0     B      1      Joel
15     55.0     C      1       Tom
16     78.0     B      1  Samantha
17      NaN   NaN      1     Pablo
20     72.0     B      1      Tina
24     92.0     A      1       Amy
30      NaN   NaN      1    Justin
31      NaN   NaN      1      Karl

Inner Join DataFrames Using the join() Method

We can perform inner join operation on pandas dataframes by passing the parameter how="inner" to the join() method as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="inner")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name
Roll                              
11       85     A      1    Aditya
12       95     A      1     Chris
13       75     B      1       Sam
14       75     B      1      Joel
16       78     B      1  Samantha
15       55     C      1       Tom
20       72     B      1      Tina
24       92     A      1       Amy

Outer Join DataFrames Using the join() Method

We can perform outer join on pandas dataframes using the join() method as shown in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,how="outer")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
      Marks Grade  Class      Name
Roll                              
11     85.0     A    1.0    Aditya
12     95.0     A    1.0     Chris
13     75.0     B    1.0       Sam
14     75.0     B    1.0      Joel
15     55.0     C    1.0       Tom
16     78.0     B    1.0  Samantha
17      NaN   NaN    1.0     Pablo
19     75.0     B    NaN       NaN
20     72.0     B    1.0      Tina
24     92.0     A    1.0       Amy
25     95.0     A    NaN       NaN
30      NaN   NaN    1.0    Justin
31      NaN   NaN    1.0      Karl

Join DataFrames With Common Column Names

If the input dataframes have common column names, we need to specify the suffixes for the columns of the dataframes using the rsuffix and lsuffix parameter as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,on="Roll",how="outer",lsuffix="_left", rsuffix="_right")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
          Name  Marks Grade
Roll                       
11      Aditya     85     A
12       Chris     95     A
13         Sam     75     B
14        Joel     75     B
16         Tom     78     B
15    Samantha     55     C
20        Tina     72     B
24         Amy     92     A
25        Sinu     95     A
Merged dataframe is:
   Roll  Marks_left Grade_left      Name  Marks_right Grade_right
0    11          85          A    Aditya         85.0           A
1    12          95          A     Chris         95.0           A
2    13          75          B       Sam         75.0           B
3    14          75          B      Joel         75.0           B
4    16          78          B       Tom         78.0           B
5    15          55          C  Samantha         55.0           C
6    19          75          B       NaN          NaN         NaN
7    20          72          B      Tina         72.0           B
8    24          92          A       Amy         92.0           A
9    25          95          A      Sinu         95.0           A

In this example, the Marks and Grade column are present in both the input dataframes. Due to this we need to specify the suffixes for the column names.

In case you don’t specify the suffixes and the input dataframes have common column names, the program will run into error.

Join Dataframes Using a Column Name as Join Key

If the dataframe on which the join() method is invoked doesn’t have the join key as the index column, you can specify the column name using the "on" parameter. However, the dataframe passed as input argument needs to have the column being using as the join key as its index. You can observe this in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data1.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=df1.join(df2,on="Roll",how="outer")
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    19     75     B
7    20     72     B
8    24     92     A
9    25     95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
17        1     Pablo
20        1      Tina
24        1       Amy
30        1    Justin
31        1      Karl
Merged dataframe is:
     Roll  Marks Grade  Class      Name
0.0    11   85.0     A    1.0    Aditya
1.0    12   95.0     A    1.0     Chris
2.0    13   75.0     B    1.0       Sam
3.0    14   75.0     B    1.0      Joel
4.0    16   78.0     B    1.0  Samantha
5.0    15   55.0     C    1.0       Tom
6.0    19   75.0     B    NaN       NaN
7.0    20   72.0     B    1.0      Tina
8.0    24   92.0     A    1.0       Amy
9.0    25   95.0     A    NaN       NaN
NaN    17    NaN   NaN    1.0     Pablo
NaN    30    NaN   NaN    1.0    Justin
NaN    31    NaN   NaN    1.0      Karl

Join Multiple DataFrames Using the join() Method

We can also join multiple dataframes using the join() method. For this, we need to invoke the join() method on a dataframe and pass other dataframes in a list as input to the join() method.

Here, we need to make sure that all the dataframes should have the join key as their index column. Additionally, there should not be even a single common column between any two dataframes apart from the index column. Following these two conditions, we can join multiple pandas dataframes using the join() method as shown below.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("name_data.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.read_csv("height_with_roll.csv",index_col="Roll")
print("Third dataframe is:")
print(df3)
df4=df1.join([df2,df3])
print("Merged dataframe is:")
print(df4)

Output:

First dataframe is:
      Marks Grade
Roll             
11       85     A
12       95     A
13       75     B
14       75     B
16       78     B
15       55     C
19       75     B
20       72     B
24       92     A
25       95     A
second dataframe is:
      Class      Name
Roll                 
11        1    Aditya
12        1     Chris
13        1       Sam
14        1      Joel
15        1       Tom
16        1  Samantha
20        1      Tina
24        1       Amy
Third dataframe is:
      Height
Roll        
11       170
12       165
13       155
14       180
16       140
15       162
19       175
20       163
24       154
25       161
Merged dataframe is:
      Marks Grade  Class      Name  Height
Roll                                      
11       85     A    1.0    Aditya     170
12       95     A    1.0     Chris     165
13       75     B    1.0       Sam     155
14       75     B    1.0      Joel     180
16       78     B    1.0  Samantha     140
15       55     C    1.0       Tom     162
19       75     B    NaN       NaN     175
20       72     B    1.0      Tina     163
24       92     A    1.0       Amy     154
25       95     A    NaN       NaN     161

If the input dataframes have common column names, the program will run into ValueError exception as shown in the following example.

import numpy as np
import pandas as pd
df1=pd.read_csv("grade_with_roll1.csv",index_col="Roll")
print("First dataframe is:")
print(df1)
df2=pd.read_csv("grade_with_name.csv",index_col="Roll")
print("second dataframe is:")
print(df2)
df3=pd.read_csv("height_with_roll.csv",index_col="Roll")
print("Third dataframe is:")
print(df3)
df4=df1.join([df2,df3])
print("Merged dataframe is:")
print(df4)

Output:

ValueError: Indexes have overlapping values: Index(['Marks', 'Grade'], dtype='object')

Conclusion

In this article, we have discussed how to merge, concatenate, and join pandas dataframes in python. . To know more about python programming, you can read this article on how to create a pandas dataframe. You might also like this article on string manipulation in Python.

The post Concatenate, Merge, and Join Pandas DataFrames appeared first on PythonForBeginners.com.


Viewing all articles
Browse latest Browse all 22875

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>