Channel: Planet Python

Python for Beginners: Concatenate DataFrames in Python


We use dataframes in Python to handle and analyze tabular data. In this article, we will discuss how to concatenate two or more dataframes in Python.

How to Concatenate DataFrames in Python?

To concatenate two or more dataframes in Python, we can use the concat() function defined in the pandas module. The concat() function takes a list of dataframes as its input argument and, by default, concatenates them vertically.

We can also concatenate dataframes horizontally using the axis parameter of the concat() function. The axis parameter has a default value of 0, which means that the dataframes are concatenated vertically. To concatenate the dataframes horizontally, pass the value 1 to the axis parameter.

After execution, the concat() function returns the resultant dataframe. The input dataframes are left unchanged.
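
As a quick illustration of the axis parameter described above, here is a minimal sketch. The two small dataframes and the variable names are made up for this example and are not the CSV files used later in the article.

```python
import pandas as pd

# Two small illustrative dataframes (hypothetical data, not the article's CSV files).
left = pd.DataFrame({"Roll": [1, 2], "Marks": [85, 95]})
right = pd.DataFrame({"Roll": [3, 4], "Marks": [75, 78]})

# axis=0 (the default) stacks the rows of the inputs on top of each other.
stacked = pd.concat([left, right])

# axis=1 places the inputs side by side, aligning rows on their index labels.
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Vertical concatenation adds rows and keeps the two columns, while horizontal concatenation keeps the two rows and adds columns.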

Concatenate DataFrames Vertically in Python

To concatenate two dataframes vertically in Python, you first need to import the pandas module using the import statement. After that, you can concatenate the dataframes using the concat() function as follows.

import pandas as pd

# Read the two input dataframes from CSV files.
df1 = pd.read_csv("grade1.csv")
print("First dataframe is:")
print(df1)
df2 = pd.read_csv("grade2.csv")
print("second dataframe is:")
print(df2)

# Stack the rows of df2 below the rows of df1 (axis=0 is the default).
df3 = pd.concat([df1, df2])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Class  Roll    Name  Marks Grade
0      1    11  Aditya     85     A
1      1    12   Chris     95     A
2      1    14     Sam     75     B
3      1    16  Aditya     78     B
4      1    15   Harry     55     C
5      2     1    Joel     68     B
6      2    22     Tom     73     B
7      2    15    Golu     79     B
second dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
Merged dataframe is:
   Class  Roll        Name  Marks Grade
0      1    11      Aditya     85     A
1      1    12       Chris     95     A
2      1    14         Sam     75     B
3      1    16      Aditya     78     B
4      1    15       Harry     55     C
5      2     1        Joel     68     B
6      2    22         Tom     73     B
7      2    15        Golu     79     B
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D

If all the dataframes have the same number of columns and the column names are also the same, the resultant dataframe has the same number of columns as the input dataframes. You can observe this in the example above.
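
The merged output above also shows that each input keeps its original row labels, so the labels 0 to 7 appear twice. The sketch below reproduces this on two small in-memory dataframes (a hypothetical subset of the data above) and shows one way to get a fresh index instead, assuming that is what you want: the ignore_index parameter of concat().

```python
import pandas as pd

# Small in-memory stand-ins for the two grade files (hypothetical subset of the data).
grade1 = pd.DataFrame({"Roll": [11, 12], "Name": ["Aditya", "Chris"], "Marks": [85, 95]})
grade2 = pd.DataFrame({"Roll": [27, 23], "Name": ["Harsh", "Clara"], "Marks": [55, 78]})

# Same column names in both inputs, so the result has the same columns.
merged = pd.concat([grade1, grade2])
print(merged.columns.tolist())  # ['Roll', 'Name', 'Marks']
print(merged.index.tolist())    # [0, 1, 0, 1]

# ignore_index=True discards the original row labels and renumbers the rows.
renumbered = pd.concat([grade1, grade2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```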

However, if a dataframe has fewer columns than the other dataframes, the rows that come from that dataframe will contain NaN in each column it lacks. You can observe this in the following example.

import pandas as pd

# df1 has no "Name" column; df2 does.
df1 = pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2 = pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)

# Rows that come from df1 get NaN in the "Name" column.
df3 = pd.concat([df1, df2])
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
Merged dataframe is:
   Roll  Marks Grade      Name
0    11     85     A       NaN
1    12     95     A       NaN
2    13     75     B       NaN
3    14     75     B       NaN
4    16     78     B       NaN
5    15     55     C       NaN
6    20     72     B       NaN
7    24     92     A       NaN
0    11     85     A    Aditya
1    12     95     A     Chris
2    13     75     B       Sam
3    14     75     B      Joel
4    16     78     B       Tom
5    15     55     C  Samantha
6    20     72     B      Tina
7    24     92     A       Amy

If the dataframes have different column names, each distinct column name gets its own column in the resultant dataframe. The rows obtained from dataframes that do not have a given column will contain NaN in that column.
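
The case of different column names can be sketched as follows. The dataframes here are small made-up examples in which only the Marks column is shared.

```python
import pandas as pd

# Hypothetical inputs: "Marks" is shared, "Roll" and "Name" each exist in only one input.
with_roll = pd.DataFrame({"Roll": [11, 12], "Marks": [85, 95]})
with_name = pd.DataFrame({"Name": ["Sam", "Joel"], "Marks": [75, 75]})

combined = pd.concat([with_roll, with_name])
print(combined)

# Every distinct column name appears once in the result.
print(combined.columns.tolist())  # ['Roll', 'Marks', 'Name']

# Count the NaN values that were filled in for the missing columns.
print(combined["Roll"].isna().sum())  # 2
print(combined["Name"].isna().sum())  # 2
```

Note that the Roll column is converted to floating-point numbers in the result, because an integer column cannot hold NaN by default.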

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on k-means clustering with numerical example.

Concatenate DataFrames Horizontally in Python

To concatenate dataframes horizontally, we pass the value 1 to the axis parameter of the concat() function. After execution, concat() returns the horizontally concatenated dataframe as shown below.

import pandas as pd

df1 = pd.read_csv("grade_with_roll.csv")
print("First dataframe is:")
print(df1)
df2 = pd.read_csv("grade_with_name.csv")
print("second dataframe is:")
print(df2)

# axis=1 places the columns of df2 to the right of the columns of df1.
df3 = pd.concat([df1, df2], axis=1)
print("Merged dataframe is:")
print(df3)

Output:

First dataframe is:
   Roll  Marks Grade
0    11     85     A
1    12     95     A
2    13     75     B
3    14     75     B
4    16     78     B
5    15     55     C
6    20     72     B
7    24     92     A
second dataframe is:
   Roll      Name  Marks Grade
0    11    Aditya     85     A
1    12     Chris     95     A
2    13       Sam     75     B
3    14      Joel     75     B
4    16       Tom     78     B
5    15  Samantha     55     C
6    20      Tina     72     B
7    24       Amy     92     A
Merged dataframe is:
   Roll  Marks Grade  Roll      Name  Marks Grade
0    11     85     A    11    Aditya     85     A
1    12     95     A    12     Chris     95     A
2    13     75     B    13       Sam     75     B
3    14     75     B    14      Joel     75     B
4    16     78     B    16       Tom     78     B
5    15     55     C    15  Samantha     55     C
6    20     72     B    20      Tina     72     B
7    24     92     A    24       Amy     92     A

If the dataframes being concatenated have the same index labels, as the two eight-row dataframes in the above example do, the resultant dataframe will not have any NaN values. However, if a dataframe has fewer rows than the other dataframe, the rows it is missing are filled with NaN values. This happens because the join parameter defaults to "outer", which keeps every index label found in any of the inputs. Passing join="inner" keeps only the index labels that are common to all inputs.
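
To make the effect of the join parameter concrete, here is a minimal sketch with one three-row and one two-row dataframe. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical inputs with a different number of rows.
longer = pd.DataFrame({"Roll": [11, 12, 13], "Marks": [85, 95, 75]})
shorter = pd.DataFrame({"Name": ["Aditya", "Chris"]})

# join="outer" is the default: all index labels (0, 1, 2) are kept,
# and the row missing from `shorter` gets NaN in the "Name" column.
outer = pd.concat([longer, shorter], axis=1)
print(outer)

# join="inner" keeps only the index labels present in both inputs (0 and 1).
inner = pd.concat([longer, shorter], axis=1, join="inner")
print(inner)
```

With the default outer join the result has three rows and one NaN value; with the inner join it has two rows and no NaN values.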

Conclusion

In this article, we have discussed how to concatenate two pandas dataframes in Python. To concatenate more than two dataframes, you just need to add the additional dataframes to the list that is given as input to the concat() function.
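
Concatenating more than two dataframes looks like this. It is only a sketch: the three one-row dataframes are hypothetical, and ignore_index=True is an optional choice made here to renumber the rows.

```python
import pandas as pd

# Three hypothetical one-row dataframes, one per class.
class1 = pd.DataFrame({"Class": [1], "Marks": [85]})
class2 = pd.DataFrame({"Class": [2], "Marks": [68]})
class3 = pd.DataFrame({"Class": [3], "Marks": [82]})

# Any number of dataframes can be placed in the input list.
all_grades = pd.concat([class1, class2, class3], ignore_index=True)
print(all_grades)
```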

To learn more about Python programming, you can read this article on dictionary comprehension in Python. You might also like this article on list comprehension in Python.

The post Concatenate DataFrames in Python appeared first on PythonForBeginners.com.

