Channel: Planet Python

Python for Beginners: Left Join Dataframes in Python


The left join operation is used in SQL to join two tables. In this article, we will discuss how we can perform a left join on two dataframes in Python.
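
Since the operation comes from SQL, here is a minimal sketch of a SQL LEFT JOIN using Python's built-in sqlite3 module and an in-memory database. The table names students and grades and the sample rows (including the hypothetical student Zoe, who has a grade but no details) are made up for illustration.

```python
import sqlite3

# In-memory database with two small tables (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (Name TEXT, Class INTEGER)")
conn.execute("CREATE TABLE grades (Name TEXT, Grade TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("Aditya", 1), ("Sam", 1), ("Tina", 3)])
conn.executemany("INSERT INTO grades VALUES (?, ?)",
                 [("Aditya", "A"), ("Tina", "A-"), ("Zoe", "B")])

# LEFT JOIN keeps every student; Grade is NULL (None in Python) when there is no match.
# ORDER BY is needed because SQL does not guarantee row order otherwise.
rows = conn.execute(
    "SELECT s.Name, s.Class, g.Grade "
    "FROM students AS s LEFT JOIN grades AS g ON s.Name = g.Name "
    "ORDER BY s.Name"
).fetchall()
conn.close()
print(rows)  # [('Aditya', 1, 'A'), ('Sam', 1, None), ('Tina', 3, 'A-')]
```

Zoe does not appear because she has no row in students, which is exactly the behavior the pandas methods below reproduce.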

What is the Left Join Operation?

Suppose that we have two tables A and B. When we perform the operation A left join B, we get a new table that contains every row from table A. Each row of A that has a matching row in table B is combined with the columns of that matching row; if it matches several rows of B, it appears once for each match. Each row of A that has no matching row in B is still included, with the columns from B left empty (NULL in SQL, NaN in pandas). However, the rows of table B that don't have any matching row in table A are omitted from the final result.
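
The definition above can be sketched in plain Python, without pandas. The helper left_join below is hypothetical (not part of any library) and assumes each table is a list of dictionaries whose only shared key is the join column; it illustrates the semantics and is not a replacement for pandas.

```python
def left_join(left, right, key):
    """Hypothetical helper: return A LEFT JOIN B for two lists of dicts matched on key."""
    # Group B's rows by their key value so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Columns contributed by B (everything except the join key), in first-seen order.
    right_cols = list(dict.fromkeys(col for row in right for col in row if col != key))
    result = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            # One output row per matching row in B.
            for match in matches:
                result.append({**row, **{col: match.get(col) for col in right_cols}})
        else:
            # No match in B: keep the row from A and fill B's columns with None.
            result.append({**row, **{col: None for col in right_cols}})
    return result


students = [{"Name": "Aditya", "Class": 1}, {"Name": "Sam", "Class": 1}]
grades = [{"Name": "Aditya", "Grade": "A"}, {"Name": "Zoe", "Grade": "B"}]
print(left_join(students, grades, "Name"))
# [{'Name': 'Aditya', 'Class': 1, 'Grade': 'A'}, {'Name': 'Sam', 'Class': 1, 'Grade': None}]
```

Rows of B are never iterated in the output loop, which is why an unmatched row such as Zoe's simply disappears.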

To understand this, suppose that table A contains the details of students, table B contains their grades, and both tables have a common column, say ‘Name’. If we perform the operation A left join B, the resultant table contains the details of every student in A along with their grades. Students in A whose grades are missing from table B still appear in the output, with empty grade columns. On the other hand, the grades of students who are not present in table A are not included in the output table.
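
The student example translates directly into pandas. Here is a minimal sketch with small inline dataframes instead of CSV files; the data, including the student Zoe who has a grade but no details, is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for table A (student details) and table B (grades).
details = pd.DataFrame({"Name": ["Aditya", "Sam", "Tina"], "Class": [1, 1, 3]})
grades = pd.DataFrame({"Name": ["Aditya", "Tina", "Zoe"], "Grade": ["A", "A-", "B"]})

# A left join B on the shared "Name" column.
result = details.merge(grades, how="left", on="Name")
# Sam is kept with NaN in Grade; Zoe, who is missing from details, is dropped.
print(result)
```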

As dataframes contain tabular data, we can perform the left join operation on dataframes in Python as well. For this, we can use either the merge() method or the join() method of a pandas dataframe.

The programs in this article read their data from two CSV files, name.csv and grade.csv.

Left Join DataFrames Using the merge() Method

We can perform the left join operation on dataframes using the merge() method. For this, we invoke the merge() method on the first dataframe and pass the second dataframe as its first input argument. Additionally, we pass the name of the column to be matched as the input argument to the ‘on’ parameter and the literal ‘left’ as the input argument to the ‘how’ parameter. After execution, the merge() method returns the output dataframe, as shown in the following example.

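import pandas as pd
# Read the two CSV files into dataframes.
names = pd.read_csv("name.csv")
grades = pd.read_csv("grade.csv")
# Left join: keep every row of names and add the matching columns from grades.
resultdf = names.merge(grades, how="left", on="Name")
print("The resultant dataframe is:")
print(resultdf)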

Output:

The resultant dataframe is:
   Class_x  Roll_x      Name  Class_y  Roll_y Grade
0        1      11    Aditya      1.0    11.0     A
1        1      12     Chris      1.0    12.0    A+
2        1      13       Sam      NaN     NaN   NaN
3        2       1      Joel      2.0     1.0     B
4        2      22       Tom      2.0    22.0    B+
5        2      44  Samantha      NaN     NaN   NaN
6        3      33      Tina      3.0    33.0    A-
7        3      34       Amy      3.0    34.0     A

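If there are rows in the first dataframe that have no matching row in the second dataframe, those rows are still included in the output, with NaN in the columns that come from the second dataframe. However, rows in the second dataframe that do not have any matching row in the first dataframe are dropped. You can observe this in the above example: Sam and Samantha have no grades, so their Class_y, Roll_y, and Grade values are NaN. The Class_y and Roll_y columns are also shown as floats (1.0, 11.0) because NaN cannot be stored in an integer column.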
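
To check which rows found a partner, merge() accepts indicator=True, which adds a _merge column whose value is 'both' or 'left_only' for each row of a left join. The sketch below uses small hypothetical dataframes. It also passes validate="many_to_one", which asks pandas to raise an error if a name appears more than once in the second dataframe, a situation that would otherwise duplicate rows of the first.

```python
import pandas as pd

# Hypothetical sample data: Sam has no grade and Zoe has no details.
names = pd.DataFrame({"Name": ["Aditya", "Sam", "Tina"], "Class": [1, 1, 3]})
grades = pd.DataFrame({"Name": ["Aditya", "Tina", "Zoe"], "Grade": ["A", "A-", "B"]})

# indicator=True adds a "_merge" column recording where each output row came from.
left = names.merge(grades, how="left", on="Name", indicator=True, validate="many_to_one")
print(left[["Name", "_merge"]])

# Rows from the first dataframe that found no partner in the second.
unmatched = left.loc[left["_merge"] == "left_only", "Name"].tolist()
print(unmatched)  # ['Sam']
```

With unique names in the second dataframe, the output has exactly as many rows as the first dataframe.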

If both dataframes have columns with the same name (other than the column used for matching), pandas adds the suffixes _x and _y to those column names. The _x suffix marks the columns from the dataframe on which the merge() method is invoked, and the _y suffix marks the columns from the dataframe passed as the input argument to the merge() method. You can choose different suffixes with the suffixes parameter of the merge() method.
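
If _x and _y are not descriptive enough, the suffixes parameter of merge() takes a pair of strings to use instead. Here is a minimal sketch with hypothetical two-row dataframes that, like the data in the example above, share the Class and Roll columns.

```python
import pandas as pd

# Hypothetical data: both frames carry "Class" and "Roll" besides "Name".
names = pd.DataFrame({"Class": [1, 1], "Roll": [11, 13], "Name": ["Aditya", "Sam"]})
grades = pd.DataFrame({"Class": [1], "Roll": [11], "Name": ["Aditya"], "Grade": ["A"]})

# suffixes replaces the default ("_x", "_y") for overlapping column names.
result = names.merge(grades, how="left", on="Name", suffixes=("_names", "_grades"))
print(result.columns.tolist())
# ['Class_names', 'Roll_names', 'Name', 'Class_grades', 'Roll_grades', 'Grade']
```

These are the same column names that the join() example in the next section produces with lsuffix='_names' and rsuffix='_grades'.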


Left Join DataFrames Using the join() Method

Instead of the merge() method, we can use the join() method to perform the left join operation on the given dataframes. When invoked on a dataframe, the join() method takes another dataframe as its first input argument. We pass the name of the column in the calling dataframe whose values should be matched as the input argument to the ‘on’ parameter, and the literal ‘left’ as the input argument to the ‘how’ parameter. The values in this column are matched against the index of the other dataframe. (‘left’ is also the default value of ‘how’ for join().) After execution, the join() method returns the output dataframe, as shown in the following example.

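import pandas as pd
# Read the two CSV files into dataframes.
names = pd.read_csv("name.csv")
grades = pd.read_csv("grade.csv")
# join() matches against the index of the other dataframe, so make "Name" the index.
grades = grades.set_index("Name")
# Left join on "Name"; the suffixes distinguish the shared Class and Roll columns.
resultdf = names.join(grades, how="left", on="Name", lsuffix="_names", rsuffix="_grades")
print("The resultant dataframe is:")
print(resultdf)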

Output:

The resultant dataframe is:
   Class_names  Roll_names      Name  Class_grades  Roll_grades Grade
0            1          11    Aditya           1.0         11.0     A
1            1          12     Chris           1.0         12.0    A+
2            1          13       Sam           NaN          NaN   NaN
3            2           1      Joel           2.0          1.0     B
4            2          22       Tom           2.0         22.0    B+
5            2          44  Samantha           NaN          NaN   NaN
6            3          33      Tina           3.0         33.0    A-
7            3          34       Amy           3.0         34.0     A

While using the join() method, keep in mind that the values in the ‘on’ column of the calling dataframe are matched against the index of the dataframe passed as the input argument. That is why the example above makes ‘Name’ the index of grades with set_index() before calling join(). If the two dataframes share other column names, such as Class and Roll here, you need to specify suffixes using the lsuffix and rsuffix parameters; otherwise, join() raises a ValueError. The values passed to these parameters show which column comes from which dataframe.
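
Both requirements can be seen in a small sketch with hypothetical data: grades is indexed by Name before the join, and calling join() on frames that share the Class and Roll columns without suffixes fails.

```python
import pandas as pd

# Hypothetical data sharing the Class and Roll columns.
names = pd.DataFrame({"Class": [1, 1], "Roll": [11, 13], "Name": ["Aditya", "Sam"]})
grades = pd.DataFrame({"Class": [1], "Roll": [11], "Name": ["Aditya"], "Grade": ["A"]})

# join() matches names["Name"] against the *index* of the other frame.
grades_by_name = grades.set_index("Name")

# Overlapping columns (Class, Roll) without suffixes raise ValueError.
try:
    names.join(grades_by_name, on="Name")
    overlap_error = None
except ValueError as err:
    overlap_error = err
    print("join failed:", err)

# With suffixes the left join succeeds; how="left" is join()'s default.
result = names.join(grades_by_name, on="Name", lsuffix="_names", rsuffix="_grades")
print(result.columns.tolist())
```

The result keeps the calling dataframe's index, so Sam's row stays at position 1 with NaN in the columns from grades.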

Conclusion

In this article, we have discussed two ways to perform a left join on dataframes in Python: the merge() method and the join() method. To know more about Python programming, you can read this article on dictionary comprehension in Python. You might also like this article on list comprehension in Python.

The post Left Join Dataframes in Python appeared first on PythonForBeginners.com.

