Channel: Planet Python

Python for Beginners: Select Row From a Dataframe in Python

Pandas dataframes are used to handle tabular data in Python. In this article, we will discuss how to select a row from a dataframe in Python. We will also discuss how we can use boolean operators to select data from a pandas dataframe.

Select Row From a Dataframe Using iloc Attribute

The iloc attribute of a dataframe provides an _iLocIndexer object that selects rows by their integer position, much like list indexing. To select a row, pass its zero-based position inside square brackets after the iloc attribute, as shown below.

import pandas as pd
myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
position=1
row=myDf.iloc[position]
print("The row at position {} is :{}".format(position,row))

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The row at position 1 is :Class        1
Roll        12
Name     Chris
Name: 1, dtype: object

Here, you can observe that the iloc attribute returns the row at the specified position as a pandas series whose index holds the column names.
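Because iloc works on positions, it also accepts negative positions, slices, and lists of positions, just as a list would. The following is a minimal sketch rather than part of the original example: it builds the sample data in memory (a hypothetical stand-in for samplefile.csv) so that it runs without the file, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

first_row = my_df.iloc[0]      # a series holding the row at position 0
last_row = my_df.iloc[-1]      # negative positions count from the end
first_three = my_df.iloc[0:3]  # a slice returns a dataframe; the stop position is excluded
picked = my_df.iloc[[1, 4]]    # a list of positions also returns a dataframe

print(last_row["Name"])        # Amy
print(len(first_three))        # 3
print(list(picked["Name"]))    # ['Chris', 'Tom']
```

A single position gives a series, while a slice or a list gives a dataframe, even if only one row is selected.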

Select Row From a Dataframe Using loc Attribute in Python

The loc attribute of a dataframe looks up rows by their index label, much like looking up a value by its key in a Python dictionary. The loc attribute contains a _LocIndexer object that you can use to select rows from a pandas dataframe. You can use the index label inside the square brackets with the loc attribute to select a row of a pandas dataframe as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
index=2
row=myDf.loc[index]
print("The row at index {} is :{}".format(index,row))

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The row at index 2 is :Class      1
Roll      13
Name     Sam
Name: 2, dtype: object

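If you have set a column as the index of a dataframe, you can pass an index label to the loc attribute to select rows from the pandas dataframe. In the following example, the Class column is used as the index. Because the label 1 is shared by three rows, the loc attribute returns a dataframe containing all of them instead of a single series.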

myDf=pd.read_csv("samplefile.csv",index_col=0)
print("The dataframe is:")
print(myDf)
index=1
row=myDf.loc[index]
print("The row at index {} is :{}".format(index,row))

Output:

The dataframe is:
       Roll      Name
Class                
1        11    Aditya
1        12     Chris
1        13       Sam
2         1      Joel
2        22       Tom
2        44  Samantha
3        33      Tina
3        34       Amy
The row at index 1 is :       Roll    Name
Class              
1        11  Aditya
1        12   Chris
1        13     Sam
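The difference between unique and repeated index labels, and between loc and iloc, can be sketched as follows. This is an illustrative example under stated assumptions: the data is rebuilt in memory as a hypothetical stand-in for samplefile.csv, and set_index() is used in place of the index_col parameter of read_csv().

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

by_name = my_df.set_index("Name")    # every label is unique
by_class = my_df.set_index("Class")  # labels repeat

one_row = by_name.loc["Sam"]         # one matching row: a series
many_rows = by_class.loc[1]          # three matching rows: a dataframe
as_frame = by_name.loc[["Sam"]]      # a list of labels gives a dataframe
by_position = by_class.iloc[1]       # iloc ignores labels and uses the position

print(type(one_row).__name__)
print(many_rows.shape)
print(by_position["Name"])
```

If you need a dataframe regardless of how many rows match, passing the label inside a list is one way to get a consistent result type.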

If the dataframe has a multilevel index, you can pass a tuple containing one label for each index level to the loc attribute to select a row as shown below.

myDf=pd.read_csv("samplefile.csv",index_col=[0,1])
print("The dataframe is:")
print(myDf)
index=(1,12)
row=myDf.loc[index]
print("The row at index {} is :{}".format(index,row))

Output:

The dataframe is:
                Name
Class Roll          
1     11      Aditya
      12       Chris
      13         Sam
2     1         Joel
      22         Tom
      44    Samantha
3     33        Tina
      34         Amy
The row at index (1, 12) is :Name    Chris
Name: (1, 12), dtype: object
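With a multilevel index you can also select by only some of the levels. The sketch below is illustrative and assumes an in-memory copy of the sample data (a hypothetical stand-in for samplefile.csv). It uses a full key, a partial key on the first level, and the xs() method for the inner level.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})
multi = my_df.set_index(["Class", "Roll"])

exact = multi.loc[(1, 12)]               # full key: the single matching row as a series
whole_class = multi.loc[2]               # partial key: all rows of Class 2, indexed by Roll
by_roll = multi.xs(34, level="Roll")     # select on the inner level only

print(exact["Name"])                     # Chris
print(list(whole_class["Name"]))         # ['Joel', 'Tom', 'Samantha']
print(list(by_roll["Name"]))             # ['Amy']
```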

Select Column Using Column Name in a Pandas Dataframe

To select a column from a dataframe, you can use the column name with square brackets as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
column_name="Class"
column=myDf[column_name]
print("The {} column is :{}".format(column_name,column))

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The Class column is :0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
Name: Class, dtype: int64

If you want to select more than one column from a dataframe, you can pass a list of column names to the square brackets as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
column_names=["Class","Name"]
column=myDf[column_names]
print("The {} column is :{}".format(column_names,column))

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The ['Class', 'Name'] column is :   Class      Name
0      1    Aditya
1      1     Chris
2      1       Sam
3      2      Joel
4      2       Tom
5      2  Samantha
6      3      Tina
7      3       Amy
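The type of the result depends on what you pass to the square brackets: a single column name gives a series, while a list of names gives a dataframe, even when the list has only one name. The following sketch illustrates this with an in-memory copy of the sample data (a hypothetical stand-in for samplefile.csv).

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

as_series = my_df["Class"]               # single name: a series
as_frame = my_df[["Class"]]              # list with one name: a one-column dataframe
reordered = my_df.loc[:, ["Name", "Roll"]]  # loc keeps the column order you ask for

print(type(as_series).__name__)
print(as_frame.shape)
print(list(reordered.columns))           # ['Name', 'Roll']
```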

Boolean Masking in a Pandas Dataframe

A boolean mask records, for each row, whether a condition holds. When we apply a comparison operator such as > to a dataframe column, it returns a pandas series object containing True and False values, one for each row, as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
boolean_mask=myDf["Class"]>1
print("The boolean mask is:")
print(boolean_mask)

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The boolean mask is:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
Name: Class, dtype: bool

You can select rows from a dataframe using the boolean mask. For this, you need to pass the series containing the boolean mask to the square brackets operator as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
boolean_mask=myDf["Class"]>1
print("The boolean mask is:")
print(boolean_mask)
print("The rows in which class>1 is:")
rows=myDf[boolean_mask]
print(rows)

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The boolean mask is:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
Name: Class, dtype: bool
The rows in which class>1 is:
   Class  Roll      Name
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
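A boolean mask can also be passed to the loc attribute, which lets you pick columns in the same step. The sketch below is an illustration under the assumption that the sample data is rebuilt in memory (a hypothetical stand-in for samplefile.csv).

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

mask = my_df["Class"] > 1
names = my_df.loc[mask, "Name"]              # rows where the mask is True, Name column only
subset = my_df.loc[mask, ["Roll", "Name"]]   # same rows, two columns

print(int(mask.sum()))                       # number of rows that satisfy the condition
print(list(names))
print(subset.shape)
```

The selected rows keep their original index labels, so the result here is indexed from 3 to 7.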

Instead of using square brackets, you can also use the where() method with a boolean mask. The where() method, when invoked on a dataframe, takes the boolean mask as its input argument and returns a new dataframe of the same shape as the original, in which the rows where the mask is False are filled with NaN values, as shown below.

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
boolean_mask=myDf["Class"]>1
print("The boolean mask is:")
print(boolean_mask)
print("The rows in which class>1 is:")
rows=myDf.where(boolean_mask)
print(rows)

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The boolean mask is:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
Name: Class, dtype: bool
The rows in which class>1 is:
   Class  Roll      Name
0    NaN   NaN       NaN
1    NaN   NaN       NaN
2    NaN   NaN       NaN
3    2.0   1.0      Joel
4    2.0  22.0       Tom
5    2.0  44.0  Samantha
6    3.0  33.0      Tina
7    3.0  34.0       Amy

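In the above example, the where() method stores NaN values in the rows where the boolean mask is False. Because NaN is a floating-point value, the integer columns Class and Roll are also converted to floats. You can drop the rows containing NaN values using the dropna() method as shown below. Keep in mind that dropna() also removes any rows that already contained NaN values in the original data.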

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
boolean_mask=myDf["Class"]>1
print("The boolean mask is:")
print(boolean_mask)
print("The rows in which class>1 is:")
rows=myDf.where(boolean_mask).dropna()
print(rows)

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The boolean mask is:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
Name: Class, dtype: bool
The rows in which class>1 is:
   Class  Roll      Name
3    2.0   1.0      Joel
4    2.0  22.0       Tom
5    2.0  44.0  Samantha
6    3.0  33.0      Tina
7    3.0  34.0       Amy
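The two approaches select the same rows but differ in the column types they return. The following sketch compares them on an in-memory copy of the sample data (a hypothetical stand-in for samplefile.csv). The astype() call at the end is one possible way to restore integer columns and is an addition for illustration.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

mask = my_df["Class"] > 1
via_where = my_df.where(mask).dropna()   # numeric columns become floats
via_brackets = my_df[mask]               # original column types are kept

print(via_where["Class"].dtype)
print(via_brackets["Class"].dtype)

# Convert the float columns back to integers if needed.
restored = via_where.astype({"Class": "int64", "Roll": "int64"})
print(list(restored["Roll"]))            # [1, 22, 44, 33, 34]
```

When you only need to filter rows, indexing with the mask directly avoids the type change.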

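You can also combine two or more conditions into a single boolean mask using the element-wise operators & (and), | (or), and ~ (not). Each condition must be enclosed in parentheses because these operators have higher precedence than the comparison operators, as shown below.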

myDf=pd.read_csv("samplefile.csv")
print("The dataframe is:")
print(myDf)
boolean_mask=(myDf["Class"]>1) & (myDf["Class"]<3)
print("The boolean mask is:")
print(boolean_mask)
print("The rows in which class>1 and <3 is:")
rows=myDf.where(boolean_mask).dropna()
print(rows)

Output:

The dataframe is:
   Class  Roll      Name
0      1    11    Aditya
1      1    12     Chris
2      1    13       Sam
3      2     1      Joel
4      2    22       Tom
5      2    44  Samantha
6      3    33      Tina
7      3    34       Amy
The boolean mask is:
0    False
1    False
2    False
3     True
4     True
5     True
6    False
7    False
Name: Class, dtype: bool
The rows in which class>1 and <3 is:
   Class  Roll      Name
3    2.0   1.0      Joel
4    2.0  22.0       Tom
5    2.0  44.0  Samantha

After creating the boolean mask, you can use it to select the rows where the mask contains True, as the above example does with the where() and dropna() methods.
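Combined masks work with square brackets in the same way as single conditions. The sketch below is illustrative and runs on an in-memory copy of the sample data (a hypothetical stand-in for samplefile.csv). It shows the and, or, and not operators, along with the isin() method as a shorter alternative to chaining several equality checks.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no CSV file is needed.
my_df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 2, 3, 3],
    "Roll": [11, 12, 13, 1, 22, 44, 33, 34],
    "Name": ["Aditya", "Chris", "Sam", "Joel", "Tom", "Samantha", "Tina", "Amy"],
})

only_two = (my_df["Class"] > 1) & (my_df["Class"] < 3)     # both conditions hold
one_or_three = (my_df["Class"] == 1) | (my_df["Class"] == 3)  # either condition holds
not_one = ~(my_df["Class"] == 1)                           # negated condition
in_set = my_df["Class"].isin([1, 3])                       # same result as one_or_three

print(list(my_df[only_two]["Name"]))      # ['Joel', 'Tom', 'Samantha']
print(list(my_df[one_or_three]["Name"]))  # ['Aditya', 'Chris', 'Sam', 'Tina', 'Amy']
print(int(not_one.sum()))                 # 5
```

Python's and, or, and not keywords do not work element-wise on a series, which is why the symbol operators are used here.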

Conclusion

In this article, we discussed how to select a row from a dataframe. We also discussed how to select a column from a dataframe and how to select multiple rows from a dataframe using boolean masking.

To learn more about Python programming, you can read this article on list comprehension. If you are looking to get into machine learning, you can read this article on regression in machine learning.

The post Select Row From a Dataframe in Python appeared first on PythonForBeginners.com.

