Channel: Planet Python

Python for Beginners: Drop Rows From Pandas Dataframe

We use pandas dataframes for many data processing tasks in Python. Sometimes, we need to drop some rows from a dataframe. In this article, we will discuss different ways to drop rows from a pandas dataframe using the drop() method.

The drop() Method

The drop() method can be used to drop columns or rows from a pandas dataframe. It has the following syntax.

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Here, 

  • The index parameter is used when we have to drop a row from the dataframe. The index parameter takes an index or a list of indices that have to be deleted as its input argument. 
  • The columns parameter is used when we need to drop a column from the dataframe. The columns parameter takes a column name or a list of column names that need to be dropped as its input argument.
  • The labels parameter represents the index or column label that we need to remove from the dataframe. To drop rows from a dataframe, we use the index label. To drop two or more rows, we can also pass a list of indices to the labels parameter.
  • When we don’t use the index parameter, we can pass the index label of the row that needs to be deleted to the labels parameter. In such cases, the axis parameter decides whether we drop a row or a column. To drop a column from the dataframe, we set the axis parameter to 1. To drop a row, we set the axis parameter to 0, which is its default value.
  • The level parameter is used to drop rows from a dataframe when we have multilevel indices. It takes the position or the name of the index level in which the drop() method should look for the given labels.
  • The inplace parameter is used to decide if we get a new dataframe after the drop operation or if we want to modify the original dataframe. When inplace is set to False, which is its default value, the original dataframe isn’t changed and the drop() method returns the modified dataframe after execution. To modify the original dataframe, you can set inplace to True. 
  • The errors parameter decides whether the drop() method raises an exception when a label passed to it doesn’t exist in the dataframe. By default, the errors parameter is set to “raise”, so the drop() method raises a KeyError exception if any of the labels is missing. If you set the errors parameter to “ignore”, the drop() method suppresses the exception and drops only the labels that exist.

After execution, the drop() method returns a new dataframe if the inplace parameter is set to False. Otherwise, it modifies the original dataframe and returns None.
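
The level parameter described above is not used in the rest of this article, so here is a minimal sketch of it. The two-level index (Class, Roll) and the four rows are hypothetical and are built inline only for illustration.

```python
import pandas as pd

# Hypothetical two-level index; not the grade2.csv file used in this article.
idx = pd.MultiIndex.from_tuples(
    [(2, 27), (2, 23), (3, 33), (3, 34)], names=["Class", "Roll"]
)
df = pd.DataFrame({"Name": ["Harsh", "Clara", "Tina", "Amy"]}, index=idx)

# Look for the label 2 in the "Class" level and drop every matching row.
by_class = df.drop(index=2, level="Class")

# Look for the label 33 in the "Roll" level and drop every matching row.
by_roll = df.drop(index=33, level="Roll")

print(by_class)
print(by_roll)
```

Here, the level parameter only tells the drop() method which level of the index the given label belongs to.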

Drop Rows From Pandas Dataframe by Index Labels

To drop rows of a dataframe by index labels, we will pass the index label to the labels parameter in the drop() method. After execution, the drop() method will return a dataframe with all the rows except the rows that have the index label specified in the labels parameter. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
print("After dropping rows with index 55")
df=df.drop(labels=55)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping rows with index 55
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In the above example, we have created a dataframe using a csv file. Then, we have dropped the rows in the dataframe with index 55. In the output dataframe, you can observe that all the rows with index 55 are absent. Thus, the drop() method has deleted the rows with the specified index.

Instead of the labels parameter, you can use the index parameter in the drop() method to drop a row from a dataframe as shown in the following example.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
print("After dropping rows with index 55")
df=df.drop(index=55)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping rows with index 55
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In the above example, we have used the index parameter instead of the labels parameter to pass the index value as input to the drop() method. You can observe that the output is the same in both cases. Hence, you can use either the index parameter or the labels parameter to drop rows from a pandas dataframe.
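
If you want to check this equivalence yourself, the following sketch compares both calls using the equals() method. It assumes the same data as the examples above, rebuilt inline from the printed output instead of being read from grade2.csv.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

# labels with axis=0 (the default axis) and index are two spellings
# of the same row drop.
with_labels = df.drop(labels=55, axis=0)
with_index = df.drop(index=55)

same = with_labels.equals(with_index)
print(same)
```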

Drop Rows From Pandas Dataframe by Position

To drop rows from a dataframe by position, we will use the following steps.

  • First, we will get the Index object of the dataframe using the index attribute.
  • Next, we will use the indexing operator to get the element of the Index object at the position of the row we want to drop. This element will be the label of the row we want to delete.
  • After obtaining the label of the row to be deleted, we can pass the label to the labels parameter as an input argument in the drop() method.

After execution of the drop() method, we will get the modified dataframe as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
position=3
print("After dropping row at position 3")
idx=df.index[position-1]
df=df.drop(labels=idx)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping row at position 3
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In the above example, you can observe that we have dropped the row at the third position in the dataframe. Here, the row at the third position has index 82. Therefore, if there exists any other row with index 82, the row will also get deleted from the input dataframe.

In the above example, you can also pass the index label obtained from the index object to the index parameter in the drop() method. You will get the same result after execution of the program.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
position=3
print("After dropping row at position 3")
idx=df.index[position-1]
df=df.drop(index=idx)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping row at position 3
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
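
As noted above, dropping by label also removes every other row that shares the label. If you only want to remove the row at a given position, one possible alternative (not part of the drop() approach used in this article) is to select the remaining positions with the iloc attribute. The sketch below assumes the same data, rebuilt inline from the printed output, and uses position 1 because its label 55 appears twice.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

position = 1  # one-based position, as in the examples above

# Label-based drop: removes both rows that carry the label 55.
by_label = df.drop(index=df.index[position - 1])

# Position-based selection: keeps every row except the one at the position.
keep = [i for i in range(len(df)) if i != position - 1]
by_position = df.iloc[keep]

print(len(by_label), len(by_position))
```

With this data, the label-based drop leaves six rows, while the position-based selection leaves seven and keeps the second row with index 55.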

Drop the First Row From Pandas Dataframe

To drop the first row from a dataframe, we will first obtain the index label of the first row using the index attribute. 

Then, we will pass the index label to the index parameter in the drop() method to drop the first row of the dataframe as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
position=1
print("After dropping first row")
idx=df.index[position-1]
df=df.drop(index=idx)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping first row
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In this example, we have first used the dataframe index and the indexing operator to obtain the index label of the row at the first position, i.e. index 55. Then, we have passed the index label to the index parameter in the drop() method.

In the output, you can observe that more than one row has been dropped from the dataframe. This is because the drop() method drops rows by index labels. Hence, all the rows that have the same index as the first row are dropped from the input dataframe.

Drop the Last Row From a Pandas Dataframe

To drop the last row from the dataframe, we will first obtain the total number of rows in the dataframe using the len() function. The len() function takes the dataframe as its input argument and returns the total number of rows in the dataframe.

After obtaining the total number of rows, we will obtain the index label of the last row using the index attribute. After this, we will pass the index label to the index parameter in the drop() method to drop the last row of the dataframe as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
total_rows=len(df)
position=total_rows-1
print("After dropping last row")
idx=df.index[position]
df=df.drop(index=idx)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping last row
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B

In this example, we have dropped the last row from the input dataframe. Again, if the input dataframe contains rows that have the same index as the last row, all such rows will also be deleted.
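
The len() call in the above example isn't strictly required. The Index object accepts negative positions just like a Python list, so you can also read the label of the last row with df.index[-1]. The sketch below assumes the same data, rebuilt inline from the printed output.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

# Negative indexing returns the label of the last row directly.
last_label = df.index[-1]

# It is the same label that the len()-based position gives.
same_label = last_label == df.index[len(df) - 1]

result = df.drop(index=last_label)
print(last_label, same_label, len(result))
```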

Drop Rows Inplace in a Dataframe

In the examples given in the previous sections, you can observe that the original dataframe isn’t modified after deleting rows from it. Instead, a new dataframe is created and returned by the drop() method. If you want to modify the existing dataframe instead of creating a new one, you can set the inplace parameter to True in the drop() method as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
total_rows=len(df)
position=total_rows-1
print("After dropping last row")
idx=df.index[position]
df.drop(index=idx,inplace=True)
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping last row
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B

In this example, we have set the inplace parameter to True in the drop() method. Hence, the input dataframe is modified instead of creating a new dataframe. In this case, the drop() method returns None.
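
The difference between the two modes can be checked with a short sketch. It assumes the same data, rebuilt inline from the printed output, and works on a copy so that both behaviours can be compared side by side.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

# inplace=False (the default): df is untouched and a new dataframe is returned.
new_df = df.drop(index=50)

# inplace=True: the dataframe itself is modified and the return value is None.
copy_df = df.copy()
returned = copy_df.drop(index=50, inplace=True)

print(len(df), len(new_df), len(copy_df), returned)
```

Because the return value is None when inplace is True, you should not assign it back to the dataframe variable in that case.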

Drop Rows if Index Exists in a Pandas Dataframe

If the index label passed to the drop() method doesn’t exist in the dataframe, the drop() method runs into a Python KeyError exception as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
print("After dropping row at index 1117")
df.drop(index=1117,inplace=True)
print("The modified dataframe is:")
print(df)

Output:

KeyError: '[1117] not found in axis'

In the above example, we have tried to drop a row with index 1117 from the input dataframe. The index 1117 is not present in the input dataframe. Hence, the drop() method runs into a KeyError exception.

By default, the drop() method raises the KeyError exception if the index label passed to the labels or the index parameter doesn’t exist in the dataframe. To suppress the exception when the index doesn’t exist and drop rows if the index exists, you can set the errors parameter to “ignore” as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
print("After dropping row at index 1117")
df.drop(index=1117,inplace=True,errors="ignore")
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping row at index 1117
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In this example, we have suppressed the exception by setting the errors parameter to “ignore” in the drop() method. Hence, when the index label passed to the labels or the index parameter doesn’t exist in the input dataframe, the drop() method has no effect on the input dataframe.
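
When you pass a list that mixes existing and missing labels, setting errors to “ignore” still drops the labels that exist. The following sketch illustrates this, assuming the same data rebuilt inline from the printed output.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

# With the default errors="raise", one missing label is enough to fail.
try:
    df.drop(index=[55, 1117])
    raised = False
except KeyError:
    raised = True

# With errors="ignore", 1117 is skipped and the rows with index 55 are dropped.
result = df.drop(index=[55, 1117], errors="ignore")

print(raised, len(result))
```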

Drop Multiple Rows by Index Labels in a Pandas Dataframe

To drop multiple rows by index labels in a pandas dataframe, you can pass the list containing index labels to the drop() method as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
indices=[55,88]
print("After dropping rows at indices 55,88")
df.drop(index=indices,inplace=True,errors="ignore")
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping rows at indices 55,88
The modified dataframe is:
       Class  Roll        Name Grade
Marks                               
78         2    23       Clara     B
82         3    33        Tina     A
78         3    15    Prashant     B
78         3    23  Radheshyam     B
50         3    11       Bobby     D

In the above example, we have passed the list [55, 88] to the index parameter in the drop() method. Hence, all the rows with index 55 and 88 are dropped from the input dataframe.


Drop Multiple Rows by Position From a Pandas Dataframe

To drop multiple rows by position from a dataframe, we will first find the index labels of all the rows at the positions that we want to drop using Python indexing and the index attribute. Then, we will pass the list of index labels to the index parameter in the drop() method as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
positions=[3,4,5]
indices=[df.index[i-1] for i in positions]
print("After dropping rows at positions 3,4,5")
df.drop(index=indices,inplace=True,errors="ignore")
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping rows at positions 3,4,5
The modified dataframe is:
       Class  Roll    Name Grade
Marks                           
55         2    27   Harsh     C
55         3    27  Aditya     C
50         3    11   Bobby     D

In the above example, we have deleted the rows at positions 3, 4, and 5. For this, we have used list comprehension and indexing to obtain the index labels at the specified positions. Then, we passed the list of indices to the index parameter in the drop() method to drop the rows by position in the pandas dataframe.
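
Instead of a list comprehension, you can also index the Index object with a list of zero-based positions to get all the labels at once. The following sketch assumes the same data, rebuilt inline from the printed output.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

positions = [3, 4, 5]                       # one-based, as in the example above
zero_based = [p - 1 for p in positions]     # the Index object is zero-based

# Indexing with a list of positions returns the labels at those positions.
labels = df.index[zero_based]

result = df.drop(index=labels)
print(list(labels))
print(result)
```

The label 78 at position 5 is shared by the rows at positions 2 and 7, which is why those rows are missing from the result as well.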

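Drop the First n Rows in a Pandas Dataframe

To drop the first n rows of the dataframe, we will first find the index labels of the first n rows using the index attribute of the dataframe. Then, we will pass the index labels to the drop() method as shown below. Because the drop() method works on index labels, every other row that shares a label with one of the first n rows is dropped as well.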

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
n=3
indices=[df.index[i] for i in range(n)]
print("After dropping first 3 rows")
df.drop(index=indices,inplace=True,errors="ignore")
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping first 3 rows
The modified dataframe is:
       Class  Roll    Name Grade
Marks                           
88         3    34     Amy     A
50         3    11   Bobby     D
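
In the above output, only two rows remain because the labels 55 and 78 of the first three rows also occur further down in the dataframe. If you want to remove exactly the first n rows, one alternative (not based on the drop() method) is to slice by position with the iloc attribute. The sketch below assumes the same data, rebuilt inline from the printed output.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

n = 3

# Label-based: slicing the Index gives the labels of the first n rows,
# and drop() removes every row that carries one of them.
by_label = df.drop(index=df.index[:n])

# Position-based: skip the first n rows regardless of their labels.
by_position = df.iloc[n:]

print(len(by_label), len(by_position))
```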

Drop the Last n Rows of a Dataframe

To drop the last n rows of a dataframe, we will first find the total number of rows in the dataframe using the len() function. Then, we will find the index labels of the last n rows using the index attribute and indexing. After obtaining the index labels, we will pass them to the index parameter in the drop() method to drop the rows as shown below. Again, all the rows that share a label with one of the last n rows are dropped as well.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
print("The dataframe is:")
print(df)
total_rows=len(df)
n=3
indices=[df.index[i] for i in range(total_rows-n,total_rows)]
print("After dropping last 3 rows")
df.drop(index=indices,inplace=True,errors="ignore")
print("The modified dataframe is:")
print(df)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
55         2    27       Harsh     C
78         2    23       Clara     B
82         3    33        Tina     A
88         3    34         Amy     A
78         3    15    Prashant     B
55         3    27      Aditya     C
78         3    23  Radheshyam     B
50         3    11       Bobby     D
After dropping last 3 rows
The modified dataframe is:
       Class  Roll  Name Grade
Marks                         
82         3    33  Tina     A
88         3    34   Amy     A
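
If you only want to cut off the last n rows by position, you can again slice with the iloc attribute instead of using the drop() method. The sketch below assumes the same data, rebuilt inline from the printed output. It computes the end of the slice as len(df)-n rather than writing -n, because a slice ending at -0 is the same as a slice ending at 0 and would return an empty dataframe when n is 0.

```python
import pandas as pd

# The data shown in the output above, rebuilt inline (an assumption made
# so that the sketch runs without the grade2.csv file).
df = pd.DataFrame(
    {
        "Class": [2, 2, 3, 3, 3, 3, 3, 3],
        "Roll": [27, 23, 33, 34, 15, 27, 23, 11],
        "Name": ["Harsh", "Clara", "Tina", "Amy", "Prashant",
                 "Aditya", "Radheshyam", "Bobby"],
        "Grade": ["C", "B", "A", "A", "B", "C", "B", "D"],
    },
    index=pd.Index([55, 78, 82, 88, 78, 55, 78, 50], name="Marks"),
)

n = 3

# Keep everything up to (but not including) the last n positions.
by_position = df.iloc[: len(df) - n]

# Edge case: with n = 0, the explicit end keeps all rows ...
keep_all = df.iloc[: len(df) - 0]
# ... while a slice ending at -0 is a slice ending at 0, i.e. empty.
empty = df.iloc[:-0]

print(len(by_position), len(keep_all), len(empty))
```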

Conclusion

In this article, we have discussed different ways to drop rows from a pandas dataframe. To know more about the pandas module, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

The post Drop Rows From Pandas Dataframe appeared first on PythonForBeginners.com.

