ListenData: How to drop one or more columns in Pandas Dataframe

In this tutorial, we will cover how to drop or remove one or multiple columns from pandas dataframe.

What is pandas in Python?

pandas is a python package for data manipulation. It has several functions for the following data tasks:

Drop or Keep rows and columns
Aggregate data by one or more columns
Sort or reorder data
Merge or append multiple dataframes
String Functions to handle text data
DateTime Functions to handle date or time format columns

Import or Load Pandas library

To make use of any python library, we first need to load them up by using import command.

import pandas as pd
import numpy as np

Let's create a fake dataframe for illustration

The code below creates 4 columns named A through D.

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

          A         B         C         D
0 -1.236438 -1.656038  1.655995 -1.413243
1  0.507747  0.710933 -1.335381  0.832619
2  0.280036 -0.411327  0.098119  0.768447
3  0.858730 -0.093217  1.077528  0.196891
4 -0.905991  0.302687  0.125881 -0.665159
5 -2.012745 -0.692847 -1.463154 -0.707779

Drop a column in python

In pandas, drop( ) function is used to remove column(s).axis=1 tells Python that you want to apply function on columns instead of rows.

df.drop(['A'], axis=1)

Column A has been removed. See the output shown below.

          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779

In order to create a new dataframenewdf storing remaining columns, you can use the command below.

newdf = df.drop(['A'], axis=1)

To delete the column permanently from original dataframe df, you can use the option inplace=True

df.drop(['A'], axis=1, inplace=True)

#Check columns in df after dropping column A
df.columns

Output
Index(['B', 'C', 'D'], dtype='object')

The parameter inplace= can be deprecated (removed) in future which means you might not see it working in the upcoming release of pandas package. You should avoid using this parameter if you are not already habitual of using it. Instead you can store your data after removing columns in a new dataframe (as explained in the above section).

If you want to change the existing dataframe, try this df = df.drop(['A'], axis=1)

Remove Multiple Columns in Python

You can specify all the columns you want to remove in a list and pass it in drop( ) function.

Method I

df2 = df.drop(['B','C'], axis=1)

Method II

cols = ['B','C']
df2 = df.drop(cols, axis=1)

Select or Keep Columns

If you wish to select a column (instead of drop), you can use the command

df['A']

To select multiple columns, you can submit the following code.

df[['A','B']]

How to drop column by position number from pandas Dataframe?

You can find out name of first column by using this command df.columns[0]. Indexing in python starts from 0.

df.drop(df.columns[0], axis =1)

To drop multiple columns by position (first and third columns), you can specify the position in list [0,2].

cols = [0,2]
df.drop(df.columns[cols], axis =1)

Drop columns by name pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})

   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9

Drop column whose name starts with letter 'X'

df.loc[:,~df.columns.str.contains('^X')]

How it works?

^X is a expression of regex language which refers to beginning of letter 'X'
df.columns.str.contains('^X') returns array [True, True, False, False, False].
True where condition meets. Otherwise False
Sign ~ refers to negate the condition.
df.loc[ ] is used to select columns

It can also be written like :

df.drop(df.columns[df.columns.str.contains('^X')], axis=1)

Other Examples

#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains string either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]

Drop columns where percentage of missing values is greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })

% of missing values can be calculated by mean of NAs in each column.

cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)

ListenData: How to drop one or more columns in Pandas Dataframe

Drop a column in python

Remove Multiple Columns in Python

How to drop column by position number from pandas Dataframe?

Drop columns by name pattern

Drop column whose name starts with letter 'X'

Drop columns where percentage of missing values is greater than 50%

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

The Ultimate Doors Discography - 90 Albums - All MP3's

Black Angus Grilled Artichokes

How to set Page Break/ Page Reset in Smartforms using Events

Mp3 Download: Mdu - Mazola

VMOU RSCIT Result 2017, RSCIT Result VMOU rkcl.vmou.ac.in Name Wise

Best service Chris Hein Chromatic Harmonica v.1.0.KONTAKT-MAGNETRiXX

Bureau of Internal Revenue: Regional Offices (Directory)

Queen & Megan Thee Stallion – We Will Rock You (Megan Thee Stallion Version)...

The Crack Era In Philadelphia Revisited: Philly Junior Black Mafia Timeline...

Windows Update / Microsoft Update の接続先 URL について

Griffith faces three more offences

Isle of Man property sales, October 23, 2014

Motrex mtxm100ja

JORGE ANDRADE Arrested by Miami-Dade County Corrections on Oct 24, 2016

Read GOS (Generic Object Service) Picture Attachments and Display it into...

‘I did it for my girl’

Halestorm – Everest – Pre-Single [iTunes Plus M4A]

Hucknall burglar's attempt to break into woman's home foiled

WALLACE; JACQUELINE