Channel: Planet Python

Python for Beginners: Drop Elements From a Series in Python


A pandas series is useful for handling data stored as ordered key-value pairs. In this article, we will discuss different ways to drop elements from a pandas series.

Drop Elements From a Pandas Series Using the drop() Method

We can drop elements from a pandas series using the drop() method. It has the following syntax.

Series.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Here, 

  • The labels parameter takes the index of the elements that we need to delete from the series. You can pass a single index label or a list of indices to the labels parameter. 
  • The axis parameter is used to decide if we want to delete a row or column. For a pandas series, the axis parameter isn’t used. It is defined in the function just to ensure the compatibility of the drop() method with pandas dataframes.
  • The index parameter specifies the index labels of the elements to delete. For a series, it is equivalent to the labels parameter, so you can use either one.
  • The columns parameter is used to select the columns to delete in a dataframe. The “columns” parameter is also redundant here. You can use labels or index parameters to drop elements from a series. 
  • The level parameter is used to delete elements from a series when the series has a multi-index. It takes the level, given as an integer position or a level name, from which the specified labels are removed.
  • By default, the drop() method returns a new series object after deleting elements from the original series. In this process, the original series isn’t modified. If you want to modify the original series instead of creating a new series, you can set the inplace parameter to True.
  • The drop() method raises an exception whenever it runs into an error while dropping the elements from the series. For example, if an index or label that we want to delete doesn’t exist in the series, the drop() method raises a python KeyError exception. To suppress such errors while deleting an element from the series, you can set the errors parameter to “ignore”.

After execution, the drop() method returns a new series if the inplace parameter is set to False. Otherwise, it returns None. 
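The level parameter is not exercised in the examples of this article, so here is a minimal sketch of how it might be used. The level names "group" and "item" and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (group, item)
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)],
    names=["group", "item"],
)
series = pd.Series(["a", "b", "c", "d"], index=index)

# Drop every element whose "item" level equals 2,
# regardless of the value in the "group" level
result = series.drop(labels=2, level="item")
print(result)
```

Only the elements indexed by ("x", 1) and ("y", 1) remain in the result.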

Drop a Single Element From a Pandas Series

To drop a single element from a series, you can pass the index of the element to the labels parameter in the drop() method as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series=series.drop(labels=11)
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3        a
23       b
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object

In the above example, we first created a Series object using the Series() constructor. Then we dropped the element having index 11 using the drop() method. For this, we have passed the value 11 to the drop() method. After execution of the drop() method, you can observe that the element with index 11 has been removed from the output series.

Instead of the labels parameter, you can also use the index parameter in the drop() method as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series=series.drop(index=11)
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3        a
23       b
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object

In this example, we have used the index parameter instead of the labels parameter. However, the resultant series after execution of the drop() method is the same in both cases.

Delete Multiple Elements From a Pandas Series

To drop multiple elements from a series, you can pass a Python list of the index labels to be deleted to the labels parameter. For instance, if you want to delete the elements with index labels 11, 16, and 2 from the given Series, you can pass the list [11,16,2] to the labels parameter in the drop() method as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series=series.drop(labels=[11,16,2])
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3      a
23     b
14    ab
45    bc
65     d
dtype: object

In this example, we have passed the list [11, 16, 2] as input to the labels parameter. Hence, after execution of the drop() method, the elements at index 11, 16, and 2 are deleted from the original series object.

Instead of the labels parameter, you can pass the list of indices to the index parameter as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series=series.drop(index=[11,16,2])
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3      a
23     b
14    ab
45    bc
65     d
dtype: object
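
Note that the drop() method works with index labels, not integer positions. If you only know the positions of the elements, one possible approach is to look up the labels through series.index first. This is a sketch that assumes the index labels are unique; with repeated labels, drop() would remove every element that shares a dropped label.

```python
import pandas as pd

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, 11, 14, 16, 2, 45, 65]
series = pd.Series(letters, index=numbers)

# Positions 0 and 2 hold the index labels 3 and 11
positions = [0, 2]
labels_to_drop = series.index[positions]

# Pass the looked-up labels to drop() as usual
result = series.drop(labels=labels_to_drop)
print(result)
```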

Drop Elements Inplace From a Pandas Series

By default, the drop() method returns a new series and doesn’t delete specified elements from the original series. To drop elements inplace from a pandas series, you can set the inplace parameter to True as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series.drop(index=[11,16,2],inplace=True)
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3      a
23     b
14    ab
45    bc
65     d
dtype: object

In all the previous examples, the drop() method returned a new Series object. In this example, we have set the inplace parameter to True in the drop() method. Hence, the elements are deleted from the original series and it is modified. In this case, the drop() method returns None.

Delete an Element From a Series if the Index Exists

While deleting elements from a series using the drop() method, it is possible that we might pass an index to the labels or index parameter that is not present in the Series object. If the value passed to the labels or index parameter isn’t present in the Series, the drop() method runs into a KeyError exception as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series.drop(index=1117,inplace=True)
print("The modified series is:")
print(series)

Output:

KeyError: '[1117] not found in axis'

In the above example, we have passed the value 1117 to the index parameter. As the value 1117 is not present in the Series, we get a KeyError exception.
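
If you would rather handle the missing label yourself than let the program stop, one option is to catch the exception with a try-except block. The following is a minimal sketch; the short series and the variable names missing_label and dropped are chosen only for illustration.

```python
import pandas as pd

series = pd.Series(["a", "b", "c"], index=[3, 23, 11])

missing_label = 1117  # this label is not present in the index
try:
    series = series.drop(index=missing_label)
    dropped = True
except KeyError as error:
    # drop() raised because the label does not exist
    dropped = False
    print("Could not drop the element:", error)

print(series)
```

Because the exception is raised before any assignment takes place, the series keeps all of its elements.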

To avoid errors and drop elements from a series if the index exists, you can use the errors parameter. By default, the errors parameter is set to "raise". Due to this, the drop() method raises an exception every time it runs into an error. To suppress the exception, you can set the errors parameter to “ignore” as shown in the following example.

import pandas as pd
import numpy as np
letters=["a","b","c","ab","abc","abcd","bc","d"]
numbers=[3,23,11,14,16,2,45,65]
series=pd.Series(letters)
series.index=numbers
print("The original series is:")
print(series)
series.drop(index=1117,inplace=True,errors="ignore")
print("The modified series is:")
print(series)

Output:

The original series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object
The modified series is:
3        a
23       b
11       c
14      ab
16     abc
2     abcd
45      bc
65       d
dtype: object

In the above example, we have passed the value 1117 to the index parameter. As 1117 is not present in the series index, the drop() method would have run into a KeyError exception. However, we have set the errors parameter to "ignore" in the drop() method. Hence, it suppresses the error. You can also observe that the series is unchanged after the drop() method executes.
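
The errors parameter is also handy when you pass a list in which only some of the labels exist. In the sketch below, which reuses the data from the examples above, the label 11 is dropped while the missing label 1117 is skipped.

```python
import pandas as pd

letters = ["a", "b", "c", "ab", "abc", "abcd", "bc", "d"]
numbers = [3, 23, 11, 14, 16, 2, 45, 65]
series = pd.Series(letters, index=numbers)

# 11 exists in the index, 1117 does not
result = series.drop(index=[11, 1117], errors="ignore")
print(result)
```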

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on clustering mixed data types in Python.

Drop NaN Values From a Pandas Series

NaN values are special floating-point values in Python that are used to represent the absence of a value. Most of the time, NaN values carry no useful information in a dataset, and we need to remove them.

You can drop NaN values from a pandas series using the dropna() method. It has the following syntax.

Series.dropna(*, axis=0, inplace=False, how=None)

Here,  

  • The axis parameter is used to decide if we want to delete nan values from a row or column from the series. For a pandas series, the axis parameter isn’t used. It is defined just to ensure the compatibility of the dropna() method with pandas dataframes.
  • By default, the dropna() method returns a new series object after deleting nan values from the original series. In this process, the original series isn’t modified. If you want to delete the nan values from the original series instead of creating a new series, you can set the inplace parameter to True.
  • The “how” parameter is not used for a series. 

After execution, the dropna() method returns a new series if the inplace parameter is set to False. Otherwise, it returns None. 

You can drop nan values from a pandas series as shown in the following example.

import pandas as pd
import numpy as np
letters=["a","b","c",np.nan,"ab","abc",np.nan,"abcd","bc","d"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series=series.dropna()
print("The modified series is:")
print(series)

Output:

The original series is:
0       a
1       b
2       c
3     NaN
4      ab
5     abc
6     NaN
7    abcd
8      bc
9       d
dtype: object
The modified series is:
0       a
1       b
2       c
4      ab
5     abc
7    abcd
8      bc
9       d
dtype: object

In the above example, you can observe that the original series has two NaN values. After execution, the dropna() method deletes both the NaN values with their indices and returns a new series.
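
To see what dropna() does in terms of simpler operations, you can compare it with boolean masking using the notna() method. This is only an illustrative sketch, not a description of how pandas implements dropna() internally. It also shows that None is treated as a missing value in the same way as NaN.

```python
import numpy as np
import pandas as pd

series = pd.Series(["a", np.nan, "b", None])

# notna() returns True for every element that is not missing
mask = series.notna()

# Keeping only the non-missing elements mirrors dropna()
filtered = series[mask]
print(filtered)
print(filtered.equals(series.dropna()))
```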

Drop NaN Values Inplace From a Pandas Series

If you want to drop NaN values from the original series instead of creating a new series, you can set the inplace parameter to True in the dropna() method as shown below.

import pandas as pd
import numpy as np
letters=["a","b","c",np.nan,"ab","abc",np.nan,"abcd","bc","d"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series.dropna(inplace=True)
print("The modified series is:")
print(series)

Output:

The original series is:
0       a
1       b
2       c
3     NaN
4      ab
5     abc
6     NaN
7    abcd
8      bc
9       d
dtype: object
The modified series is:
0       a
1       b
2       c
4      ab
5     abc
7    abcd
8      bc
9       d
dtype: object

Here, we have set the inplace parameter to True. Hence, the dropna() method modified the original series instead of creating a new one. In this case, the dropna() method returns None after execution.

Drop Duplicates From a Pandas Series

During data preprocessing, we often need to remove duplicate values from the given data. To drop duplicate values from a pandas series, you can use the drop_duplicates() method. It has the following syntax.

Series.drop_duplicates(*, keep='first', inplace=False)

Here,

  • The keep parameter decides which values are kept after removing the duplicates. To drop all the duplicate values except for the first occurrence, you can set the keep parameter to “first”, which is its default value. To drop all the duplicate values except for the last occurrence, you can set the keep parameter to “last”. If you want to drop every value that appears more than once, including its first and last occurrences, you can set the keep parameter to False.
  • By default, the drop_duplicates() method returns a new series object after deleting duplicate values from the original series. In this process, the original series isn’t modified. If you want to delete the duplicate values from the original series instead of creating a new series, you can set the inplace parameter to True.

After execution, the drop_duplicates() method returns a new series if the inplace parameter is set to False. Otherwise, it returns None. You can observe this in the following example.

import pandas as pd
import numpy as np
letters=["a","b","a","a","ab","abc","ab","abcd","bc","abc","ab"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series=series.drop_duplicates()
print("The modified series is:")
print(series)

Output:

The original series is:
0        a
1        b
2        a
3        a
4       ab
5      abc
6       ab
7     abcd
8       bc
9      abc
10      ab
dtype: object
The modified series is:
0       a
1       b
4      ab
5     abc
7    abcd
8      bc
dtype: object

In the above example, you can observe that the strings “a”, “ab”, and “abc” are present multiple times in the series. Hence, when we invoke the drop_duplicates() method on the series object, every duplicate except one occurrence of each string is removed from the series.

Looking at the indices, you can observe that the first occurrence of each element has been retained when the element is present multiple times in the series. If you want to preserve the last occurrence of the elements having duplicate values, you can set the keep parameter to "last" as shown below.

import pandas as pd
import numpy as np
letters=["a","b","a","a","ab","abc","ab","abcd","bc","abc","ab"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series=series.drop_duplicates(keep="last")
print("The modified series is:")
print(series)

Output:

The original series is:
0        a
1        b
2        a
3        a
4       ab
5      abc
6       ab
7     abcd
8       bc
9      abc
10      ab
dtype: object
The modified series is:
1        b
3        a
7     abcd
8       bc
9      abc
10      ab
dtype: object

In the above example, we have set the keep parameter to "last". Hence, you can observe that the drop_duplicates() method preserves the last occurrence of the elements that have duplicate values.
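
To inspect which elements are treated as duplicates before removing them, you can use the duplicated() method, which accepts the same keep parameter and returns a boolean series. The following sketch uses the variable names is_repeat, first_kept, and last_kept only for illustration.

```python
import pandas as pd

letters = ["a", "b", "a", "a", "ab", "abc", "ab", "abcd", "bc", "abc", "ab"]
series = pd.Series(letters)

# True for every repeat of a value that has already appeared earlier
is_repeat = series.duplicated(keep="first")
print(is_repeat)

# Keeping the elements that are not repeats matches drop_duplicates()
first_kept = series[~is_repeat]

# With keep="last", every occurrence except the last one is marked
last_kept = series[~series.duplicated(keep="last")]

print(first_kept.equals(series.drop_duplicates()))
print(last_kept.equals(series.drop_duplicates(keep="last")))
```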

Drop Duplicates Inplace in a Pandas Series

By default, the drop_duplicates() method doesn’t modify the original series object. It returns a new series. If you want to modify the original series by deleting the duplicates, you can set the inplace parameter to True in the drop_duplicates() method as shown below.

import pandas as pd
import numpy as np
letters=["a","b","a","a","ab","abc","ab","abcd","bc","abc","ab"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series.drop_duplicates(inplace=True)
print("The modified series is:")
print(series)

Output:

The original series is:
0        a
1        b
2        a
3        a
4       ab
5      abc
6       ab
7     abcd
8       bc
9      abc
10      ab
dtype: object
The modified series is:
0       a
1       b
4      ab
5     abc
7    abcd
8      bc
dtype: object

In this example, we have set the inplace parameter to True. Hence, the drop_duplicates() method modified the original series instead of creating a new one. In this case, the drop_duplicates() method returns None after execution.

Drop All Duplicate Values From a Pandas Series

To drop every value that appears more than once in a pandas series, without keeping any of its occurrences, you can set the keep parameter to False as shown below.

import pandas as pd
import numpy as np
letters=["a","b","a","a","ab","abc","ab","abcd","bc","abc","ab"]
series=pd.Series(letters)
print("The original series is:")
print(series)
series=series.drop_duplicates(keep=False)
print("The modified series is:")
print(series)

Output:

The original series is:
0        a
1        b
2        a
3        a
4       ab
5      abc
6       ab
7     abcd
8       bc
9      abc
10      ab
dtype: object
The modified series is:
1       b
7    abcd
8      bc
dtype: object

In this example, we have set the keep parameter to False in the drop_duplicates() method. Hence, you can observe that all the elements having duplicate values are removed from the series.

Conclusion

In this article, we have discussed different ways to drop elements from a pandas series. To learn more about the pandas module, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

The post Drop Elements From a Series in Python appeared first on PythonForBeginners.com.

