Channel: Planet Python

Python for Beginners: Check if a Column Is Sorted in a Pandas Dataframe


The pandas dataframe is a great tool for handling tabular data in Python. In this article, we will discuss different ways to check if a column in a pandas dataframe is sorted.

Check if a Column Is Sorted Using Column Attributes

To check if a column in a pandas dataframe is sorted in ascending order, we can use the is_monotonic attribute of the column. The is_monotonic attribute evaluates to True if the column is sorted in ascending order, i.e., if the values in the column are monotonically increasing.

For instance, if we sort a dataframe by a column in ascending order, the is_monotonic attribute of that column evaluates to True, as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True)
print("The dataframe is:")
print(df)
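# is_monotonic was deprecated in pandas 1.5 and removed in 2.0;
# on pandas 2.x, use df["Marks"].is_monotonic_increasing instead
temp=df["Marks"].is_monotonic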
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
7      3    11       Bobby     50     D
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
The 'Marks' column is sorted: True

In the above example, we first loaded a CSV file into a dataframe using the read_csv() function. After that, we sorted the dataframe by the "Marks" column using the sort_values() method. After sorting, you can observe that the is_monotonic attribute of the column returns True. This shows that the column is sorted in ascending order.
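The grade2.csv file isn't included with the article. As a minimal sketch, assuming its contents match the unsorted dataframe printed later in this article, the same data can be rebuilt inline with io.StringIO (the CSV text below is reconstructed from that output, so the real file may differ). The sketch uses is_monotonic_increasing, which also works on pandas 2.x, where is_monotonic no longer exists.

```python
import io

import pandas as pd

# CSV text reconstructed from the unsorted dataframe printed in this article;
# the real grade2.csv file may differ.
GRADE2_CSV = """Class,Roll,Name,Marks,Grade
2,27,Harsh,55,C
2,23,Clara,78,B
3,33,Tina,82,A
3,34,Amy,88,A
3,15,Prashant,78,B
3,27,Aditya,55,C
3,23,Radheshyam,78,B
3,11,Bobby,50,D
"""

df = pd.read_csv(io.StringIO(GRADE2_CSV))
df.sort_values(by="Marks", inplace=True)

# is_monotonic_increasing replaces the deprecated is_monotonic attribute
temp = df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:", temp)
```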

If a column is sorted in descending order, the is_monotonic attribute will evaluate to False.

import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
3      3    34         Amy     88     A
2      3    33        Tina     82     A
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
7      3    11       Bobby     50     D
The 'Marks' column is sorted: False

In this example, we have sorted the "Marks" column in descending order. Due to this, the is_monotonic attribute evaluates to False.
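Since an ascending-only check can't tell a descending column from an unsorted one, it can help to test both directions. The helper below, sort_direction, is a hypothetical name used for illustration; it assumes pandas is installed and relies on the is_monotonic_increasing and is_monotonic_decreasing attributes covered later in this article. A column whose values are all equal counts as sorted in both directions.

```python
import pandas as pd


def sort_direction(column: pd.Series) -> str:
    """Hypothetical helper: report how a Series is ordered."""
    increasing = column.is_monotonic_increasing
    decreasing = column.is_monotonic_decreasing
    if increasing and decreasing:
        return "constant"  # every value is equal
    if increasing:
        return "ascending"
    if decreasing:
        return "descending"
    return "unsorted"


# "Marks" values from grade2.csv in their original order
marks = pd.Series([55, 78, 82, 88, 78, 55, 78, 50], name="Marks")
print(sort_direction(marks))                               # unsorted
print(sort_direction(marks.sort_values()))                 # ascending
print(sort_direction(marks.sort_values(ascending=False)))  # descending
print(sort_direction(pd.Series([78, 78, 78])))             # constant
```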

If a column in the dataframe is not sorted, the is_monotonic attribute will evaluate to False. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
The 'Marks' column is sorted: False

Here, you can observe that we have accessed the is_monotonic attribute without sorting the dataframe by the "Marks" column. Hence, the "Marks" column is unsorted, and the is_monotonic attribute evaluates to False.
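When a sortedness check returns False, it doesn't tell you where the order breaks. As a sketch, assuming the goal is ascending order, Series.diff() can point to the first row whose value is smaller than the one before it. The values are copied from the output above; drops and first_break are hypothetical variable names.

```python
import pandas as pd

# Index and "Marks" values copied from the unsorted dataframe above
marks = pd.Series([55, 78, 82, 88, 78, 55, 78, 50], name="Marks")

# diff() subtracts the previous value; the first entry is NaN, and NaN < 0 is False
drops = marks.diff() < 0

if drops.any():
    first_break = drops[drops].index[0]  # label of the first True value
    print("Ascending order breaks at index", first_break, "with value", marks[first_break])
else:
    print("The column is in ascending order.")
```

For this data, the first break is at index 4 (Prashant, 78), right after Amy's 88.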

The is_monotonic attribute doesn’t work with NaN values. If a column contains NaN values, the is_monotonic attribute always evaluates to False. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
4       2    22         Tom   73.0     B
2       1    14         Sam   75.0     B
5       2    15        Golu   79.0     B
0       1    11      Aditya   85.0     A
8       3    34         Amy   88.0     A
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
The 'Marks' column is sorted: False

In this example, you can observe that the "Marks" column contains NaN values. Due to this, even after sorting, the is_monotonic attribute evaluates to False. You might argue that this happens because the NaN values are at the end of the column.

However, if we put the rows having NaN values at the top of the dataframe, the is_monotonic attribute will again evaluate to False. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
4       2    22         Tom   73.0     B
2       1    14         Sam   75.0     B
5       2    15        Golu   79.0     B
0       1    11      Aditya   85.0     A
8       3    34         Amy   88.0     A
The 'Marks' column is sorted: False

In this example, we have put the NaN values at the start of the sorted "Marks" column. Even after this, the is_monotonic attribute evaluates to False. Thus, we can conclude that the is_monotonic attribute cannot be used with columns having NaN values.
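If you still need to know whether such a column is sorted apart from its missing values, one workaround is to check the non-NaN values with dropna() and, separately, confirm that all the NaN values sit together at one end. The function is_sorted_with_nans below is a hypothetical helper, a minimal sketch that assumes missing values are stored as NaN; the "Marks" values are copied from the grade.csv output above.

```python
import numpy as np
import pandas as pd


def is_sorted_with_nans(column, ascending=True, na_position="last"):
    """Hypothetical helper: True if the non-NaN values are sorted and
    every NaN is grouped at the start or the end of the column."""
    values = column.dropna()
    if ascending:
        ordered = values.is_monotonic_increasing
    else:
        ordered = values.is_monotonic_decreasing

    is_nan = column.isna()
    num_nan = int(is_nan.sum())
    if na_position == "last":
        # the last num_nan positions must all be NaN
        nans_grouped = is_nan.iloc[len(column) - num_nan:].all()
    else:
        # the first num_nan positions must all be NaN
        nans_grouped = is_nan.iloc[:num_nan].all()
    return bool(ordered and nans_grouped)


# "Marks" values from grade.csv, in the file's original order
marks = pd.Series([85, np.nan, 75, np.nan, 73, 79, 55, np.nan, 88, np.nan, 55, np.nan])

sorted_last = marks.sort_values(na_position="last")
sorted_first = marks.sort_values(na_position="first")

print(sorted_last.is_monotonic_increasing)                     # False
print(is_sorted_with_nans(sorted_last))                        # True
print(is_sorted_with_nans(sorted_first, na_position="first"))  # True
print(is_sorted_with_nans(marks))                              # False
```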

While using the is_monotonic attribute in pandas 1.5, you will get a FutureWarning with the message “FutureWarning: is_monotonic is deprecated and will be removed in a future version. Use is_monotonic_increasing instead.” The is_monotonic attribute was deprecated in pandas 1.5.0 and removed in pandas 2.0, where accessing it raises an AttributeError. As an alternative, we can use the is_monotonic_increasing and is_monotonic_decreasing attributes to check if a column in a pandas dataframe is sorted.

Check if a Column Is Sorted in Ascending Order in a Dataframe

To check if a column in a dataframe is sorted in ascending order, we can use the is_monotonic_increasing attribute. The is_monotonic_increasing attribute evaluates to True if a column is sorted in ascending order. Otherwise, it evaluates to False. You can observe this in the following example.
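Note that is_monotonic_increasing is non-strict: repeated values, such as the three 78s in the "Marks" column, still count as sorted. If you need strictly increasing values, one approach is to combine it with the is_unique attribute. A minimal sketch, assuming pandas is installed:

```python
import pandas as pd

marks = pd.Series([50, 55, 55, 78, 78, 78, 82, 88], name="Marks")

# Non-strict: equal neighbours are allowed
print(marks.is_monotonic_increasing)                      # True

# Strict: increasing and no repeated values
print(marks.is_monotonic_increasing and marks.is_unique)  # False

distinct = pd.Series([50, 55, 78, 82, 88])
print(distinct.is_monotonic_increasing and distinct.is_unique)  # True
```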

import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
7      3    11       Bobby     50     D
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
The 'Marks' column is sorted: True

If a column is not sorted, the is_monotonic_increasing attribute evaluates to False.

import pandas as pd
df=pd.read_csv("grade2.csv")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
The 'Marks' column is sorted: False

Also, if a column is sorted in descending order, the is_monotonic_increasing attribute evaluates to False.

import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
3      3    34         Amy     88     A
2      3    33        Tina     82     A
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
7      3    11       Bobby     50     D
The 'Marks' column is sorted: False

The is_monotonic_increasing attribute cannot be used with columns having NaN values. The is_monotonic_increasing attribute always evaluates to False if a column has NaN values. You can observe this in the following example. 

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="last")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
4       2    22         Tom   73.0     B
2       1    14         Sam   75.0     B
5       2    15        Golu   79.0     B
0       1    11      Aditya   85.0     A
8       3    34         Amy   88.0     A
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
The 'Marks' column is sorted: False

Even if we put the rows having NaN values at the top of the dataframe, the is_monotonic_increasing attribute will evaluate to False.

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=True,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_increasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
4       2    22         Tom   73.0     B
2       1    14         Sam   75.0     B
5       2    15        Golu   79.0     B
0       1    11      Aditya   85.0     A
8       3    34         Amy   88.0     A
The 'Marks' column is sorted: False

Check if a Column Is Sorted in Descending Order in a Pandas Dataframe

To check if a column is sorted in descending order in a pandas dataframe, we will use the is_monotonic_decreasing attribute. The is_monotonic_decreasing attribute evaluates to True if a column is sorted in descending order. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
3      3    34         Amy     88     A
2      3    33        Tina     82     A
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
7      3    11       Bobby     50     D
The 'Marks' column is sorted: True

If a column is unsorted or is sorted in ascending order, the is_monotonic_decreasing attribute evaluates to False as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv")
# No sort_values() call here, so "Marks" keeps the order it has in the file
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
0      2    27       Harsh     55     C
1      2    23       Clara     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
4      3    15    Prashant     78     B
5      3    27      Aditya     55     C
6      3    23  Radheshyam     78     B
7      3    11       Bobby     50     D
The 'Marks' column is sorted: False

The is_monotonic_decreasing attribute cannot be used with columns having NaN values. The is_monotonic_decreasing attribute always evaluates to False if a column has NaN values. You can observe this in the following example.

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=False,na_position="last")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
8       3    34         Amy   88.0     A
0       1    11      Aditya   85.0     A
5       2    15        Golu   79.0     B
2       1    14         Sam   75.0     B
4       2    22         Tom   73.0     B
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
The 'Marks' column is sorted: False

Even if we put the rows having NaN values at the top of the dataframe, the is_monotonic_decreasing attribute will evaluate to False.

import pandas as pd
df=pd.read_csv("grade.csv")
df.sort_values(by="Marks",inplace=True,ascending=False,na_position="first")
print("The dataframe is:")
print(df)
temp=df["Marks"].is_monotonic_decreasing
print("The 'Marks' column is sorted:",temp)

Output:

The dataframe is:
    Class  Roll        Name  Marks Grade
1       1    12       Chris    NaN     A
3       1    15       Harry    NaN   NaN
7       2    23       Clara    NaN     B
9       3    15    Prashant    NaN     B
11      3    23  Radheshyam    NaN   NaN
8       3    34         Amy   88.0     A
0       1    11      Aditya   85.0     A
5       2    15        Golu   79.0     B
2       1    14         Sam   75.0     B
4       2    22         Tom   73.0     B
6       2    27       Harsh   55.0     C
10      3    27      Aditya   55.0     C
The 'Marks' column is sorted: False

Suggested Reading: If you are into machine learning, you can read this article on regression in machine learning. You might also like this article on clustering mixed data types in Python.

Check if a Column Is Sorted in a Dataframe Using the Numpy Module

The numpy module in Python provides us with different functions to perform operations on numeric data. One such function is the diff() function. The diff() function takes an array-like object as its input argument and returns an array containing the first-order differences of its elements, as shown in the following example.

import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
marks=df["Marks"]
print("The Marks column is:")
print(marks)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)

Output:

The Marks column is:
0    55
1    78
2    82
3    88
4    78
5    55
6    78
7    50
Name: Marks, dtype: int64
Array returned by diff() is:
[ 23   4   6 -10 -23  23 -28]

Here, you can observe that the first-order difference is calculated as the difference between the (n+1)th and nth elements of the input array. For example, the first element of the output array is the difference between the second element and the first element of the input "Marks" column. The second element in the output array is the difference between the third element and the second element of the "Marks" column.
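In other words, element i of the output equals a[i+1] - a[i] for an input array a, so the output is one element shorter than the input. As an illustration only (not how NumPy implements it internally), the same calculation can be written in plain Python and compared with np.diff() on the "Marks" values above:

```python
import numpy as np

marks = [55, 78, 82, 88, 78, 55, 78, 50]

# Pair each element with its successor and subtract: a[i+1] - a[i]
manual_diff = [nxt - cur for cur, nxt in zip(marks, marks[1:])]

print(manual_diff)              # [23, 4, 6, -10, -23, 23, -28]
print(np.diff(marks).tolist())  # [23, 4, 6, -10, -23, 23, -28]
```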

From the definition of the first-order difference, we can conclude: “If the ‘Marks’ column is sorted in ascending order, all the values in the output array will be greater than or equal to 0. Similarly, if the ‘Marks’ column is sorted in descending order, all the elements in the output array will be less than or equal to 0.” We will use this conclusion to check if the column is sorted in ascending or descending order.

To check if a column of a pandas dataframe is sorted in ascending order, we will use the following steps.

  • First, we will calculate the first-order difference of the specified column. For this, we will pass the column to the diff() function as an input argument.
  • After that, we will check if all the elements in the output array are greater than or equal to 0. For this, we will use the comparison operator and the all() method. When we use the comparison operator on a numpy array, we get an array of boolean values. The all() method, when invoked on an array containing boolean values, returns True if all the elements are True.
  • If the all() method returns True, we can conclude that the column is sorted in ascending order.

You can observe this in the following example.

import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=True)
marks=df["Marks"]
print("The dataframe is:")
print(df)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)
boolean_array= temp>=0
print("Boolean array is:")
print(boolean_array)
result=boolean_array.all()
if result:
    print("The marks column is sorted.")
else:
    print("The marks column is not sorted.")

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
7      3    11       Bobby     50     D
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
2      3    33        Tina     82     A
3      3    34         Amy     88     A
Array returned by diff() is:
[ 5  0 23  0  0  4  6]
Boolean array is:
[ True  True  True  True  True  True  True]
The marks column is sorted.

To check if a column is sorted in descending order, we will check if all the elements in the output array of the diff() function are less than or equal to 0. For this, we will use the comparison operator and the all() method. When we use the comparison operator on a numpy array, we get an array of boolean values. The all() method, when invoked on an array containing boolean values, returns True if all the elements are True.

If the all() method returns True, we can conclude that the column is sorted in descending order. You can observe this in the following example.

import numpy as np
import pandas as pd
df=pd.read_csv("grade2.csv")
df.sort_values(by="Marks",inplace=True,ascending=False)
marks=df["Marks"]
print("The dataframe is:")
print(df)
temp=np.diff(marks)
print("Array returned by diff() is:")
print(temp)
boolean_array= temp<=0
print("Boolean array is:")
print(boolean_array)
result=boolean_array.all()
if result:
    print("The marks column is sorted.")
else:
    print("The marks column is not sorted.")

Output:

The dataframe is:
   Class  Roll        Name  Marks Grade
3      3    34         Amy     88     A
2      3    33        Tina     82     A
1      2    23       Clara     78     B
4      3    15    Prashant     78     B
6      3    23  Radheshyam     78     B
0      2    27       Harsh     55     C
5      3    27      Aditya     55     C
7      3    11       Bobby     50     D
Array returned by diff() is:
[ -6  -4   0   0 -23   0  -5]
Boolean array is:
[ True  True  True  True  True  True  True]
The marks column is sorted.
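One caveat with the diff() approach: for unsigned integer dtypes such as uint8, NumPy keeps the unsigned type, so a negative difference wraps around to a large positive number and a descending column can pass the >= 0 test. The sketch below avoids subtraction altogether by comparing each element with its successor; is_sorted is a hypothetical helper name. Like the diff() version, it returns False if the array contains NaN, because comparisons with NaN are False.

```python
import numpy as np


def is_sorted(values, ascending=True):
    """Hypothetical helper: compare each element with the next one."""
    arr = np.asarray(values)
    if ascending:
        return bool((arr[:-1] <= arr[1:]).all())
    return bool((arr[:-1] >= arr[1:]).all())


descending = np.array([88, 82, 78, 50], dtype=np.uint8)

# diff() wraps around for unsigned integers: 82 - 88 becomes 250
print(np.diff(descending))               # [250 252 228]
print((np.diff(descending) >= 0).all())  # True, which is misleading

print(is_sorted(descending))                   # False
print(is_sorted(descending, ascending=False))  # True
```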

Check if the Index Column Is Sorted in a Dataframe

To check if the index of a dataframe is sorted in ascending order, we can use the index attribute and the is_monotonic attribute as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=True)
print("The dataframe is:")
print(df)
# Index.is_monotonic is deprecated as well; use is_monotonic_increasing on pandas 2.x
temp=df.index.is_monotonic
print("The Index is sorted:",temp)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
50         3    11       Bobby     D
55         2    27       Harsh     C
55         3    27      Aditya     C
78         2    23       Clara     B
78         3    15    Prashant     B
78         3    23  Radheshyam     B
82         3    33        Tina     A
88         3    34         Amy     A
The Index is sorted: True

Because is_monotonic is deprecated, we can instead use the index attribute with the is_monotonic_increasing attribute to check if the index of a dataframe is sorted in ascending order, as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=True)
print("The dataframe is:")
print(df)
temp=df.index.is_monotonic_increasing
print("The Index is sorted:",temp)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
50         3    11       Bobby     D
55         2    27       Harsh     C
55         3    27      Aditya     C
78         2    23       Clara     B
78         3    15    Prashant     B
78         3    23  Radheshyam     B
82         3    33        Tina     A
88         3    34         Amy     A
The Index is sorted: True

To check if the index of a dataframe is sorted in descending order, we can use the index attribute and the is_monotonic_decreasing attribute as shown below.

import pandas as pd
df=pd.read_csv("grade2.csv",index_col="Marks")
df.sort_index(inplace=True,ascending=False)
print("The dataframe is:")
print(df)
temp=df.index.is_monotonic_decreasing
print("The Index is sorted:",temp)

Output:

The dataframe is:
       Class  Roll        Name Grade
Marks                               
88         3    34         Amy     A
82         3    33        Tina     A
78         2    23       Clara     B
78         3    15    Prashant     B
78         3    23  Radheshyam     B
55         2    27       Harsh     C
55         3    27      Aditya     C
50         3    11       Bobby     D
The Index is sorted: True

You need to keep in mind that the is_monotonic, is_monotonic_increasing, and is_monotonic_decreasing attributes always evaluate to False if the index contains NaN values. Therefore, you cannot use these attributes to check if the index is sorted when it contains NaN values.
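If missing labels can safely be ignored, one option, sketched below under that assumption, is to drop them with Index.dropna() before checking the order. The dataframe here is hypothetical and built inline for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe whose "Marks" index contains a missing label
df = pd.DataFrame(
    {"Name": ["Bobby", "Harsh", "Tina", "Chris"]},
    index=pd.Index([50.0, 55.0, 82.0, np.nan], name="Marks"),
)

print(df.index.is_monotonic_increasing)           # False, because of the NaN label
print(df.index.dropna().is_monotonic_increasing)  # True
```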

Conclusion

In this article, we have discussed different ways to check if a column is sorted in a pandas dataframe. For this, we have used the pandas library as well as the numpy module. We have also checked if the index of a pandas dataframe is sorted or not.

To learn more about Python programming, you can read this article on dictionary comprehension in Python. You might like this article on list comprehension in Python too.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Check if a Column Is Sorted in a Pandas Dataframe appeared first on PythonForBeginners.com.

