Python for Beginners: Sort Pandas DataFrame in Python

Pandas dataframes are used to handle tabular data in Python. Many times, we need to sort the dataframe based on a column. In this article, we will discuss different ways to sort a pandas dataframe in Python.

The sort_values() Method

The sort_values() method sorts a pandas dataframe by its values, either by rearranging the rows based on one or more columns or by rearranging the columns based on one or more rows. It has the following syntax.

DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Here,

  • The by parameter takes a string or a list of strings as its input argument. The input to the by parameter depends on whether we want to sort the rows or columns of a dataframe. To sort the rows of a dataframe based on a column, we can pass a column name or list of column names to the by parameter. To sort the columns of a dataframe based on a row, we can pass the row index or a list of row indices to the by parameter.
  • The axis parameter is used to decide if we want to sort the rows or columns of the dataframe. To sort the rows of a dataframe based on a column or list of columns, we can pass the value 0 to the axis parameter which is its default value. To sort the columns of a dataframe based on a row or multiple rows, we can pass the value 1 to the axis parameter.
  • The ascending parameter is used to decide if the dataframe is sorted in ascending or descending order. By default, it is True denoting that sorting occurs in ascending order. You can set it to False to sort the dataframe in descending order. If sorting is done by multiple columns, you can pass a list of True and False values to the ascending parameter to decide on which column the dataframe is sorted in ascending order or descending order.
  • The inplace parameter is used to decide whether we modify the original dataframe or create a new dataframe after sorting. By default, inplace is set to False. Hence, it doesn’t modify the original dataframe and the sort_values() method returns the new sorted dataframe. If you want to modify the original dataframe while sorting, you can set inplace to True.
  • The kind parameter is used to decide the sorting algorithm. By default, the sort_values() method uses the quicksort algorithm. You can also choose ‘mergesort’, ‘heapsort’, or ‘stable’. The ‘mergesort’ and ‘stable’ options keep rows with equal values in their original relative order. This parameter is only applied when sorting by a single column or label.
  • The na_position parameter is used to decide the position of rows having NaN values. By default, it has the value 'last', denoting that the rows with NaN values are placed at the end of the sorted dataframe. You can set it to 'first' if you want to have rows with NaN values at the top of the sorted dataframe.
  • The ignore_index parameter is used to decide if the indices of the rows in the input dataframe are preserved in the sorted dataframe. By default, it is False, denoting that the indices are preserved. If you want to ignore the indices of the initial dataframe and relabel the rows 0, 1, and so on up to n-1, you can set ignore_index to True.
  • The key parameter is used to perform operations on the values before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take a pandas series as its input argument and return a pandas series of the same length. Before sorting, the function is applied independently to each column named in the by parameter.

After execution, the sort_values() method returns the sorted dataframe if the inplace parameter is set to False. If inplace is set to True, the sort_values() method returns None.
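The key parameter is not used in the examples later in this article, so here is a minimal sketch of it. The dataframe and its names are made up for illustration; the key function lowercases the Name column so that the sort ignores letter case.

```python
import pandas as pd

# Hypothetical sample data; the mixed letter case is deliberate.
df = pd.DataFrame({"Name": ["chris", "aditya", "Sam", "Bobby"],
                   "Marks": [95, 85, 75, 50]})

# Without a key, uppercase letters sort before lowercase ones.
default_order = df.sort_values(by="Name")

# The key function receives the "Name" column as a Series and must
# return a Series of the same length; here it lowercases every value.
case_insensitive = df.sort_values(by="Name", key=lambda col: col.str.lower())

print(default_order["Name"].tolist())
print(case_insensitive["Name"].tolist())
```

The key only affects the comparison. The values stored in the sorted dataframe are unchanged.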

Sort Rows of a Dataframe by a Column in Python

To sort a dataframe by a column, we will invoke the sort_values() method on the dataframe. We will pass the column name by which the dataframe has to be sorted as the input argument to the “by” parameter. After execution, the sort_values() method will return the sorted dataframe. The examples in this article read their data from a CSV file named grade.csv; its contents can be seen in the input dataframe printed by each example.
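If you do not have grade.csv at hand, its contents can be recovered from the input dataframe printed in the first example. The following is a minimal sketch that rebuilds the same data from an in-memory string, assuming the file has a header row and no index column. The variable name csv_text is only illustrative.

```python
import io
import pandas as pd

# CSV text reconstructed from the input dataframe printed in this article.
csv_text = """Class,Roll,Name,Marks,Grade
1,11,Aditya,85,A
1,12,Chris,95,A
1,14,Sam,75,B
1,16,Aditya,78,B
1,15,Harry,55,C
2,1,Joel,68,B
2,22,Tom,73,B
2,15,Golu,79,B
2,27,Harsh,55,C
2,23,Clara,78,B
3,33,Tina,82,A
3,34,Amy,88,A
3,15,Prashant,78,B
3,27,Aditya,55,C
3,23,Radheshyam,78,B
3,11,Bobby,50,D
"""

# read_csv() accepts a file-like object as well as a file name.
grades = pd.read_csv(io.StringIO(csv_text))

# Sort the rows by the Marks column; the original dataframe is left as is.
sorted_df = grades.sort_values(by="Marks")
print(sorted_df.head())
```

Saving the same text to a file named grade.csv should let the examples in this article run as written.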

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
sorted_df=grades.sort_values(by="Marks")
print("The sorted dataframe is")
print(sorted_df)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
15      3    11       Bobby     50     D
4       1    15       Harry     55     C
8       2    27       Harsh     55     C
13      3    27      Aditya     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
9       2    23       Clara     78     B
12      3    15    Prashant     78     B
14      3    23  Radheshyam     78     B
7       2    15        Golu     79     B
10      3    33        Tina     82     A
0       1    11      Aditya     85     A
11      3    34         Amy     88     A
1       1    12       Chris     95     A

In the above example, we first read the CSV file into a dataframe using the read_csv() function. The read_csv() function takes the file name of the CSV file and returns a dataframe. After obtaining the dataframe, we sorted it by "Marks" using the sort_values() method.

Here, the sort_values() method returns a new sorted dataframe and leaves the original dataframe unchanged. If you want to sort the original dataframe, you can set the inplace parameter to True as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
15      3    11       Bobby     50     D
4       1    15       Harry     55     C
8       2    27       Harsh     55     C
13      3    27      Aditya     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
9       2    23       Clara     78     B
12      3    15    Prashant     78     B
14      3    23  Radheshyam     78     B
7       2    15        Golu     79     B
10      3    33        Tina     82     A
0       1    11      Aditya     85     A
11      3    34         Amy     88     A
1       1    12       Chris     95     A

You can observe that the original dataframe has been sorted after setting inplace to True.

In the above examples, each row keeps its original index label, so the index appears shuffled after sorting. This is not always desired. To relabel the rows 0, 1, 2, and so on in the sorted order, you can set the ignore_index parameter to True.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       3    11       Bobby     50     D
1       1    15       Harry     55     C
2       2    27       Harsh     55     C
3       3    27      Aditya     55     C
4       2     1        Joel     68     B
5       2    22         Tom     73     B
6       1    14         Sam     75     B
7       1    16      Aditya     78     B
8       2    23       Clara     78     B
9       3    15    Prashant     78     B
10      3    23  Radheshyam     78     B
11      2    15        Golu     79     B
12      3    33        Tina     82     A
13      1    11      Aditya     85     A
14      3    34         Amy     88     A
15      1    12       Chris     95     A

In the above example, you can observe that the index of the sorted dataframe runs from 0 to 15 in order instead of carrying the shuffled labels of the original rows. This is because we have set ignore_index to True.
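To see the effect of ignore_index in isolation, the sketch below compares it with calling reset_index(drop=True) after sorting. The three-row dataframe is made up for illustration.

```python
import pandas as pd

# Hypothetical three-row dataframe with the default 0, 1, 2 index.
df = pd.DataFrame({"Name": ["Aditya", "Chris", "Sam"], "Marks": [85, 95, 75]})

# Default behaviour: each row keeps its original index label.
kept = df.sort_values(by="Marks")

# ignore_index=True relabels the sorted rows 0, 1, 2.
fresh = df.sort_values(by="Marks", ignore_index=True)

# Sorting and then dropping the old index gives the same result.
manual = df.sort_values(by="Marks").reset_index(drop=True)

print(kept.index.tolist())
print(fresh.index.tolist())
print(fresh.equals(manual))
```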

Sort Rows of a Dataframe by Multiple Columns

Instead of sorting the dataframe by just one column, we can also sort the rows of a dataframe by multiple columns.

To sort the rows of a pandas dataframe by multiple columns, you can pass the list of column names as the input argument to the “by” parameter. When we pass a list of column names, the rows are first sorted according to the first column in the list. Rows that have equal values in the first column are then ordered by the second column, and so on. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       1    15       Harry     55     C
1       1    14         Sam     75     B
2       1    16      Aditya     78     B
3       1    11      Aditya     85     A
4       1    12       Chris     95     A
5       2    27       Harsh     55     C
6       2     1        Joel     68     B
7       2    22         Tom     73     B
8       2    23       Clara     78     B
9       2    15        Golu     79     B
10      3    11       Bobby     50     D
11      3    27      Aditya     55     C
12      3    15    Prashant     78     B
13      3    23  Radheshyam     78     B
14      3    33        Tina     82     A
15      3    34         Amy     88     A

In the above example, we have sorted the dataframe by two columns i.e. Class and Marks. For this, we have passed the list ["Class", "Marks"] to the by parameter in the sort_values() method.

Here, the dataframe is sorted by the order of the column names in the by parameter. First, the dataframe is sorted by the "Class" column. When two or more rows have the same values in the "Class" column, the rows are then sorted by the "Marks" column.
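One way to picture the tie-breaking rule is to do the sort in two steps: sort by the secondary column first, then sort by the primary column with a stable algorithm so that the earlier order survives among ties. The sketch below uses a small made-up dataframe in which no two rows share both Class and Marks, and checks that the two approaches agree.

```python
import pandas as pd

# Hypothetical data; no two rows share both Class and Marks.
df = pd.DataFrame({
    "Class": [2, 1, 2, 1, 3],
    "Name": ["Joel", "Chris", "Harsh", "Sam", "Tina"],
    "Marks": [68, 95, 55, 75, 82],
})

# Sort by Class, breaking ties with Marks.
multi = df.sort_values(by=["Class", "Marks"])

# Same ordering in two steps: secondary key first, then a stable
# sort on the primary key so the Marks order survives within a class.
two_step = (
    df.sort_values(by="Marks", kind="stable")
      .sort_values(by="Class", kind="stable")
)

print(multi["Name"].tolist())
print(multi.equals(two_step))
```

In practice, passing the list to the by parameter is simpler. The two-step version is shown only to make the ordering rule concrete.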

Sort Values in Descending Order in a Pandas DataFrame

By default, the sort_values() method sorts the dataframe in ascending order. To sort the values in descending order, you can use the “ascending” parameter and set it to False. Then, the sort_values() method will sort the dataframe in descending order. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True,ascending=False)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       1    12       Chris     95     A
1       3    34         Amy     88     A
2       1    11      Aditya     85     A
3       3    33        Tina     82     A
4       2    15        Golu     79     B
5       1    16      Aditya     78     B
6       2    23       Clara     78     B
7       3    15    Prashant     78     B
8       3    23  Radheshyam     78     B
9       1    14         Sam     75     B
10      2    22         Tom     73     B
11      2     1        Joel     68     B
12      1    15       Harry     55     C
13      2    27       Harsh     55     C
14      3    27      Aditya     55     C
15      3    11       Bobby     50     D

In this example, we have set the ascending parameter to False. Due to this, the rows of the dataframe are sorted by Marks in descending order.

We can also sort the dataframe in descending order if we are sorting it by multiple columns as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True,ascending=False)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       3    34         Amy     88     A
1       3    33        Tina     82     A
2       3    15    Prashant     78     B
3       3    23  Radheshyam     78     B
4       3    27      Aditya     55     C
5       3    11       Bobby     50     D
6       2    15        Golu     79     B
7       2    23       Clara     78     B
8       2    22         Tom     73     B
9       2     1        Joel     68     B
10      2    27       Harsh     55     C
11      1    12       Chris     95     A
12      1    11      Aditya     85     A
13      1    16      Aditya     78     B
14      1    14         Sam     75     B
15      1    15       Harry     55     C

In the above example, the dataframe is first sorted by the Class column in descending order. If the rows have the same values for the Class column, such rows are sorted by Marks in descending order.

While sorting a dataframe by multiple columns, you can pass a list of True and False values to the ascending parameter. This helps us sort the dataframe by one column in ascending order and by another column in descending order. For instance, consider the following example.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by=["Class","Marks"],inplace=True,ignore_index=True,ascending=[True,False])
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya     85     A
1       1    12       Chris     95     A
2       1    14         Sam     75     B
3       1    16      Aditya     78     B
4       1    15       Harry     55     C
5       2     1        Joel     68     B
6       2    22         Tom     73     B
7       2    15        Golu     79     B
8       2    27       Harsh     55     C
9       2    23       Clara     78     B
10      3    33        Tina     82     A
11      3    34         Amy     88     A
12      3    15    Prashant     78     B
13      3    27      Aditya     55     C
14      3    23  Radheshyam     78     B
15      3    11       Bobby     50     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       1    12       Chris     95     A
1       1    11      Aditya     85     A
2       1    16      Aditya     78     B
3       1    14         Sam     75     B
4       1    15       Harry     55     C
5       2    15        Golu     79     B
6       2    23       Clara     78     B
7       2    22         Tom     73     B
8       2     1        Joel     68     B
9       2    27       Harsh     55     C
10      3    34         Amy     88     A
11      3    33        Tina     82     A
12      3    15    Prashant     78     B
13      3    23  Radheshyam     78     B
14      3    27      Aditya     55     C
15      3    11       Bobby     50     D

In this example, we have sorted the dataframe by the Class and Marks columns. In the ascending parameter, we have given the list [True, False]. Due to this, the dataframe is first sorted by the Class column in ascending order. If the rows have the same values for the Class column, such rows are sorted by Marks in descending order.

Sort Dataframe With NaN Values in Python

In pandas, NaN values are stored as floating point numbers, which is why a numeric column containing NaN is printed with decimals. When we sort the rows of a dataframe containing NaN values using the sort_values() method, the rows with NaN values are placed at the bottom of the dataframe by default, as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris   95.0     A
2       1    14         Sam   75.0     B
3       1    16      Aditya   78.0     B
4       1    15       Harry    NaN     C
5       2     1        Joel   68.0     B
6       2    22         Tom   73.0     B
7       2    15        Golu   79.0     B
8       2    27       Harsh   55.0     C
9       2    23       Clara    NaN     B
10      3    33        Tina   82.0     A
11      3    34         Amy   88.0     A
12      3    15    Prashant    NaN     B
13      3    27      Aditya   55.0     C
14      3    23  Radheshyam   78.0     B
15      3    11       Bobby   50.0     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       3    11       Bobby   50.0     D
1       2    27       Harsh   55.0     C
2       3    27      Aditya   55.0     C
3       2     1        Joel   68.0     B
4       2    22         Tom   73.0     B
5       1    14         Sam   75.0     B
6       1    16      Aditya   78.0     B
7       3    23  Radheshyam   78.0     B
8       2    15        Golu   79.0     B
9       3    33        Tina   82.0     A
10      1    11      Aditya   85.0     A
11      3    34         Amy   88.0     A
12      1    12       Chris   95.0     A
13      1    15       Harry    NaN     C
14      2    23       Clara    NaN     B
15      3    15    Prashant    NaN     B

In this example, you can observe that the Marks column contains some NaN values. When we sort the dataframe by the Marks column, the rows with NaN values in the Marks column are placed at the bottom of the sorted dataframe.

If you want to place the rows with NaN values at the top of the dataframe, you can set the na_position parameter to “first” in the sort_values() method as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Marks",inplace=True,ignore_index=True,na_position="first")
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
    Class  Roll        Name  Marks Grade
0       1    11      Aditya   85.0     A
1       1    12       Chris   95.0     A
2       1    14         Sam   75.0     B
3       1    16      Aditya   78.0     B
4       1    15       Harry    NaN     C
5       2     1        Joel   68.0     B
6       2    22         Tom   73.0     B
7       2    15        Golu   79.0     B
8       2    27       Harsh   55.0     C
9       2    23       Clara    NaN     B
10      3    33        Tina   82.0     A
11      3    34         Amy   88.0     A
12      3    15    Prashant    NaN     B
13      3    27      Aditya   55.0     C
14      3    23  Radheshyam   78.0     B
15      3    11       Bobby   50.0     D
The sorted dataframe is
    Class  Roll        Name  Marks Grade
0       1    15       Harry    NaN     C
1       2    23       Clara    NaN     B
2       3    15    Prashant    NaN     B
3       3    11       Bobby   50.0     D
4       2    27       Harsh   55.0     C
5       3    27      Aditya   55.0     C
6       2     1        Joel   68.0     B
7       2    22         Tom   73.0     B
8       1    14         Sam   75.0     B
9       1    16      Aditya   78.0     B
10      3    23  Radheshyam   78.0     B
11      2    15        Golu   79.0     B
12      3    33        Tina   82.0     A
13      1    11      Aditya   85.0     A
14      3    34         Amy   88.0     A
15      1    12       Chris   95.0     A

In the above example, we have set the parameter na_position to "first". Due to this, the rows in which the Marks column has NaN value are placed at the top of the sorted dataframe.
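The na_position parameter works independently of the ascending parameter, so sorting in descending order does not move the NaN rows to the top. A minimal sketch on a made-up dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing marks.
df = pd.DataFrame({"Name": ["Harry", "Chris", "Clara", "Bobby"],
                   "Marks": [np.nan, 95.0, np.nan, 50.0]})

# Descending order: the NaN rows still come last by default.
desc = df.sort_values(by="Marks", ascending=False)

# Descending order with the NaN rows moved to the top.
desc_first = df.sort_values(by="Marks", ascending=False, na_position="first")

print(desc)
print(desc_first)
```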

Sort Columns of a Dataframe By a Row in Python

We can also sort the columns of a dataframe based on the values in a row. We can use the axis parameter in the sort_values() method for this.

To sort the columns of a dataframe by a row, we will pass the index of the row as an input argument to the “by” parameter. Additionally, we will set the axis parameter to 1 in the sort_values() method. After execution, the sort_values() method will return a dataframe with columns sorted by the given row. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("StudentMarks.csv",index_col="Student")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Aditya",axis=1,inplace=True,ignore_index=True,na_position="first")
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
         Physics  Chemistry  Math  Biology  Arts
Student                                         
Aditya        92         76    95       73    91
Chris         95         96    79       71    93
Sam           65         62    75       95    63
Harry         68         92    69       66    98
Golu          74         95    96       76    64
Joel          99         79    77       91    61
Tom           72         94    61       65    69
Harsh         98         99    93       95    91
Clara         93         67    78       79    71
Tina          99         76    78       94    95
The sorted dataframe is
          0   1   2   3   4
Student                    
Aditya   73  76  91  92  95
Chris    71  96  93  95  79
Sam      95  62  63  65  75
Harry    66  92  98  68  69
Golu     76  95  64  74  96
Joel     91  79  61  99  77
Tom      65  94  69  72  61
Harsh    95  99  91  98  93
Clara    79  67  71  93  78
Tina     94  76  95  99  78

In the above example, we have sorted the columns of the dataframe based on the row with the index "Aditya". For this, we have set the axis parameter to 1 and passed the index name to the by parameter of the sort_values() method.

In the output dataframe above, you can observe that the column names have been replaced by the numbers 0 to 4. This is because we have set the ignore_index parameter to True, and with the axis parameter set to 1 it is the column labels that are reset.

If you want to preserve the column names, you can either remove the ignore_index parameter or set it to False as shown below.

import pandas as pd
grades=pd.read_csv("StudentMarks.csv",index_col="Student")
print("The input dataframe is")
print(grades)
grades.sort_values(by="Aditya",axis=1,inplace=True,na_position="first")
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
         Physics  Chemistry  Math  Biology  Arts
Student                                         
Aditya        92         76    95       73    91
Chris         95         96    79       71    93
Sam           65         62    75       95    63
Harry         68         92    69       66    98
Golu          74         95    96       76    64
Joel          99         79    77       91    61
Tom           72         94    61       65    69
Harsh         98         99    93       95    91
Clara         93         67    78       79    71
Tina          99         76    78       94    95
The sorted dataframe is
         Biology  Chemistry  Arts  Physics  Math
Student                                         
Aditya        73         76    91       92    95
Chris         71         96    93       95    79
Sam           95         62    63       65    75
Harry         66         92    98       68    69
Golu          76         95    64       74    96
Joel          91         79    61       99    77
Tom           65         94    69       72    61
Harsh         95         99    91       98    93
Clara         79         67    71       93    78
Tina          94         76    95       99    78

In this example, you can observe that we have retained the column names of the dataframe. This is because we have removed the ignore_index argument, so the parameter takes its default value False.
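The same column sort can be tried without the StudentMarks.csv file. The sketch below builds a two-row dataframe from the first two rows of the output above; constructing it inline is an assumption made only so the example is self-contained.

```python
import pandas as pd

# First two rows of the StudentMarks data shown above, built inline.
marks = pd.DataFrame(
    {"Physics": [92, 95], "Chemistry": [76, 96], "Math": [95, 79],
     "Biology": [73, 71], "Arts": [91, 93]},
    index=pd.Index(["Aditya", "Chris"], name="Student"),
)

# axis=1 reorders the columns; "by" names the row to sort by.
by_row = marks.sort_values(by="Aditya", axis=1)

print(by_row)
```

The columns are reordered so that the values in the "Aditya" row increase from left to right, and every other row is rearranged along with them.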


The sort_index() Method

The sort_index() method is used to sort a pandas dataframe by indices. It has the following syntax.

DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Here,

  • The axis parameter is used to decide which index of the dataframe is sorted. With the value 0, which is the default, the rows are sorted by the row index. With the value 1, the columns are sorted by the column labels.
  • The level parameter is used to decide the index level by which the dataframe is to be sorted. It has the default value None denoting that sorting happens by all the index levels. If you want to sort the dataframe by a specific index level, you can pass the index level or index name to the level parameter. To sort the dataframe by multiple indices, you can give a list of index names or index levels to the level parameter.
  • The ascending parameter determines if the dataframe is sorted in ascending or descending order. By default, it is True denoting that sorting occurs in ascending order. You can set it to False to sort the dataframe in descending order. For dataframes having multilevel indices, you can pass a list of True and False values to decide on which level you want in ascending order and which level you want in descending order.
  • The inplace parameter is used to decide whether we modify the original dataframe or create a new dataframe after sorting. By default, inplace is set to False. Hence, the sort_index() method doesn’t modify the original dataframe and returns the new sorted dataframe. If you want to modify the original dataframe while sorting, you can set inplace to True.
  • The kind parameter is used to decide the sorting algorithm. By default, the sort_index() method uses the quicksort algorithm. You can also choose ‘mergesort’, ‘heapsort’, or ‘stable’ if another algorithm suits your data better.
  • The na_position parameter is used to decide the position of rows that have NaN values in the index. By default, it has the value 'last', denoting that these rows are placed at the end of the sorted dataframe. You can set it to 'first' if you want to have them at the top of the sorted dataframe.
  • The sort_remaining parameter is used for dataframes having multilevel indices when sorting by a specific level. By default, it is True, denoting that after sorting by the levels given in the level parameter, the dataframe is also sorted by the remaining levels in order. If you don’t want to sort the dataframe by the remaining levels, you can set sort_remaining to False.
  • The key parameter is used to perform operations on the index of the dataframe before sorting. It takes a vectorized function as its input argument. The function provided to the key parameter must take an Index object as its input argument and return an Index object of the same shape. For a multilevel index, the function is applied to each level independently before sorting.

After execution, the sort_index() method returns the sorted dataframe if the inplace parameter is set to False. If inplace is set to True, the sort_index() method returns None.
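The level, sort_remaining, and axis parameters are easier to follow on a small dataframe. The sketch below uses made-up data with a two-level index named Class and Roll; all names in it are illustrative.

```python
import pandas as pd

# Hypothetical data with a two-level (Class, Roll) index.
df = pd.DataFrame(
    {"Name": ["Tom", "Aditya", "Joel", "Chris"], "Marks": [73, 85, 68, 95]},
    index=pd.MultiIndex.from_tuples(
        [(2, 22), (1, 11), (2, 1), (1, 12)], names=["Class", "Roll"]
    ),
)

# level=None (the default): sort by Class, then by Roll.
all_levels = df.sort_index()

# Sort by the Roll level only and leave the Class level out of the comparison.
by_roll = df.sort_index(level="Roll", sort_remaining=False)

# axis=1 sorts the columns by their labels instead of sorting the rows.
by_column_label = df.sort_index(axis=1)

print(all_levels["Name"].tolist())
print(by_roll["Name"].tolist())
print(by_column_label.columns.tolist())
```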

Sort Pandas Dataframe by Index

To sort a pandas dataframe by index, you can use the sort_index() method on the dataframe. For this, we first need to create a dataframe with an index. Then, we can invoke the sort_index() method on the dataframe. After execution, the sort_index() method returns a sorted dataframe. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col="Roll")
print("The input dataframe is")
print(grades)
sorted_df=grades.sort_index()
print("The sorted dataframe is")
print(sorted_df)

Output:

The input dataframe is
      Class        Name  Marks Grade
Roll                                
11        1      Aditya   85.0     A
12        1       Chris   95.0     A
14        1         Sam   75.0     B
16        1      Aditya   78.0     B
15        1       Harry    NaN     C
1         2        Joel   68.0     B
22        2         Tom   73.0     B
15        2        Golu   79.0     B
27        2       Harsh   55.0     C
23        2       Clara    NaN     B
33        3        Tina   82.0     A
34        3         Amy   88.0     A
15        3    Prashant    NaN     B
27        3      Aditya   55.0     C
23        3  Radheshyam   78.0     B
11        3       Bobby   50.0     D
The sorted dataframe is
      Class        Name  Marks Grade
Roll                                
1         2        Joel   68.0     B
11        1      Aditya   85.0     A
11        3       Bobby   50.0     D
12        1       Chris   95.0     A
14        1         Sam   75.0     B
15        1       Harry    NaN     C
15        2        Golu   79.0     B
15        3    Prashant    NaN     B
16        1      Aditya   78.0     B
22        2         Tom   73.0     B
23        2       Clara    NaN     B
23        3  Radheshyam   78.0     B
27        2       Harsh   55.0     C
27        3      Aditya   55.0     C
33        3        Tina   82.0     A
34        3         Amy   88.0     A

In the above example, we first read a CSV file using the read_csv() method. In the read_csv() method, we have used the index_col parameter to specify that the "Roll" column should be used as the index of the dataframe. When we invoke the sort_index() method on the dataframe returned by the read_csv() method, it returns a new dataframe sorted by the index column.
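If you don't have the grade.csv file, you can try the same steps on in-memory data. The following sketch assumes a CSV layout with the columns Class, Roll, Name, Marks, and Grade, and copies only a few rows from the printed output above. The actual contents of grade.csv are not shown in this article.

```python
import io
import pandas as pd

# Assumed CSV content: a few rows copied from the printed dataframe.
csv_text = """Class,Roll,Name,Marks,Grade
1,11,Aditya,85,A
1,16,Aditya,78,B
1,15,Harry,,C
2,1,Joel,68,B
3,11,Bobby,50,D
"""

# read_csv() accepts a file-like object, so StringIO stands in for the file.
grades = pd.read_csv(io.StringIO(csv_text), index_col="Roll")
sorted_df = grades.sort_index()

print("The sorted dataframe is")
print(sorted_df)
print("The index of the original dataframe is")
print(list(grades.index))
```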

In the above example, the original dataframe isn’t modified. If you want to modify the original dataframe, you can use the inplace=True parameter in the sort_index() method. After execution, the original dataframe will be modified. You can observe this in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col="Roll")
print("The input dataframe is")
print(grades)
grades.sort_index(inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
      Class        Name  Marks Grade
Roll                                
11        1      Aditya   85.0     A
12        1       Chris   95.0     A
14        1         Sam   75.0     B
16        1      Aditya   78.0     B
15        1       Harry    NaN     C
1         2        Joel   68.0     B
22        2         Tom   73.0     B
15        2        Golu   79.0     B
27        2       Harsh   55.0     C
23        2       Clara    NaN     B
33        3        Tina   82.0     A
34        3         Amy   88.0     A
15        3    Prashant    NaN     B
27        3      Aditya   55.0     C
23        3  Radheshyam   78.0     B
11        3       Bobby   50.0     D
The sorted dataframe is
      Class        Name  Marks Grade
Roll                                
1         2        Joel   68.0     B
11        1      Aditya   85.0     A
11        3       Bobby   50.0     D
12        1       Chris   95.0     A
14        1         Sam   75.0     B
15        1       Harry    NaN     C
15        2        Golu   79.0     B
15        3    Prashant    NaN     B
16        1      Aditya   78.0     B
22        2         Tom   73.0     B
23        2       Clara    NaN     B
23        3  Radheshyam   78.0     B
27        2       Harsh   55.0     C
27        3      Aditya   55.0     C
33        3        Tina   82.0     A
34        3         Amy   88.0     A

In this example, you can observe that the original dataframe has been sorted. This is because we have set the inplace parameter to True in the sort_index() method.
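The difference between the two modes can be checked directly. The following is a minimal sketch using a made-up three-row dataframe. It shows that sort_index() returns a new dataframe by default and returns None when inplace is set to True.

```python
import pandas as pd

# Hypothetical three-row dataframe with an unsorted "Roll" index.
grades = pd.DataFrame(
    {"Name": ["Sam", "Joel", "Chris"]},
    index=pd.Index([14, 1, 12], name="Roll"),
)

# Without inplace, a new dataframe is returned and grades is untouched.
new_df = grades.sort_index()
index_before = list(grades.index)

# With inplace=True, grades itself is sorted and the return value is None.
result = grades.sort_index(inplace=True)
index_after = list(grades.index)

print(index_before)
print(result)
print(index_after)
```

Because the return value is None in the second case, you should not assign the result of sort_index(inplace=True) back to a variable.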

If you have multilevel indices in your dataframe and you want to sort the dataframe by a specific index, you can pass the index level to the level parameter in the sort_index() method.

In the following example, both the Class and Roll columns have been used as indexes. The Class column is used as the primary index while the Roll column is used as the secondary index. To sort the dataframe only by the Roll column, we will use the level parameter and set it to 1. In this way, the input dataframe will be sorted by the Roll column.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=1,inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
2     1           Joel   68.0     B
1     11        Aditya   85.0     A
3     11         Bobby   50.0     D
1     12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
2     15          Golu   79.0     B
3     15      Prashant    NaN     B
1     16        Aditya   78.0     B
2     22           Tom   73.0     B
      23         Clara    NaN     B
3     23    Radheshyam   78.0     B
2     27         Harsh   55.0     C
3     27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A

In the above example, we have used the index level as an input argument to the level parameter. Alternatively, you can also pass the name of the index level to the level parameter as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
2     1           Joel   68.0     B
1     11        Aditya   85.0     A
3     11         Bobby   50.0     D
1     12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
2     15          Golu   79.0     B
3     15      Prashant    NaN     B
1     16        Aditya   78.0     B
2     22           Tom   73.0     B
      23         Clara    NaN     B
3     23    Radheshyam   78.0     B
2     27         Harsh   55.0     C
3     27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A

In the above example, we use the parameter level="Roll" instead of level=1 to sort the input dataframe. In both cases, the output will be the same.
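You can verify that both forms give the same result with the equals() method. The following sketch builds a small multilevel index by hand instead of reading grade.csv. The rows are a made-up subset used only for illustration.

```python
import pandas as pd

# Hypothetical rows with a two-level index named Class and Roll.
index = pd.MultiIndex.from_tuples(
    [(1, 16), (1, 15), (2, 1), (3, 11), (1, 11)], names=["Class", "Roll"]
)
grades = pd.DataFrame(
    {"Name": ["Aditya", "Harry", "Joel", "Bobby", "Aditya"]}, index=index
)

# Level 1 is the second index level, which is named "Roll".
by_position = grades.sort_index(level=1)
by_name = grades.sort_index(level="Roll")

print(by_position.equals(by_name))
print(list(by_name.index.get_level_values("Roll")))
```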

By default, after sorting by the specified index level, the sort_index() method also sorts rows having the same value in that level by the remaining index levels. To stop that, you can set the sort_remaining parameter to False as shown in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True,sort_remaining=False)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
2     1           Joel   68.0     B
1     11        Aditya   85.0     A
3     11         Bobby   50.0     D
1     12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
2     15          Golu   79.0     B
3     15      Prashant    NaN     B
1     16        Aditya   78.0     B
2     22           Tom   73.0     B
      23         Clara    NaN     B
3     23    Radheshyam   78.0     B
2     27         Harsh   55.0     C
3     27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A

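In the above example, if two rows have the same value in the "Roll" column and the sort_remaining parameter is not set to False, the sort_index() method will sort those rows according to the Class index. To stop the sort_index() method from doing so, we have used the sort_remaining parameter and set it to False. Here, the output is the same as in the previous example because the rows with equal Roll values already appear in increasing order of Class in the input dataframe.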
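To see the effect of the remaining levels, the rows with equal Roll values must not already be in Class order. The following sketch uses made-up rows in which Class 3 appears before Class 1 for Roll 11.

```python
import pandas as pd

# Hypothetical rows: the two Roll 11 entries appear with Class 3 first.
index = pd.MultiIndex.from_tuples(
    [(3, 11), (1, 16), (2, 1), (1, 11)], names=["Class", "Roll"]
)
grades = pd.DataFrame(
    {"Name": ["Bobby", "Aditya", "Joel", "Aditya"]}, index=index
)

# Default: ties on Roll are broken by the remaining level (Class).
with_remaining = grades.sort_index(level="Roll")

# sort_remaining=False: only the Roll level is used as the sort key.
roll_only = grades.sort_index(level="Roll", sort_remaining=False)

print(list(with_remaining.index))
print(list(roll_only.index.get_level_values("Roll")))
```

With the default setting, the row with Class 1 is placed before the row with Class 3 for Roll 11. With sort_remaining set to False, only the order of the Roll values is guaranteed.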

Sort Pandas Dataframe by Multiple Indices in Python

To sort a pandas dataframe by multiple indices, you can pass the list of index levels to the level parameter of the sort_index() method as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=[0,1],inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
      16        Aditya   78.0     B
2     1           Joel   68.0     B
      15          Golu   79.0     B
      22           Tom   73.0     B
      23         Clara    NaN     B
      27         Harsh   55.0     C
3     11         Bobby   50.0     D
      15      Prashant    NaN     B
      23    Radheshyam   78.0     B
      27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A

In the above example, we have Class and Roll columns as indices. When we pass level=[0,1], the sort_index() method first sorts the input dataframe by the Class column. If two rows have the same value for the Class column, it sorts them according to the Roll column.
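The order of the levels in the list decides their priority. The following sketch, again on a made-up subset of rows, compares level=[0,1] with level=[1,0].

```python
import pandas as pd

# Hypothetical rows with a two-level index named Class and Roll.
index = pd.MultiIndex.from_tuples(
    [(1, 16), (3, 11), (2, 1), (1, 11)], names=["Class", "Roll"]
)
grades = pd.DataFrame(
    {"Name": ["Aditya", "Bobby", "Joel", "Aditya"]}, index=index
)

# Class is the primary key and Roll breaks ties.
class_then_roll = grades.sort_index(level=[0, 1])

# Roll is the primary key and Class breaks ties.
roll_then_class = grades.sort_index(level=[1, 0])

print(list(class_then_roll.index))
print(list(roll_then_class.index))
```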

Instead of index level numbers, you can also pass the names of the index levels to the level parameter as shown in the following example.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=["Class","Roll"],inplace=True)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
      16        Aditya   78.0     B
2     1           Joel   68.0     B
      15          Golu   79.0     B
      22           Tom   73.0     B
      23         Clara    NaN     B
      27         Harsh   55.0     C
3     11         Bobby   50.0     D
      15      Prashant    NaN     B
      23    Radheshyam   78.0     B
      27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A

In the above example, we use the parameter level=["Class", "Roll"] instead of level=[0, 1] to sort the input dataframe. In both cases, the output will be the same.

Sort Pandas Dataframe by Index in Descending Order

To sort a dataframe by index in descending order, you can set the ascending parameter in the sort_index() method to False as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level="Roll",inplace=True,ascending=False)
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
3     34           Amy   88.0     A
      33          Tina   82.0     A
      27        Aditya   55.0     C
2     27         Harsh   55.0     C
3     23    Radheshyam   78.0     B
2     23         Clara    NaN     B
      22           Tom   73.0     B
1     16        Aditya   78.0     B
3     15      Prashant    NaN     B
2     15          Golu   79.0     B
1     15         Harry    NaN     C
      14           Sam   75.0     B
      12         Chris   95.0     A
3     11         Bobby   50.0     D
1     11        Aditya   85.0     A
2     1           Joel   68.0     B

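In the above example, the sort_index() method sorts the input dataframe by the Roll index in descending order. As the sort_remaining parameter is True by default, rows having the same value in the Roll index are also sorted by the Class index in descending order.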

While sorting a dataframe by multiple indices, you can pass a list of True and False values to the ascending parameter as shown below.

import pandas as pd
grades=pd.read_csv("grade.csv",index_col=["Class","Roll"])
print("The input dataframe is")
print(grades)
grades.sort_index(level=["Class","Roll"],inplace=True,ascending=[False,True])
print("The sorted dataframe is")
print(grades)

Output:

The input dataframe is
                  Name  Marks Grade
Class Roll                         
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      16        Aditya   78.0     B
      15         Harry    NaN     C
2     1           Joel   68.0     B
      22           Tom   73.0     B
      15          Golu   79.0     B
      27         Harsh   55.0     C
      23         Clara    NaN     B
3     33          Tina   82.0     A
      34           Amy   88.0     A
      15      Prashant    NaN     B
      27        Aditya   55.0     C
      23    Radheshyam   78.0     B
      11         Bobby   50.0     D
The sorted dataframe is
                  Name  Marks Grade
Class Roll                         
3     11         Bobby   50.0     D
      15      Prashant    NaN     B
      23    Radheshyam   78.0     B
      27        Aditya   55.0     C
      33          Tina   82.0     A
      34           Amy   88.0     A
2     1           Joel   68.0     B
      15          Golu   79.0     B
      22           Tom   73.0     B
      23         Clara    NaN     B
      27         Harsh   55.0     C
1     11        Aditya   85.0     A
      12         Chris   95.0     A
      14           Sam   75.0     B
      15         Harry    NaN     C
      16        Aditya   78.0     B

In this example, we have sorted the dataframe by the Class and Roll indices. In the ascending parameter, we have given the list [False, True]. Due to this, the dataframe is first sorted by the Class index in descending order. If the rows have the same values for the Class index, such rows are sorted by Roll in ascending order.
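The effect of passing a list to the ascending parameter can also be checked on a small dataframe. The following is a minimal sketch with made-up rows, where each value in the ascending list applies to the level at the same position in the level list.

```python
import pandas as pd

# Hypothetical rows with a two-level index named Class and Roll.
index = pd.MultiIndex.from_tuples(
    [(1, 16), (3, 11), (2, 1), (1, 11), (3, 34)], names=["Class", "Roll"]
)
grades = pd.DataFrame(
    {"Name": ["Aditya", "Bobby", "Joel", "Aditya", "Amy"]}, index=index
)

# Class is sorted in descending order, Roll in ascending order.
mixed = grades.sort_index(level=["Class", "Roll"], ascending=[False, True])

print(list(mixed.index))
```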

Conclusion

In this article, we have discussed different ways to sort a pandas dataframe in Python using the sort_values() and the sort_index() methods.

To learn more about Python programming, you can read this article on dictionary comprehension in Python. You might like this article on list comprehension in Python too.

The post Sort Pandas DataFrame in Python appeared first on PythonForBeginners.com.
