Dev Handbook > Python > Pandas > Sorting and Selecting Data

Sorting and Selecting Data

I. Sorting

You can sort the rows by passing a column name to .sort_values().

df.sort_values(column)

For example

students.sort_values("grade")

To sort in descending order, we add ascending=False

In this example, the grade will be the highest to the lowest:

students.sort_values("grade", ascending=False)

We can sort by multiple variables by passing a list of column names to sort_values()

students.sort_values("grade", "age")

To change the direction values are sorted in, pass a list to the ascending argument to specific which direction sorting.

students.sort_values("grade", "age", ascending=[True, False])

Columns can be used in calculation and plotting data.

We select with DataFrame-name['column']

name = records['name']
print(name)

If a column string only contains letters, numbers and underscores, we can use dot notation.

DataFrame-name.column-name

For example, with the DataFrame called students, we can select the name column with students.name.

Note in column selection

Use brackets and string for column names with spaces or special characters (-, ?, etc.)

report['Is day off?']

When using brackets and string, don’t forget the quotation mark around the column name

report['name']

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:

new_df = table[['column1', 'column2']]

For example

new_df = students[['last_name', 'email']]

DataFrames are zero-indexed, meaning that we start with the 0th row.

For example, to select 3rd row of students table, we use students.iloc[2]

We can also select multiple rows

students.iloc[2:5] selects all rows starting at the 2nd row and up to but not including the 5th row
students.iloc[:4] selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows)
students.iloc[-3:] selects the last 3 rows.

We can select rows when the statement is true.

df[df.MyColumnName == statement]

Recall that we use the following operators:

== tests that two values are equal.
!= tests that two values are not equal.
> and < test that greater than or less than, respectively.
>= and <= test greater than or equal to or less than or equal to, respectively.

students[students["grade"] > 60]

To select rows from multiple categories, we use | operator

For example, selects the row contains the data from March and April.

march_april = df[(df.month == 'March') | (df.month == 'April')]

We can filter multiple values of a categorical variable, the easiest way is to use the isin method.

For example

We can use the isin command to create the variable january_february_march, containing the data from January, February, and March.

january_february_march = df[df.month.isin(['January', 'February', 'March'])]

print(january_february_march)