Sorting and Selecting Data
I. Sorting
1. Sorting by one variable
You can sort the rows by passing a column name to .sort_values().
df.sort_values(column)
For example
students.sort_values("grade")
To sort in descending order, we add ascending=False
In this example, the grade will be the highest to the lowest:
students.sort_values("grade", ascending=False)
2. Sorting by multiple variables
We can sort by multiple variables by passing a list of column names to sort_values()
students.sort_values("grade", "age")
To change the direction values are sorted in, pass a list to the ascending argument to specific which direction sorting.
students.sort_values("grade", "age", ascending=[True, False])
II. Select columns
Columns can be used in calculation and plotting data.
1. Selecting with brackets and strings
We select with DataFrame-name['column']
name = records['name']
print(name)
2. Selecting with a dot
If a column string only contains letters, numbers and underscores, we can use dot notation.
DataFrame-name.column-name
For example, with the DataFrame called students, we can select the name column with students.name.
Note in column selection
- Use brackets and string for column names with spaces or special characters (-, ?, etc.)
report['Is day off?']
- When using brackets and string, don’t forget the quotation mark around the column name
report['name']
- Use square brackets, not parentheses
3. Selecting multiple columns
To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:
new_df = table[['column1', 'column2']]
For example
new_df = students[['last_name', 'email']]
III. Select rows
DataFrames are zero-indexed, meaning that we start with the 0th row.
1. Using iloc[]
For example, to select 3rd row of students table, we use students.iloc[2]
We can also select multiple rows
students.iloc[2:5]selects all rows starting at the 2nd row and up to but not including the 5th rowstudents.iloc[:4]selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows)students.iloc[-3:]selects the last 3 rows.
2. Selecting rows with logic
We can select rows when the statement is true.
df[df.MyColumnName == statement]
Recall that we use the following operators:
==tests that two values are equal.!=tests that two values are not equal.>and<test that greater than or less than, respectively.>=and<=test greater than or equal to or less than or equal to, respectively.
students[students["grade"] > 60]
3. Selecting rows by categorical variables
To select rows from multiple categories, we use | operator
For example, selects the row contains the data from March and April.
march_april = df[(df.month == 'March') | (df.month == 'April')]
4. Selecting rows using isin
We can filter multiple values of a categorical variable, the easiest way is to use the isin method.
For example
We can use the isin command to create the variable january_february_march, containing the data from January, February, and March.
january_february_march = df[df.month.isin(['January', 'February', 'March'])]
print(january_february_march)