You can sort the rows of a DataFrame by passing a column name to .sort_values().
df.sort_values(column)
For example
students.sort_values("grade")
To sort in descending order, add ascending=False.
In this example, the grades are sorted from highest to lowest:
students.sort_values("grade", ascending=False)
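As a minimal sketch of both sort directions, assuming a small made-up students table (the names and grades below are illustrative, not from the original data):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "grade": [72, 91, 58, 85],
})

# Ascending is the default: lowest grade first.
lowest_first = students.sort_values("grade")

# ascending=False reverses the order: highest grade first.
highest_first = students.sort_values("grade", ascending=False)

print(lowest_first["grade"].tolist())
print(highest_first["grade"].tolist())
```

sort_values() returns a new DataFrame and leaves students unchanged. The original row labels are kept, so the index appears out of order after sorting.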
We can sort by multiple columns by passing a list of column names to sort_values():
students.sort_values(["grade", "age"])
To set the sort direction for each column, pass a list of booleans to the ascending argument, one per column:
students.sort_values(["grade", "age"], ascending=[True, False])
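A short sketch of a multi-column sort with mixed directions, assuming a made-up students table with tied grades so that the second column actually matters:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "grade": [85, 72, 85, 72],
    "age": [15, 16, 17, 14],
})

# Sort by grade (low to high); within equal grades, sort by age (old to young).
ordered = students.sort_values(["grade", "age"], ascending=[True, False])

print(ordered["name"].tolist())
```

The column names must be wrapped in a single list. Passing them as separate positional arguments does not sort by both columns.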
Columns can be used in calculations and in plotting data.
We select a single column with DataFrame_name['column_name']:
name = records['name']
print(name)
If a column name contains only letters, numbers, and underscores (and does not start with a number), we can use dot notation:
DataFrame_name.column_name
For example, with the DataFrame called students, we can select the name column with students.name.
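Note that a column name containing spaces or special characters can only be selected with bracket notation, while a simple name works with either notation:
report['Is day off?']
report['name']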
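The two notations can be compared side by side in a small sketch. The report table below is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
report = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "Is day off?": [True, False],
})

# Bracket notation works for any column name, including spaces and punctuation.
day_off = report["Is day off?"]

# Dot notation works here because "name" contains only letters.
names = report.name

print(type(names).__name__)
print(day_off.tolist())
```

Both forms return a Series. Dot notation is also unreliable when a column name matches an existing DataFrame attribute or method (for example, a column called count), so bracket notation is the safer default.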
To select two or more columns from a DataFrame, we pass a list of the column names inside the selection brackets:
new_df = table[['column1', 'column2']]
For example:
new_df = students[['last_name', 'email']]
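A minimal sketch of multi-column selection, assuming a made-up students table. The first_name column and the email addresses are illustrative placeholders:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
students = pd.DataFrame({
    "first_name": ["Ana", "Ben"],
    "last_name": ["Ruiz", "Cho"],
    "email": ["ana@example.com", "ben@example.com"],
})

# The outer brackets index the DataFrame; the inner brackets form the list.
new_df = students[["last_name", "email"]]

# A one-item list still returns a DataFrame, while a bare string returns a Series.
as_frame = students[["email"]]
as_series = students["email"]

print(list(new_df.columns))
```

The columns in the result follow the order of the list, not the order in the original DataFrame.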
DataFrames are zero-indexed, meaning that the first row is at position 0.
For example, to select the 3rd row of the students table, we use students.iloc[2].
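A minimal sketch of positional selection with iloc, assuming a made-up six-row students table. It covers the single-row case and the slice forms of the same accessor:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay"],
    "grade": [72, 91, 58, 85, 66, 79],
})

# Position 2 is the third row because counting starts at 0.
third_row = students.iloc[2]
print(third_row["name"])

# Slices follow Python rules: the start is included, the stop is not.
middle = students.iloc[2:5]
first_four = students.iloc[:4]
last_three = students.iloc[-3:]
print(middle["name"].tolist())
```

A single position returns one row as a Series, while a slice returns a DataFrame.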
We can also select multiple rows with a slice:
students.iloc[2:5]
selects all rows starting at position 2 and up to but not including position 5.
students.iloc[:4]
selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows).
students.iloc[-3:]
selects the last 3 rows.
We can select the rows for which a logical statement is true:
df[df.MyColumnName == value]
Recall that we use the following comparison operators:
== tests that two values are equal.
!= tests that two values are not equal.
> and < test greater than and less than, respectively.
>= and <= test greater than or equal to and less than or equal to, respectively.
For example, to select the students whose grade is above 60:
students[students["grade"] > 60]
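To make the mechanics of a comparison filter visible, the sketch below builds the True/False mask first and then applies it. The students table is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "grade": [72, 91, 58, 60],
})

# The comparison produces one True/False value per row.
mask = students["grade"] > 60
print(mask.tolist())

# Indexing with the mask keeps only the rows marked True.
passed = students[mask]
print(passed["name"].tolist())
```

Dev's grade of exactly 60 is excluded because > is strict; >= would include it.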
To select rows that match any one of several conditions, we combine the conditions with the | (or) operator.
For example, the following selects the rows containing the data from March and April:
march_april = df[(df.month == 'March') | (df.month == 'April')]
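A runnable sketch of the same filter, assuming a made-up table with a month column and a hypothetical sales column:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "month": ["January", "March", "April", "May"],
    "sales": [10, 20, 30, 40],
})

# Each condition needs its own parentheses because | binds more tightly than ==.
march_april = df[(df.month == "March") | (df.month == "April")]

print(march_april["sales"].tolist())
```

Python's or keyword cannot be used here: it does not work element by element on a Series and raises a ValueError.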
To filter on several values of a categorical variable, the easiest way is to use the isin method.
For example, we can use isin to create the variable january_february_march, containing the data from January, February, and March:
january_february_march = df[df.month.isin(['January', 'February', 'March'])]
print(january_february_march)
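A short sketch of isin on a made-up table (the sales column is hypothetical), together with the inverted form that keeps every row not in the list:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "month": ["January", "February", "March", "April"],
    "sales": [10, 20, 30, 40],
})

wanted = ["January", "February", "March"]

# isin() returns True for rows whose month appears in the list.
january_february_march = df[df.month.isin(wanted)]

# Prefixing the mask with ~ inverts it, keeping everything not in the list.
other_months = df[~df.month.isin(wanted)]

print(january_february_march["month"].tolist())
print(other_months["month"].tolist())
```

Compared with chaining several == tests with |, isin stays readable as the list of values grows.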