Missing Values
Missing values are everywhere, and you don’t want them interfering with your work.
Finding missing values
- To detect missing values:
df.isna()
- To check each column for missing values:
df.isna().any()
- To count missing values:
df.isna().sum()
- To plot missing values:
import matplotlib.pyplot as plt
df.isna().sum().plot(kind="bar")
plt.show()
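A minimal sketch of the detection calls above, run on a small made-up DataFrame. The column names and values are assumptions for illustration only, not the avocado data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration
df = pd.DataFrame({
    "price": [1.2, np.nan, 0.9, 1.1],
    "type": ["organic", "conventional", None, "organic"],
    "sold": [100, 250, 175, 300],
})

# Boolean DataFrame: True wherever a value is missing
missing_mask = df.isna()

# One boolean per column: does the column contain any missing value?
has_missing = df.isna().any()

# Number of missing values in each column
missing_counts = df.isna().sum()

print(missing_mask)
print(has_missing)
print(missing_counts)
```

Because True counts as 1 when summed, .sum() on the boolean result gives the per-column count of missing values, which is what the bar plot displays.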
Example 1
- Print a DataFrame that shows whether each value is missing or not.
- Print a summary that shows whether any value in each column is missing or not.
- Create a bar plot of the total number of missing values in each column.
# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt
# Check individual values for missing values
print(avocados_2016.isna())
# Check each column for missing values
print(avocados_2016.isna().any())
# Bar plot of missing values by variable
avocados_2016.isna().sum().plot(kind="bar")
# Show plot
plt.show()
Removing missing values
One way to deal with missing values is to remove them from the dataset completely.
- To remove rows that contain missing values, use .dropna():
df.dropna()
Example 2
- Remove the rows containing missing values.
- Verify that all missing values have been removed: check whether each column has any NAs, and print the result.
# Remove rows with missing values
avocados_complete = avocados_2016.dropna()
# Check if any columns contain missing values
print(avocados_complete.isna().any())
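Dropping rows also discards the non-missing values in those rows, so it can help to compare row counts before and after. A minimal sketch, assuming a small made-up DataFrame (the names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; rows 1 and 2 each have one missing value
df = pd.DataFrame({
    "price": [1.2, np.nan, 0.9, 1.1],
    "sold": [100.0, 250.0, np.nan, 300.0],
})

# dropna() returns a new DataFrame; df itself is left unchanged
complete = df.dropna()

rows_before = len(df)
rows_after = len(complete)

print(rows_before, rows_after)
# Every entry should be False once rows with missing values are gone
print(complete.isna().any())
```

Here two of the four rows are removed, even though each of them held one valid value.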
Replacing missing values
Another way of handling missing values is to replace them all with the same value. For numerical variables, one option is to replace missing values with 0.
- To replace missing values with 0:
df.fillna(0)
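Filling with 0 keeps every row, but it can shift summary statistics, because the filled zeros now count as real observations. A minimal sketch, assuming a small made-up DataFrame (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
df = pd.DataFrame({
    "small_sold": [10.0, np.nan, 30.0],
    "large_sold": [np.nan, 5.0, 8.0],
})

# fillna(0) returns a new DataFrame with each missing value replaced by 0
filled = df.fillna(0)

# mean() skips missing values by default, so the two means differ
mean_before = df["small_sold"].mean()
mean_after = filled["small_sold"].mean()

print(filled)
print(mean_before, mean_after)
```

Whether 0 is a sensible replacement depends on what the column measures, so it is worth comparing results before and after filling.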