We can use lambda functions to perform complex operations on columns.
Example 1
We can create a new column last_name by splitting each value in the name column and keeping the last word. Starting from this table:
| | name | email |
---|---|---|
0 | Jane Doe | jdoe@gmail.com |
1 | John Smith | jsmith@gmail.com |
2 | Lara Lane | laral@gmail.com |
get_last_name = lambda x: x.split()[-1]
df['last_name'] = df.name.apply(get_last_name)
| | name | email | last_name |
---|---|---|---|
0 | Jane Doe | jdoe@gmail.com | Doe |
1 | John Smith | jsmith@gmail.com | Smith |
2 | Lara Lane | laral@gmail.com | Lane |
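For readers who want to run Example 1 end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the rows shown, and the column name `email` is an assumption made for illustration.

```python
import pandas as pd

# Rebuild the example table by hand; the column name 'email' is assumed.
df = pd.DataFrame({
    'name': ['Jane Doe', 'John Smith', 'Lara Lane'],
    'email': ['jdoe@gmail.com', 'jsmith@gmail.com', 'laral@gmail.com'],
})

# split() breaks the string on whitespace; [-1] keeps the final piece.
get_last_name = lambda x: x.split()[-1]

# apply calls the lambda once per value in the 'name' column.
df['last_name'] = df['name'].apply(get_last_name)

print(df)
```

Because the lambda keeps only the final piece, a name with more than two words still yields a single last word.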
Example 2
We calculate the 25th percentile of shoe price for each shoe_color:
import numpy as np

cheap_shoes = orders.groupby('shoe_color').price.apply(lambda x: np.percentile(x, 25)).reset_index()
print(cheap_shoes)
| | shoe_color | price |
---|---|---|
0 | black | 130.0 |
1 | brown | 248.0 |
2 | navy | 200.0 |
3 | red | 157.0 |
4 | white | 188.0 |
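The orders data behind Example 2 is not shown, so the sketch below uses a small made-up `orders` table (hypothetical values, not the data that produced the output above) to show how the grouped lambda behaves.

```python
import numpy as np
import pandas as pd

# Hypothetical order data, invented for illustration only.
orders = pd.DataFrame({
    'shoe_color': ['black'] * 5 + ['red'] * 5,
    'price': [100, 120, 140, 160, 180,
              50, 60, 70, 80, 90],
})

# For each color, the lambda receives that group's prices as a Series.
cheap_shoes = (
    orders.groupby('shoe_color')['price']
    .apply(lambda x: np.percentile(x, 25))
    .reset_index()
)

print(cheap_shoes)
```

The call to reset_index turns the group labels back into an ordinary shoe_color column, which is why the result has a fresh 0, 1, ... index.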
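If we call apply on the whole DataFrame (without selecting a single column) and pass the argument axis=1, the input to our lambda function is an entire row rather than a column. To access particular values in the row, we use the syntax row.column_name or row['column_name']. Bracket syntax is required when a column name is not a valid Python identifier, such as 'Is taxed?' below: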
df['Price with Tax'] = df.apply(lambda row:
    row['Price'] * 1.075
    if row['Is taxed?'] == 'Yes'
    else row['Price'],
    axis=1
)
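To try the row-wise version, the following sketch builds a small table with the same column names. The items and prices are hypothetical, and the `Item` column is added only for readability.

```python
import pandas as pd

# Hypothetical items; 'Price' and 'Is taxed?' match the snippet above.
df = pd.DataFrame({
    'Item': ['Apple', 'Milk', 'Paper Towels'],
    'Price': [1.00, 4.20, 5.00],
    'Is taxed?': ['No', 'No', 'Yes'],
})

# With axis=1 the lambda receives one row at a time as a Series,
# so values are looked up by column name.
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)

print(df)
```

Only the rows marked 'Yes' are multiplied by 1.075; the others keep their original price.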