Some notes first:
- If you only use two columns, calling apply over all 4 columns is wasteful.
- apply is wasteful in general, because it is slow and gives you none of the benefits of vectorization.
- Inside apply you are dealing with scalars, so you don't use the .str accessor as you would on a pd.Series object. Plain Python strings have no contains method; the pythonic check is simply "lip" in title.
- gender.isnull is completely wrong: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) instead.
Option 1
np.where
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)
df

        category  gender  sub-category     title
0  health&beauty   women        makeup   lipbalm
1  health&beauty   women        makeup  lipstick
2            NaN   women           NaN  lipgloss
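A self-contained version of Option 1 is sketched below. The input frame is hypothetical (the question's original data isn't shown here); gender is kept as a string (object) column with some missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical input frame; gender has missing values to fill.
df = pd.DataFrame({
    'category': ['health&beauty', 'health&beauty', np.nan],
    'gender': ['women', np.nan, np.nan],
    'sub-category': ['makeup', 'makeup', np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss'],
})

# Boolean mask: gender is missing AND title contains "lip".
m = df.gender.isnull() & df.title.str.contains('lip')

# Take 'women' where the mask is True, otherwise keep the existing value.
df['gender'] = np.where(m, 'women', df.gender)
print(df)
```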
The np.where approach is not only faster, but also simpler. If you are worried about case sensitivity, you can make the contains check case-insensitive:
import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
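pd.Series.str.contains also accepts case=False, which should give the same result as flags=re.IGNORECASE for a plain pattern like this. If the title column can contain missing values, na=False makes those rows count as non-matches so the mask stays boolean. The sample titles below are hypothetical.

```python
import re
import pandas as pd

titles = pd.Series(['LipBalm', 'lipstick', 'Mascara'])

# Two equivalent case-insensitive matches for "lip".
by_flags = titles.str.contains('lip', flags=re.IGNORECASE)
by_case = titles.str.contains('lip', case=False)

print(by_flags.tolist())  # [True, True, False]
print(by_case.tolist())   # [True, True, False]

# With missing titles, na=False treats them as "no match".
with_missing = pd.Series(['LipGloss', None])
print(with_missing.str.contains('lip', case=False, na=False).tolist())  # [True, False]
```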
Option 2
Another alternative is pd.Series.mask or pd.Series.where:
df['gender'] = df.gender.mask(m, 'women')
Or,
df['gender'] = df.gender.where(~m, 'women')
df

        category  gender  sub-category     title
0  health&beauty   women        makeup   lipbalm
1  health&beauty   women        makeup  lipstick
2            NaN   women           NaN  lipgloss
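mask replaces values with the new value wherever the condition is True. where does the inverse: it keeps values where the condition is True and replaces the rest, which is why it is given ~m.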
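To check that the mask and where calls above really are interchangeable, here is a small sketch on hypothetical standalone Series (not the question's data):

```python
import numpy as np
import pandas as pd

gender = pd.Series(['women', np.nan, np.nan, 'men'])
title = pd.Series(['lipbalm', 'lipstick', 'shampoo', 'lip liner'])

# Only row 1 has a missing gender AND a "lip" title.
m = gender.isnull() & title.str.contains('lip')

# mask: replace where m is True.
masked = gender.mask(m, 'women')
# where: keep where ~m is True, replace everywhere else (i.e. where m is True).
kept = gender.where(~m, 'women')

print(masked.tolist())    # ['women', 'women', nan, 'men']
print(masked.equals(kept))  # True
```

Row 2 stays missing because its title doesn't contain "lip", and row 3 keeps 'men' because its gender was already set.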