Why am I getting an AttributeError when using pandas apply?

Question

How do I convert a NaN value to a categorical value based on a condition? I get an error when I try to convert a NaN value.

 category       gender  sub-category  title
 health&beauty  NaN     makeup        lipbalm
 health&beauty  women   makeup        lipstick
 NaN            NaN     NaN           lipgloss

My DataFrame looks like this. My function for replacing NaN gender values with a categorical value looks like this:

 def impute_gender(cols):
     category=cols[0]
     sub_category=cols[2]
     gender=cols[1]
     title=cols[3]
     if title.str.contains('Lip') and gender.isnull==True:
         return 'women'

 df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get this error:

 ----> 7     if title.str.contains('Lip') and gender.isnull()==True:
       8         print(gender)
       9

 AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset - https://github.com/lakshmipriya04/py-sample

+5

python pandas dataframe apply attributeerror

Lpr Jan 01 '18 at 18:20

3 answers

Or just use loc, as a third option alongside @COLDSPEED's answer:

 cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
 df.loc[cond, 'gender'] = 'women'

         category gender sub-category     title
 0  health&beauty  women       makeup   lipbalm
 1  health&beauty  women       makeup  lipstick
 2            NaN  women          NaN  lipgloss

+6

Vaishali Jan 01 '18 at 18:30

Since we are dealing with NaN values, fillna is another way to do it:

 df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
 df
 Out[63]:
         category gender sub-category     title
 0  health&beauty  women       makeup   lipbalm
 1  health&beauty  women       makeup  lipstick
 2            NaN  women          NaN  lipgloss
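
One caveat with the snippet above, as far as I can tell: for a row whose gender is missing and whose title does not contain 'lip', the fill value would be False rather than NaN. Here is a minimal sketch of a variant that leaves such rows untouched, using map with a one-entry dict. The sample frame and its 'shampoo' row are made up for illustration and are not from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last row has a missing gender and a non-matching title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "shampoo"],
})

# True maps to 'women'; False is not in the dict, so it becomes NaN
fill_values = df["title"].str.contains("lip").map({True: "women"})

# fillna only touches rows where gender is missing; a NaN fill value is a no-op
df["gender"] = df["gender"].fillna(fill_values)
```

With this variant the 'shampoo' row keeps its missing gender instead of receiving a boolean.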

+3

Wb Jan 01 '18 at 19:16

coldspeed · Accepted Answer · 2018-01-01T18:24:47+0000

Some notes here -

If you only use two columns, calling apply over four columns is wasteful.
Calling apply is wasteful in general, because it is slow and gives you none of the benefits of vectorization.
Inside apply you are working with scalars, so you cannot use the .str accessor as you would on a pd.Series object. A plain Python str has no contains method either; the pythonic check is "lip" in title.
gender.isnull is simply wrong: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) to test a scalar.
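
For completeness, the notes above can be turned into a working apply version. This is only a sketch: the sample frame is rebuilt by hand from the question, and lower-casing the title is an assumption about the intended match, since the question searches for 'Lip' while the titles are lowercase. The vectorized options are still preferable.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (the column is named 'sub-category' there).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    """Return 'women' for a missing gender when the title mentions 'lip'."""
    gender = row["gender"]
    title = row["title"]
    # Row values are scalars: use pd.isnull and a plain substring check
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # otherwise keep the existing value

# Pass only the two columns the function actually reads
df["gender"] = df[["gender", "title"]].apply(impute_gender, axis=1)
```

Selecting columns by label inside the function avoids depending on column order, which the positional cols[0], cols[1] lookups in the question rely on.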

Option 1
np.where

 import numpy as np

 m = df.gender.isnull() & df.title.str.contains('lip')
 df['gender'] = np.where(m, 'women', df.gender)
 df

         category gender sub-category     title
 0  health&beauty  women       makeup   lipbalm
 1  health&beauty  women       makeup  lipstick
 2            NaN  women          NaN  lipgloss

This is not only faster, but also simpler. If you are worried about case sensitivity, you can make the contains check case-insensitive -

 import re

 m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
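
As an alternative to passing a regex flag, str.contains also accepts case=False, and na=False makes a missing title count as a non-match. A small sketch on made-up data follows; the 'shampoo' row and the missing title are illustrative and not from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with mixed case, a non-matching title and a missing title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["LipBalm", "lipstick", "shampoo", np.nan],
})

# case=False ignores case; na=False treats a missing title as "no match"
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)

# Assign only where the mask is True
df.loc[m, "gender"] = "women"
```

Without na=False, a missing title would produce a missing value in the mask instead of a clean boolean, which is worth guarding against on real data.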

Option 2
Another alternative is to use pd.Series.mask / pd.Series.where -

 df['gender'] = df.gender.mask(m, 'women')

Or,

 df['gender'] = df.gender.where(~m, 'women')

 df

         category gender sub-category     title
 0  health&beauty  women       makeup   lipbalm
 1  health&beauty  women       makeup  lipstick
 2            NaN  women          NaN  lipgloss

mask implicitly applies the new value to the column based on the provided mask.
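
To make the relationship between the two calls concrete, here is a small sketch on a made-up Series. mask replaces values where the condition is True, while where keeps values where the condition is True and replaces the rest, which is why the second call negates m.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first entry should be replaced.
gender = pd.Series([np.nan, "men", np.nan])
m = pd.Series([True, False, False])

masked = gender.mask(m, "women")   # replace where m is True
kept = gender.where(~m, "women")   # keep where ~m is True, replace elsewhere

# Both spellings produce the same Series
same = masked.equals(kept)
```

Neither call modifies gender in place here; each returns a new Series, so the result has to be assigned back as in the snippets above.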
