Why am I getting an AttributeError when using pandas apply?

How do I convert a NaN value to a categorical value based on a condition? I get an error when I try to convert a NaN value.

         category  gender  sub-category     title
    health&beauty     NaN        makeup   lipbalm
    health&beauty   women        makeup  lipstick
              NaN     NaN           NaN  lipgloss

My DataFrame looks like this, and my function for replacing NaN gender values with a categorical value looks like this:

    def impute_gender(cols):
        category=cols[0]
        sub_category=cols[2]
        gender=cols[1]
        title=cols[3]
        if title.str.contains('Lip') and gender.isnull==True:
            return 'women'

    df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

If I run the code, I get this error:

    ----> 7     if title.str.contains('Lip') and gender.isnull()==True:
          8         print(gender)
          9

    AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset - https://github.com/lakshmipriya04/py-sample

3 answers

Some notes first:

  • If you only use two columns, calling apply over all four columns is wasteful.
  • apply is wasteful in general: it is slow and gives you none of the benefits of vectorization.
  • With axis=1, apply hands your function scalars, so you cannot use the .str accessor as you would on a pd.Series. Plain Python strings have no contains method either; the Pythonic check is "lip" in title.
  • gender.isnull is wrong as well: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) instead.
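
If you do want to keep the row-wise approach, the notes above translate into something like the following. This is a minimal sketch, not the asker's exact code: the sample frame is rebuilt from the question, the lower-casing and the isinstance guard are additions, and the function returns the existing gender when the condition does not match (the original returned None in that case).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})


def impute_gender(row):
    """Return 'women' for lip products with a missing gender, else the old value."""
    gender = row["gender"]
    title = row["title"]
    # Scalars: use pd.isnull() and the `in` operator, not .isnull() or .str.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender


# axis=1 passes one row at a time; only the two columns actually used are selected.
imputed = df[["gender", "title"]].apply(impute_gender, axis=1)
```

Assigning the result back with df["gender"] = imputed completes the imputation, although the vectorized options below are still preferable on large frames.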

Option 1
np.where

    import numpy as np

    m = df.gender.isnull() & df.title.str.contains('lip')
    df['gender'] = np.where(m, 'women', df.gender)
    df

            category  gender  sub-category     title
    0  health&beauty   women        makeup   lipbalm
    1  health&beauty   women        makeup  lipstick
    2            NaN   women           NaN  lipgloss

This is not only faster but also simpler. If you are worried about case sensitivity, you can make the contains check case-insensitive:

    import re

    m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
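
As a variation, str.contains also takes case=False, which avoids the re import. The sketch below additionally passes na=False so that a missing title counts as a non-match; that is an assumption about the desired behavior, and the rows are made up for illustration rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows: mixed case, a non-lip product, and a missing title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["LipBalm", "lipstick", "shampoo", np.nan],
})

# case=False ignores case; na=False treats a NaN title as "does not contain".
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)
```

Only the first row is selected here: the second already has a gender, the third is not a lip product, and the fourth has no title to match against.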

Option 2
Another alternative is to use pd.Series.mask / pd.Series.where:

 df['gender'] = df.gender.mask(m, 'women') 

Or,

 df['gender'] = df.gender.where(~m, 'women') 

    df

            category  gender  sub-category     title
    0  health&beauty   women        makeup   lipbalm
    1  health&beauty   women        makeup  lipstick
    2            NaN   women           NaN  lipgloss

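mask replaces the values where the condition is True. where does the opposite: it keeps the values where the condition is True and replaces the rest, which is why the where version negates m.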
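
The relationship between the two calls can be checked on a small Series. This is a sketch with made-up values, not the asker's data:

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "women", np.nan, "men"])  # hypothetical values
m = pd.Series([True, False, True, False])             # rows to overwrite

masked = gender.mask(m, "women")   # replace where m is True
kept = gender.where(~m, "women")   # keep where ~m is True, replace the rest
```

Both calls produce the same Series, so the choice between them is a matter of which reads more naturally for the condition at hand.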


Or just use loc, as an Option 3 alongside @COLDSPEED's answer:

    cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
    df.loc[cond, 'gender'] = 'women'
    df

            category  gender  sub-category     title
    0  health&beauty   women        makeup   lipbalm
    1  health&beauty   women        makeup  lipstick
    2            NaN   women           NaN  lipgloss

Since we are dealing with NaN values, fillna is another option:

    df.gender = df.gender.fillna(df.title.str.contains('lip').replace(True, 'women'))
    df

            category  gender  sub-category     title
    0  health&beauty   women        makeup   lipbalm
    1  health&beauty   women        makeup  lipstick
    2            NaN   women           NaN  lipgloss
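
One caveat with the fillna call above: replace(True, 'women') leaves False in place for titles that do not contain 'lip', so a missing gender on such a row would be filled with False. The sample data has no such row, so the output above is unaffected. A possible variant, sketched here with a made-up third row, maps only True to 'women' so that non-matching rows stay NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "shampoo"],  # last row is made up
})

# True -> 'women'; False is not in the mapping, so it becomes NaN.
fill_values = df["title"].str.contains("lip", na=False).map({True: "women"})

# fillna aligns on the index and only touches rows where gender is missing.
df["gender"] = df["gender"].fillna(fill_values)
```

The shampoo row keeps its missing gender, because its fill value is itself NaN.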

Source: https://habr.com/ru/post/1274428/

