How do I replace a column of numeric codes with the matching values from a dict? [Python]

I have the DataFrame and dict below. How do I replace the values in a column using the dict?

    data:
           occupation_code
    index
    0                   10
    1                   16
    2                   12
    3                    7
    4                    1
    5                    3
    6                   10
    7                    7
    8                    1
    9                    3
    10                   4
    ……

    dict1 = {0: 'other', 1: 'academic/educator', 2: 'artist', 3: 'clerical/admin',
             4: 'college/grad student', 5: 'customer service', 6: 'doctor/health care',
             7: 'executive/managerial', 8: 'farmer', 9: 'homemaker', 10: 'K-12student',
             11: 'lawyer', 12: 'programmer', 13: 'retired', 14: 'sales/marketing',
             15: 'scientist', 16: 'self-employed', 17: 'technician/engineer',
             18: 'tradesman/craftsman', 19: 'unemployed', 20: 'writer'}

I used a for loop to do the replacement, but it is very slow:

    for i in data.index:
        data.loc[i, 'occupation_detailed'] = dict1[data.loc[i, 'occupation_code']]

My data contains 1 million rows, and running the loop on just 1,000 rows already takes a few seconds, so 1 million rows could take half a day!
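A rough sketch of why the loop is slow, assuming a synthetic 2,000-row frame of random codes (the row count and seed are arbitrary choices, not from the question). It times the per-row `.loc` loop against a single vectorized `Series.map` call over the whole column; absolute timings depend on the machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# dict1 as given in the question
dict1 = {0: 'other', 1: 'academic/educator', 2: 'artist', 3: 'clerical/admin',
         4: 'college/grad student', 5: 'customer service', 6: 'doctor/health care',
         7: 'executive/managerial', 8: 'farmer', 9: 'homemaker', 10: 'K-12student',
         11: 'lawyer', 12: 'programmer', 13: 'retired', 14: 'sales/marketing',
         15: 'scientist', 16: 'self-employed', 17: 'technician/engineer',
         18: 'tradesman/craftsman', 19: 'unemployed', 20: 'writer'}

# Hypothetical sample: 2,000 random codes in 0..20 (kept small so the loop finishes quickly)
rng = np.random.default_rng(0)
data = pd.DataFrame({'occupation_code': rng.integers(0, 21, size=2_000)})

# 1) Per-row loop, as in the question (target column pre-created as object dtype)
looped = data.copy()
looped['occupation_detailed'] = None
start = time.perf_counter()
for i in looped.index:
    looped.loc[i, 'occupation_detailed'] = dict1[looped.loc[i, 'occupation_code']]
loop_seconds = time.perf_counter() - start

# 2) One vectorized lookup over the whole column
vectorized = data.copy()
start = time.perf_counter()
vectorized['occupation_detailed'] = vectorized['occupation_code'].map(dict1)
map_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f} s, map: {map_seconds:.5f} s")
```

Each `.loc` read and write in the loop goes through pandas' indexing machinery once per row, while `map` does the whole lookup in one call, which is why the gap widens as the row count grows.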

So, is there a better way to do this?

Thanks so much for the tips!

2 answers

Use map; any value that is missing from the dict becomes NaN:

    print(df)
           occupation_code
    index
    0                   10
    1                   16
    2                   12
    3                    7
    4                    1
    5                    3
    6                   10
    7                    7
    8                    1
    9                    3
    10                   4
    11                 100  <- added missing value 100

    df['occupation_code'] = df['occupation_code'].map(dict1)
    print(df)
                    occupation_code
    index
    0               K-12student
    1             self-employed
    2                programmer
    3      executive/managerial
    4         academic/educator
    5            clerical/admin
    6               K-12student
    7      executive/managerial
    8         academic/educator
    9            clerical/admin
    10     college/grad student
    11                      NaN
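A self-contained version of the map approach, rebuilding the sample above (the question's codes plus the extra unmapped code 100). The construction of `df` here is an assumption made so the snippet runs on its own.

```python
import pandas as pd

# dict1 as given in the question
dict1 = {0: 'other', 1: 'academic/educator', 2: 'artist', 3: 'clerical/admin',
         4: 'college/grad student', 5: 'customer service', 6: 'doctor/health care',
         7: 'executive/managerial', 8: 'farmer', 9: 'homemaker', 10: 'K-12student',
         11: 'lawyer', 12: 'programmer', 13: 'retired', 14: 'sales/marketing',
         15: 'scientist', 16: 'self-employed', 17: 'technician/engineer',
         18: 'tradesman/craftsman', 19: 'unemployed', 20: 'writer'}

# Sample data: the question's codes plus 100, which has no entry in dict1
df = pd.DataFrame({'occupation_code': [10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100]})
df.index.name = 'index'

# map looks up every value in dict1; values with no matching key become NaN
df['occupation_code'] = df['occupation_code'].map(dict1)
print(df)
```

To keep the numeric codes, assign the result to a new column instead, e.g. `df['occupation_detailed'] = df['occupation_code'].map(dict1)`, which is the column the question's loop was building.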

Another solution is to use replace; any value that is missing from the dict keeps its original value instead of becoming NaN:

    df['occupation_code'] = df['occupation_code'].replace(dict1)
    print(df)
                    occupation_code
    index
    0               K-12student
    1             self-employed
    2                programmer
    3      executive/managerial
    4         academic/educator
    5            clerical/admin
    6               K-12student
    7      executive/managerial
    8         academic/educator
    9            clerical/admin
    10     college/grad student
    11                      100
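The same sample run through `replace`, as a self-contained sketch (the sample frame is rebuilt here so the snippet runs on its own).

```python
import pandas as pd

# dict1 as given in the question
dict1 = {0: 'other', 1: 'academic/educator', 2: 'artist', 3: 'clerical/admin',
         4: 'college/grad student', 5: 'customer service', 6: 'doctor/health care',
         7: 'executive/managerial', 8: 'farmer', 9: 'homemaker', 10: 'K-12student',
         11: 'lawyer', 12: 'programmer', 13: 'retired', 14: 'sales/marketing',
         15: 'scientist', 16: 'self-employed', 17: 'technician/engineer',
         18: 'tradesman/craftsman', 19: 'unemployed', 20: 'writer'}

# Sample data: the question's codes plus 100, which has no entry in dict1
df = pd.DataFrame({'occupation_code': [10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100]})
df.index.name = 'index'

# replace swaps only the values found as dict keys; 100 is left untouched
df['occupation_code'] = df['occupation_code'].replace(dict1)
print(df)
```

Because the column now mixes strings with the leftover integer 100, it ends up with object dtype.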

Using @jezrael's sample data df:

    print(df)
           occupation_code
    index
    0                   10
    1                   16
    2                   12
    3                    7
    4                    1
    5                    3
    6                   10
    7                    7
    8                    1
    9                    3
    10                   4
    11                 100

I would recommend calling the dictionary's get method inside a lambda. get lets you supply a default value for keys that are not in the dictionary; in this case, I return the original value.

    df.occupation_code.map(lambda x: dict1.get(x, x))
    index
    0               K-12student
    1             self-employed
    2                programmer
    3      executive/managerial
    4         academic/educator
    5            clerical/admin
    6               K-12student
    7      executive/managerial
    8         academic/educator
    9            clerical/admin
    10     college/grad student
    11                      100
    Name: occupation_code, dtype: object
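A self-contained sketch of the `dict.get` approach on the same sample (the frame construction is an assumption so the snippet runs on its own).

```python
import pandas as pd

# dict1 as given in the question
dict1 = {0: 'other', 1: 'academic/educator', 2: 'artist', 3: 'clerical/admin',
         4: 'college/grad student', 5: 'customer service', 6: 'doctor/health care',
         7: 'executive/managerial', 8: 'farmer', 9: 'homemaker', 10: 'K-12student',
         11: 'lawyer', 12: 'programmer', 13: 'retired', 14: 'sales/marketing',
         15: 'scientist', 16: 'self-employed', 17: 'technician/engineer',
         18: 'tradesman/craftsman', 19: 'unemployed', 20: 'writer'}

# Sample data: the question's codes plus 100, which has no entry in dict1
df = pd.DataFrame({'occupation_code': [10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100]})
df.index.name = 'index'

# dict1.get(x, x) returns the label when x is a key, otherwise x itself
labels = df['occupation_code'].map(lambda x: dict1.get(x, x))
print(labels)
```

Passing a different default, e.g. `dict1.get(x, 'unknown')`, would label unmapped codes instead of keeping them.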

Source: https://habr.com/ru/post/1267051/