How to replace a column of a pure number with a dict keyword number? [Python]

Question

I have the dataframe and dict below. How do I replace the codes in a column with the corresponding dict values?

    data

    index  occupation_code
    0      10
    1      16
    2      12
    3      7
    4      1
    5      3
    6      10
    7      7
    8      1
    9      3
    10     4
    ……

    dict1 = {0: 'other',
             1: 'academic/educator',
             2: 'artist',
             3: 'clerical/admin',
             4: 'college/grad student',
             5: 'customer service',
             6: 'doctor/health care',
             7: 'executive/managerial',
             8: 'farmer',
             9: 'homemaker',
             10: 'K-12student',
             11: 'lawyer',
             12: 'programmer',
             13: 'retired',
             14: 'sales/marketing',
             15: 'scientist',
             16: 'self-employed',
             17: 'technician/engineer',
             18: 'tradesman/craftsman',
             19: 'unemployed',
             20: 'writer'}

I used a "for" loop to do the replacement, but it is very slow. For example:

    for i in data.index:
        data.loc[i, 'occupation_detailed'] = dict1[data.loc[i, 'occupation_code']]

My data contains 1 million rows, and the loop already takes a few seconds for just 1 thousand rows. Processing all 1 million rows could take half a day!

So, is there a better way to do this?

Thanks so much for the tips!

+5

python dictionary pandas

Ricky Apr 22 '17 at 14:49

2 answers

Assuming the sample data df from @jezrael's answer:

    print(df)

           occupation_code
    index
    0                   10
    1                   16
    2                   12
    3                    7
    4                    1
    5                    3
    6                   10
    7                    7
    8                    1
    9                    3
    10                   4
    11                 100

I would recommend using the dictionary's get method inside a lambda. This lets you supply a default value for anything not included in the dictionary. In this case, I return the original value.

    df.occupation_code.map(lambda x: dict1.get(x, x))

    index
    0              K-12student
    1            self-employed
    2               programmer
    3     executive/managerial
    4        academic/educator
    5           clerical/admin
    6              K-12student
    7     executive/managerial
    8        academic/educator
    9           clerical/admin
    10    college/grad student
    11                     100
    Name: occupation_code, dtype: object
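The default passed to get does not have to be the original value. As a minimal sketch, assuming you would rather tag unknown codes with a fixed label, the snippet below uses a hypothetical placeholder 'unknown' (not part of the original data) and an abbreviated copy of dict1 containing only the codes that appear in the sample:

```python
import pandas as pd

# Abbreviated version of dict1 from the question (only the codes used below).
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 4: 'college/grad student',
         7: 'executive/managerial', 10: 'K-12student', 12: 'programmer',
         16: 'self-employed'}

df = pd.DataFrame({'occupation_code': [10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100]})

# 'unknown' is a hypothetical placeholder label chosen for this illustration.
labels = df['occupation_code'].map(lambda x: dict1.get(x, 'unknown'))
print(labels.tolist())
```

With a string default, every element of the result is a string, which may be easier to work with downstream than a column that mixes labels and raw codes.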

+1

piRSquared Apr 22 '17 at 16:14

jezrael · Accepted Answer · 2017-04-22T14:51:26+0000

Use map; any value that is missing from the dictionary becomes NaN:

    print (df)

           occupation_code
    index
    0                   10
    1                   16
    2                   12
    3                    7
    4                    1
    5                    3
    6                   10
    7                    7
    8                    1
    9                    3
    10                   4
    11                 100   <- add missing value 100

    df['occupation_code'] = df['occupation_code'].map(dict1)
    print (df)

               occupation_code
    index
    0              K-12student
    1            self-employed
    2               programmer
    3     executive/managerial
    4        academic/educator
    5           clerical/admin
    6              K-12student
    7     executive/managerial
    8        academic/educator
    9           clerical/admin
    10    college/grad student
    11                     NaN
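The question's loop wrote to a separate occupation_detailed column rather than overwriting the codes. A small sketch of that variant follows, again with an abbreviated copy of dict1; the variable names unmapped and unmapped_codes are illustrative only. Because map leaves NaN where no key matches, isna() can be used to list the codes that still need a dictionary entry:

```python
import pandas as pd

# Abbreviated version of dict1 from the question (only the codes used below).
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 4: 'college/grad student',
         7: 'executive/managerial', 10: 'K-12student', 12: 'programmer',
         16: 'self-employed'}

data = pd.DataFrame({'occupation_code': [10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100]})

# Vectorised lookup written to a new column; the original codes stay untouched.
data['occupation_detailed'] = data['occupation_code'].map(dict1)

# Codes with no dictionary entry come back as NaN, so isna() locates them.
unmapped = data.loc[data['occupation_detailed'].isna(), 'occupation_code']
unmapped_codes = sorted(unmapped.unique().tolist())
print(unmapped_codes)
```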

Another solution is to use replace; values that are missing from the dictionary keep their original value instead of becoming NaN:

    df['occupation_code'] = df['occupation_code'].replace(dict1)
    print (df)

               occupation_code
    index
    0              K-12student
    1            self-employed
    2               programmer
    3     executive/managerial
    4        academic/educator
    5           clerical/admin
    6              K-12student
    7     executive/managerial
    8        academic/educator
    9           clerical/admin
    10    college/grad student
    11                     100
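Both replace and map with a get fallback keep unmatched codes as they are. The sketch below, which uses an abbreviated copy of dict1 and illustrative variable names, checks that the two approaches agree on the sample values. It says nothing about speed; on a column with a million rows it is worth timing both on your own data.

```python
import pandas as pd

# Abbreviated version of dict1 from the question (only the codes used below).
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 4: 'college/grad student',
         7: 'executive/managerial', 10: 'K-12student', 12: 'programmer',
         16: 'self-employed'}

codes = pd.Series([10, 16, 12, 7, 1, 3, 10, 7, 1, 3, 4, 100], name='occupation_code')

# Two ways of translating codes while keeping unmatched values (here 100) as-is.
via_replace = codes.replace(dict1)
via_map = codes.map(lambda x: dict1.get(x, x))

same = via_replace.tolist() == via_map.tolist()
print(same)
```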
