pandas VLOOKUP: mapping a Series on a common index

Question

import pandas as pd
import numpy as np

pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}

data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}

pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

I know I can use something like:

df['mark_up_id'].map(pb['mark_up'])

to do a VLOOKUP. I would like to take that markup and multiply it by each cost, matched on the common index, to get a new column called price.

I know I can merge the two frames and then do the calculation; that is how I produced the desired output. I would like something closer to looping through one dictionary and using its keys to look up values in another dictionary, doing the calculation inside the loop. Given that pandas DataFrames are built on top of dictionaries, there should be a way to do this with some combination of join / map / apply, without actually joining the two datasets in memory.
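The dictionary picture described above can be sketched as follows. This is a minimal illustration on a three-row stand-in for the question's data (the smaller frames and the names `markup`, `looped`, and `price` are assumptions made for the example), not the asker's actual code.

```python
import pandas as pd

# Small stand-in for the question's frames (same column names, fewer rows).
pb = pd.DataFrame({"mark_up_id": ["123", "456"],
                   "mark_up": [1.2987, 1.5625]}).set_index("mark_up_id")
df = pd.DataFrame({"id": ["K69", "K70", "K74"],
                   "cost": [29.74, 9.42, 9.48],
                   "mark_up_id": ["123", "456", "333"]})

# Plain-Python picture: look each key up in a dict, None when it is missing.
markup = pb["mark_up"].to_dict()
looped = [cost * markup[key] if key in markup else None
          for key, cost in zip(df["mark_up_id"], df["cost"])]

# Vectorized equivalent: map does the lookup, * does the arithmetic.
# Keys that are absent from the dict come back as NaN instead of raising.
price = df["mark_up_id"].map(markup) * df["cost"]
```

Neither version builds a joined frame; `map` only produces a temporary Series aligned with `df`.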

Desired output:

desired_output = {"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48},"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73"},"mark_up_id":{"0":"123","1":"456","2":"111","3":"123","4":"789"},"price":{"0":38.623338,"1":14.71875,"2":12.559686,"3":12.233754,"4":12.985704}}
do = pd.DataFrame(data=desired_output)

Bonus points:

...

pb.loc[df['mark_up_id']]['mark_up'] * df.set_index('mark_up_id')['cost']

-, , ...

df.apply(lambda x : x['cost']*pb.loc[x['mark_up_id']],axis=1 )

This raises:

KeyError: ('the label [333] is not in the [index]', u'occurred at index 5')
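A likely reading of the error is that `.loc` is a strict label lookup, while `map` and `reindex` are lenient and return NaN for labels that are missing. The sketch below shows the difference on a two-row stand-in for `pb`; the names `keys`, `raised`, `mapped`, and `reindexed` are made up for the illustration.

```python
import pandas as pd

pb = pd.DataFrame({"mark_up_id": ["123", "456"],
                   "mark_up": [1.2987, 1.5625]}).set_index("mark_up_id")
keys = pd.Series(["123", "333"])  # "333" has no entry in pb

# .loc is strict: asking for a label that is not in the index raises KeyError.
try:
    pb.loc["333"]
    raised = False
except KeyError:
    raised = True

# map and reindex are lenient: a missing label becomes NaN.
mapped = keys.map(pb["mark_up"])
reindexed = pb["mark_up"].reindex(keys.tolist())
```

This would explain why the `apply` version stops at index 5, the first row whose `mark_up_id` ("333") is absent from `pb`.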

Yale Newman Sep 13 '17 at 21:45

Map the markup onto each row, multiply by the cost, and drop the rows that have no match:

In [79]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost']).dropna()

In [80]: df
Out[80]:
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
1   9.42  K70        456  14.718750
2   9.42  K71        789  12.903516
3   9.42  K72        111  12.559686
4   9.48  K73        222  13.830372

The same expression without dropna(), shown on a sample in which every mark_up_id has a match:

In [67]: df = df.assign(price=df['mark_up_id'].map(pb['mark_up']) * df['cost'])

In [68]: df
Out[68]:
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
1   9.42  K70        456  14.718750
2   9.42  K71        111  12.559686
3   9.42  K72        123  12.233754
4   9.48  K73        789  12.985704

MaxU Sep 13 '17 at 21:49

Using merge

df1 = pb  # df1 is the markup table (called pb in the question)
df = df.merge(df1, left_on='mark_up_id', right_index=True)
df.assign(price=df['cost'].mul(df['mark_up'])).drop(columns='mark_up')
Out[254]: 
    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
3   9.42  K72        123  12.233754
1   9.42  K70        456  14.718750
2   9.42  K71        111  12.559686
4   9.48  K73        789  12.985704
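Note that `merge` defaults to an inner join, so rows whose `mark_up_id` has no markup are dropped. If those rows should be kept, a left join is one option. The sketch below uses a three-row stand-in for the question's data; the names `merged` and `priced` are assumptions for the example.

```python
import pandas as pd

pb = pd.DataFrame({"mark_up_id": ["123", "456"],
                   "mark_up": [1.2987, 1.5625]}).set_index("mark_up_id")
df = pd.DataFrame({"id": ["K69", "K70", "K74"],
                   "cost": [29.74, 9.42, 9.48],
                   "mark_up_id": ["123", "456", "333"]})

# how='left' keeps every row of df; unmatched IDs get NaN for mark_up.
merged = df.merge(pb, how="left", left_on="mark_up_id", right_index=True)

# Compute the price, then drop the helper column that the merge brought in.
priced = merged.assign(price=merged["cost"] * merged["mark_up"]).drop(columns="mark_up")
```

Here the row for "333" survives with a NaN price instead of disappearing.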

If you want to use apply with a lambda: it's ugly ... for real ...

df.apply(lambda x : x['cost']*df1.loc[x['mark_up_id']],axis=1 )

Change to (Even uglier ... T_T)

df.apply(lambda x: x['cost'] * df1.loc[x['mark_up_id'], 'mark_up'] if x['mark_up_id'] in df1.index else np.nan, axis=1)

Wen Sep 13 '17 at 10:02

df['price'] = df['cost'] * df['mark_up_id'].map(pb['mark_up'])

df will now be your desired output.

chet-the-wizard Sep 13 '17 at 21:52

Vaishali · Accepted Answer · 2017-09-13T21:49:39+0000

Try

df['price'] = df['mark_up_id'].map(pb['mark_up']) * df['cost']

    cost   id mark_up_id      price
0  29.74  K69        123  38.623338
1   9.42  K70        456  14.718750
2   9.42  K71        111  12.559686
3   9.42  K72        123  12.233754
4   9.48  K73        789  12.985704
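With the full data from the question, rows whose `mark_up_id` is not in `pb` end up with a NaN price. Two possible ways to handle them are sketched below on a three-row stand-in; `DEFAULT_MARKUP` and the names `matched` and `with_default` are hypothetical and do not come from the original post.

```python
import pandas as pd

pb = pd.DataFrame({"mark_up_id": ["123", "456"],
                   "mark_up": [1.2987, 1.5625]}).set_index("mark_up_id")
df = pd.DataFrame({"id": ["K69", "K70", "K74"],
                   "cost": [29.74, 9.42, 9.48],
                   "mark_up_id": ["123", "456", "333"]})

df["price"] = df["mark_up_id"].map(pb["mark_up"]) * df["cost"]

# Option 1: keep only the rows whose mark_up_id was found in pb.
matched = df.dropna(subset=["price"])

# Option 2 (assumption): treat a missing markup as 1.0, i.e. price equals cost.
DEFAULT_MARKUP = 1.0  # hypothetical default, not from the original post
with_default = df.assign(
    price=df["mark_up_id"].map(pb["mark_up"]).fillna(DEFAULT_MARKUP) * df["cost"]
)
```

Which option fits depends on whether an unknown markup ID should count as an error or as "no markup".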
