Most efficient way to map a dict of dicts based on two pandas columns

Question

I have the following problem: I would like to map a dict of dicts onto a pandas DataFrame based on two of its columns. The only solution I've come up with so far is to use apply. The problem is that my DataFrame has over a million rows, so using apply can be slow. Any ideas on how to do this more efficiently? Here is my code:

import pandas as pd
import numpy as np

dict_dict = {'A': {'a': 1, 'b': 2, 'c': 3},
             'B': {'a': 4, 'b': 5, 'c': 6},
             'C': {'a': 7, 'b': 8, 'c': 9},
             'D': {'a': 10, 'b': 11, 'c': 12}}

list1 = ['A', 'B', 'C']
list2 = ['a', 'b', 'c']

np.random.seed(100)

df = pd.DataFrame()
df['col1'] = np.random.choice(list1, 10)
df['col2'] = np.random.choice(list2, 10)

df['map'] = df.apply(lambda x: dict_dict[x.col1][x.col2], axis=1)

df

  col1 col2  map
0    A    c    3
1    A    c    3
2    A    b    2
3    C    a    7
4    C    a    7
5    A    a    1
6    C    a    7
7    B    c    6
8    C    a    7
9    C    b    8

python numpy pandas

Eric B May 19, '17 at 17:52

1 answer

root · Accepted Answer · 2017-05-19T18:02:12+0000

You can build a DataFrame from dict_dict and use merge:

# Construct a DataFrame from dict_dict
df2 = pd.DataFrame(dict_dict).stack().rename('map').to_frame()

# Perform a merge.
df = df.merge(df2, how='left', left_on=['col2', 'col1'], right_index=True)
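The order ['col2', 'col1'] in left_on follows from how stack() lays out the index: the inner dict keys come first, then the outer keys. A minimal sketch of that layout, using a cut-down two-entry version of dict_dict (the names wide and lookup are illustrative, not from the answer):

```python
import pandas as pd

dict_dict = {'A': {'a': 1, 'b': 2, 'c': 3},
             'B': {'a': 4, 'b': 5, 'c': 6}}

# Outer keys ('A', 'B') become columns; inner keys ('a', 'b', 'c')
# become the row index.
wide = pd.DataFrame(dict_dict)

# stack() moves the columns into a second index level, so each value
# is keyed by (inner key, outer key), for example ('c', 'B').
lookup = wide.stack().rename('map').to_frame()

n_levels = lookup.index.nlevels
value_cB = lookup.loc[('c', 'B'), 'map']
```

Because the index is (inner key, outer key), the merge has to pair col2 with the first level and col1 with the second.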

Result:

  col1 col2  map
0    A    c    3
1    A    c    3
2    A    b    2
3    C    a    7
4    C    a    7
5    A    a    1
6    C    a    7
7    B    c    6
8    C    a    7
9    C    b    8
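One behavioral difference from the apply version may be worth checking before switching: with how='left', a (col1, col2) pair that is absent from dict_dict should produce NaN in the map column, whereas the lambda would raise a KeyError. A small sketch under that assumption, with a made-up key 'Z' that is not in the dictionary (the toy data and the names lookup, merged, known are illustrative only):

```python
import pandas as pd

dict_dict = {'A': {'a': 1, 'b': 2}, 'B': {'a': 4, 'b': 5}}

# The last row uses 'Z', which is deliberately missing from dict_dict.
df = pd.DataFrame({'col1': ['A', 'B', 'A', 'Z'],
                   'col2': ['b', 'a', 'a', 'a']})

lookup = pd.DataFrame(dict_dict).stack().rename('map').to_frame()
merged = df.merge(lookup, how='left',
                  left_on=['col2', 'col1'], right_index=True)

# For rows whose keys exist, the merge agrees with the apply version.
known = merged.iloc[:3]
expected = known.apply(lambda x: dict_dict[x.col1][x.col2], axis=1)
matches = bool((known['map'] == expected).all())

# The unmatched row is kept, with a missing value instead of an error.
missing_is_nan = bool(pd.isna(merged['map'].iloc[-1]))
```

Note that the presence of NaN makes the map column float rather than integer, so a fillna or an explicit cast may be needed afterwards.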
