Consider the following data frames d1andd1
d1 = pd.DataFrame([
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5]
], columns=list('ABC'))
d2 = pd.get_dummies(list('XYZZXY'))
d1
A B C
0 1 2 3
1 2 3 4
2 3 4 5
3 1 2 3
4 2 3 4
5 3 4 5
d2
X Y Z
0 1 0 0
1 0 1 0
2 0 0 1
3 0 0 1
4 1 0 0
5 0 1 0
I need to get a new framework with an object with multiple indices that has the product of each combination of columns from d1andd2
So far I have done this ...
from itertools import product
pd.concat({(x, y): d1[x] * d2[y] for x, y in product(d1, d2)}, axis=1)
A B C
X Y Z X Y Z X Y Z
0 1 0 0 2 0 0 3 0 0
1 0 2 0 0 3 0 0 4 0
2 0 0 3 0 0 4 0 0 5
3 0 0 1 0 0 2 0 0 3
4 2 0 0 3 0 0 4 0 0
5 0 3 0 0 4 0 0 5 0
There is nothing wrong with this method. But I am looking for alternatives to evaluate.
Inspired by Yakim Pirozhenko
m, n = len(d1.columns), len(d2.columns)
lvl0 = np.repeat(np.arange(m), n)
lvl1 = np.tile(np.arange(n), m)
v1, v2 = d1.values, d2.values
pd.DataFrame(
v1[:, lvl0] * v2[:, lvl1],
d1.index,
pd.MultiIndex.from_tuples(list(zip(d1.columns[lvl0], d2.columns[lvl1])))
)
However, this is the more awkward implementation of the numpy broadcast, which is best covered by Divakar.
Timing
All answers were good answers and demonstrated various aspects of pandas and numpy. Please consider them if you find them useful and informative.
%%timeit
m, n = len(d1.columns), len(d2.columns)
lvl0 = np.repeat(np.arange(m), n)
lvl1 = np.tile(np.arange(n), m)
v1, v2 = d1.values, d2.values
pd.DataFrame(
v1[:, lvl0] * v2[:, lvl1],
d1.index,
pd.MultiIndex.from_tuples(list(zip(d1.columns[lvl0], d2.columns[lvl1])))
)
%%timeit
vals = (d2.values[:,None,:] * d1.values[:,:,None]).reshape(d1.shape[0],-1)
cols = pd.MultiIndex.from_product([d1.columns, d2.columns])
pd.DataFrame(vals, columns=cols, index=d1.index)
%timeit d1.apply(lambda x: d2.mul(x, axis=0).stack()).unstack()
%timeit pd.concat({x : d2.mul(d1[x], axis=0) for x in d1.columns}, axis=1)
%timeit pd.concat({(x, y): d1[x] * d2[y] for x, y in product(d1, d2)}, axis=1)
1000 loops, best of 3: 663 ยตs per loop
1000 loops, best of 3: 624 ยตs per loop
100 loops, best of 3: 3.38 ms per loop
1000 loops, best of 3: 860 ยตs per loop
100 loops, best of 3: 2.01 ms per loop