I have a df like:
material  plant  Order
24990     89952  4568789,5098710
24990     89952  9448609,1007081
166621    3062   18364103
166621    3062   78309139
240758    3062   55146035
276009    3062   38501581,857542
and df1 as:
material  plant  Order     m1     m2     m3   m4   m5
24990     89952  4568789   0.123  0.214  0.0  0.0  0.0
24990     89952  5098710   1.000  0.363  0.0  0.0  0.0
24990     89952  9448609   0.0    0.345  0.0  1.0  0.0
24990     89952  1007081   0.0    0.756  0.0  1.0  0.0
166621    3062   18364103  0.0    0.0    0.0  0.0  0.0
166621    3062   78309139  0.0    1.0    0.0  0.0  0.0
240758    3062   55146035  1.0    1.0    1.0  0.0  0.0
276009    3062   38501581  1.0    1.0    1.0  0.0  0.0
276009    3062   38575428  1.0    1.0    1.0  0.0  0.0
For each row of df, I want to split Order on the commas, look up each of those orders in df1, and average m1 to m5 over the matching df1 rows. The result should be df2:
material  plant  Order            avgm1   avgm2   avgm3  avgm4  avgm5
24990     89952  4568789,5098710  0.5615  0.2885  0.0    0.0    0.0
24990     89952  9448609,1007081
166621    3062   18364103
166621    3062   78309139
240758    3062   55146035
276009    3062   38501581,857542
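One way to get there without an explicit loop, sketched under a few assumptions: Order in df is a comma-separated string while Order in df1 is an integer, matching is on Order alone (as described above), and an order with no match in df1 (such as 857542) is simply left out of the average. The idea is to split and explode the order lists to one order per row, left-merge the m1 to m5 values from df1, then group back by the original row and take the mean. `DataFrame.explode` needs pandas 0.25 or newer.

```python
import pandas as pd

# Sample data copied from the question. Assumption: Order in df is a
# comma-separated string, while Order in df1 is an integer.
df = pd.DataFrame({
    "material": [24990, 24990, 166621, 166621, 240758, 276009],
    "plant": [89952, 89952, 3062, 3062, 3062, 3062],
    "Order": ["4568789,5098710", "9448609,1007081", "18364103",
              "78309139", "55146035", "38501581,857542"],
})
df1 = pd.DataFrame({
    "material": [24990] * 4 + [166621] * 2 + [240758] + [276009] * 2,
    "plant": [89952] * 4 + [3062] * 5,
    "Order": [4568789, 5098710, 9448609, 1007081, 18364103,
              78309139, 55146035, 38501581, 38575428],
    "m1": [0.123, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "m2": [0.214, 0.363, 0.345, 0.756, 0.0, 1.0, 1.0, 1.0, 1.0],
    "m3": [0.0] * 6 + [1.0] * 3,
    "m4": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "m5": [0.0] * 9,
})
m_cols = [f"m{i}" for i in range(1, 6)]

# 1) One row per single order; "row" remembers which df row it came from.
exploded = (
    df.assign(single=df["Order"].str.split(","))
      .explode("single")
      .reset_index()
      .rename(columns={"index": "row"})
)
exploded["single"] = exploded["single"].astype("int64")

# 2) Left-merge the m1..m5 values; orders missing from df1 get NaN.
merged = exploded.merge(
    df1[["Order"] + m_cols].rename(columns={"Order": "single"}),
    on="single",
    how="left",
)

# 3) Average per original row; mean() skips NaN, so unmatched orders are ignored.
avgs = merged.groupby("row")[m_cols].mean().add_prefix("avg")
df2 = df.join(avgs)
print(df2)
```

With this data the first row comes out as 0.5615 and 0.2885 for avgm1 and avgm2, matching the expected df2. Adding material and plant to the merge keys would make the match stricter if the same Order can appear under different materials.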
I have tried a few ways to build df2, for example:
df2 = (df.groupby(df1, sort=False)['Order'].apply(lambda x: ','.split(x.astype(str)))
         .mean()
         .reset_index()
         .reindex(columns=df.columns))
print(df2)
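On the split in the first attempt: `','.split(x.astype(str))` calls Python's built-in `str.split` on the literal string `','`, and that method expects a string separator, not a Series. The vectorised equivalent is `Series.str.split(',')`, and `explode` then flattens the lists to one order per row while repeating the original row label. A small sketch on the first three Order values (converting to int64 assumes df1 stores Order as integers):

```python
import pandas as pd

orders = pd.Series(["4568789,5098710", "9448609,1007081", "18364103"])

# Vectorised split: each cell becomes a list of order strings.
split = orders.str.split(",")

# One order per row; the index repeats the source row number (0, 0, 1, 1, 2).
flat = split.explode()

# Convert to integers so the values can be compared with df1's Order column.
as_int = flat.astype("int64")
print(as_int)
```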
My second attempt:
group = df.columns[np.r_[0:3, 3:len(df.columns)]]
res = df1.groupby(group)['Order'].apply(list).mean().reset_index()
Neither of these produces df2, and I'm not sure either is the right approach.
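Since the question talks about iterating over Order, here is a row-by-row variant for comparison, again only a sketch: it assumes each Order appears at most once in df1 (`reindex` raises on duplicate labels), and `avg_for` is a hypothetical helper name. It indexes df1 by Order, reindexes that lookup with each row's split orders (unmatched orders become NaN rows), and averages the columns, which skips NaN by default.

```python
import pandas as pd

# Sample data copied from the question (Order as string in df, int in df1: assumed).
df = pd.DataFrame({
    "material": [24990, 24990, 166621, 166621, 240758, 276009],
    "plant": [89952, 89952, 3062, 3062, 3062, 3062],
    "Order": ["4568789,5098710", "9448609,1007081", "18364103",
              "78309139", "55146035", "38501581,857542"],
})
df1 = pd.DataFrame({
    "material": [24990] * 4 + [166621] * 2 + [240758] + [276009] * 2,
    "plant": [89952] * 4 + [3062] * 5,
    "Order": [4568789, 5098710, 9448609, 1007081, 18364103,
              78309139, 55146035, 38501581, 38575428],
    "m1": [0.123, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "m2": [0.214, 0.363, 0.345, 0.756, 0.0, 1.0, 1.0, 1.0, 1.0],
    "m3": [0.0] * 6 + [1.0] * 3,
    "m4": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "m5": [0.0] * 9,
})
m_cols = [f"m{i}" for i in range(1, 6)]

# Assumption: Order is unique in df1, so it can serve as a lookup index.
lookup = df1.set_index("Order")[m_cols]


def avg_for(orders: str) -> pd.Series:
    """Hypothetical helper: mean of m1..m5 over the orders in one df cell."""
    keys = [int(o) for o in orders.split(",")]
    # reindex yields a NaN row for any order not in df1; mean() skips NaN.
    return lookup.reindex(keys).mean()


# Iterate over the Order cells and collect one Series of averages per row.
rows = [avg_for(orders) for orders in df["Order"]]
avgs = pd.DataFrame(rows).add_prefix("avg")
avgs.index = df.index  # line the results up with df's rows before joining
df2 = df.join(avgs)
print(df2)
```

This reads closer to "iterate through Order", but it runs Python code once per row, so the explode/merge approach will usually scale better on large frames.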