Groupby.transform not working in dask dataframe

I have the following dask.dataframe named AID:

   AID FID  ANumOfF
0    1   X        1
1    1   Y        5
2    2   Z        6
3    2   A        1
4    2   X       11
5    2   B       18

I know that with a pandas DataFrame I could use:

AID.groupby('AID')['ANumOfF'].transform('sum')

To obtain:

0     6
1     6
2    36
3    36
4    36
5    36

I want to do the same with a dask DataFrame, which usually offers the same methods as a pandas DataFrame, but in this case I get the following error:

AttributeError: 'SeriesGroupBy' object has no attribute 'transform'

Is this because dask does not support transform, or because I am using Python 3?

I tried the following code:

AID.groupby('AID')['ANumOfF'].sum()

but it just gives me the sum of each group as follows:

AID
1     6
2    36

I need the result to look like the output above, with the group sum repeated on each row. My question is: if transform is not supported, is there another way to achieve the same result?


You can use sum with join:

s = AID.groupby('AID')['ANumOfF'].sum()
AID = AID.set_index('AID').drop('ANumOfF', axis=1).join(s).reset_index()
print(AID)
   AID FID  ANumOfF
0    1   X        6
1    1   Y        6
2    2   Z       36
3    2   A       36
4    2   X       36
5    2   B       36
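
The join approach above can be checked end to end with a small self-contained sketch. It runs in plain pandas as an assumption, so that it works without a dask installation; the sample frame is rebuilt from the question, and the name `joined` is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# Per-group totals as a Series indexed by AID
s = AID.groupby("AID")["ANumOfF"].sum()

# Drop the original column, then join the totals back on the AID index
joined = AID.set_index("AID").drop("ANumOfF", axis=1).join(s).reset_index()
print(joined)
```

Each row now carries the total of its group, in the original row order.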

Or map the AID column using a Series or a dict:

s = AID.groupby('AID')['ANumOfF'].sum()
# a bit faster: map with a dict instead of a Series
# s = AID.groupby('AID')['ANumOfF'].sum().to_dict()
AID['ANumOfF'] = AID['AID'].map(s)
print(AID)
   AID FID  ANumOfF
0    1   X        6
1    1   Y        6
2    2   Z       36
3    2   A       36
4    2   X       36
5    2   B       36
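
As a sanity check, the map approach can be compared against `transform('sum')` in plain pandas, where both are available. This is a minimal sketch that assumes pandas rather than dask; the names `expected`, `totals` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
AID = pd.DataFrame({
    "AID": [1, 1, 2, 2, 2, 2],
    "FID": ["X", "Y", "Z", "A", "X", "B"],
    "ANumOfF": [1, 5, 6, 1, 11, 18],
})

# Reference result using pandas' groupby transform
expected = AID.groupby("AID")["ANumOfF"].transform("sum")

# Workaround: aggregate once, then look up each row's group total
totals = AID.groupby("AID")["ANumOfF"].sum()
result = AID["AID"].map(totals)

# Both give one value per original row
print(result.tolist() == expected.tolist())
```

The aggregated Series is small (one entry per group), which is why mapping it back onto the key column is cheap.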

Source: https://habr.com/ru/post/1673984/

