Pandas groupby with calculation, sum and average

I have the following DF in pandas:

+---------+--------+--------------------+
| keyword | weight | other keywords     |
+---------+--------+--------------------+
| dog     | 0.12   | [cat, horse, pig]  |
| cat     | 0.5    | [dog, pig, camel]  |
| horse   | 0.07   | [dog, camel, cat]  |
| dog     | 0.1    | [cat, horse]       |
| dog     | 0.2    | [cat, horse, pig]  |
| horse   | 0.3    | [camel]            |
+---------+--------+--------------------+

I want to group by keyword and, in the same step, count how often each keyword occurs, average its weight, and concatenate its other keywords lists. The result should look something like this:

+---------+-----------+------------+------------------------------------------------+
| keyword | frequency | avg weight | sum other keywords                             |
+---------+-----------+------------+------------------------------------------------+
| dog     | 3         | 0.14       | [cat, horse, pig, cat, horse, cat, horse, pig] |
| cat     | 1         | 0.5        | [dog, pig, camel]                              |
| horse   | 2         | 0.185      | [dog, camel, cat, camel]                       |
+---------+-----------+------------+------------------------------------------------+

I know how to do this as several separate operations: value_counts(), groupby().sum(), groupby().mean(), and then a merge. However, this is inefficient, and it takes a lot of manual setup.
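For reference, a rough sketch of that multi-step approach, assuming the sample data shown above. The variable names (freq, avg, others) are only illustrative, and pd.concat stands in here for the merge step:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "keyword": ["dog", "cat", "horse", "dog", "dog", "horse"],
    "weight": [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    "other keywords": [
        ["cat", "horse", "pig"],
        ["dog", "pig", "camel"],
        ["dog", "camel", "cat"],
        ["cat", "horse"],
        ["cat", "horse", "pig"],
        ["camel"],
    ],
})

# Step 1: how many rows each keyword has.
freq = df["keyword"].value_counts().rename("frequency")

# Step 2: mean weight per keyword.
avg = df.groupby("keyword")["weight"].mean().rename("avg weight")

# Step 3: summing a column of lists concatenates the lists.
others = df.groupby("keyword")["other keywords"].sum().rename("sum other keywords")

# Step 4: align the three Series on the keyword index and combine them.
result = pd.concat([freq, avg, others], axis=1).rename_axis("keyword")
print(result)
```

Each step scans or groups the data separately, which is the overhead I would like to avoid.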

I am wondering if this can be done in one operation?

1 answer

You can use agg with a dict that maps each column to its aggregation function:

    df = df.groupby('keyword').agg({'keyword': 'size',
                                    'weight': 'mean',
                                    'other keywords': 'sum'})
    # set new ordering of columns
    df = df.reindex(columns=['keyword', 'weight', 'other keywords'])
    # drop the index name so reset_index does not clash with the 'keyword' column
    df = df.rename_axis(None).reset_index()
    # set new column names
    df.columns = ['keyword', 'frequency', 'avg weight', 'sum other keywords']
    print(df)

      keyword  frequency  avg weight  \
    0     cat          1       0.500
    1     dog          3       0.140
    2   horse          2       0.185

                                   sum other keywords
    0                               [dog, pig, camel]
    1  [cat, horse, pig, cat, horse, cat, horse, pig]
    2                        [dog, camel, cat, camel]
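On pandas 0.25 or later, named aggregation may be a tidier way to get the same result, since it sets the output column names directly and avoids the reorder and rename steps. The following is a minimal sketch rather than the only way to do it. It assumes the sample data from the question, and it passes the names through dict unpacking because two of the output names contain spaces. sort=False is an optional choice here that keeps the keywords in order of first appearance, as in the desired output.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "keyword": ["dog", "cat", "horse", "dog", "dog", "horse"],
    "weight": [0.12, 0.5, 0.07, 0.1, 0.2, 0.3],
    "other keywords": [
        ["cat", "horse", "pig"],
        ["dog", "pig", "camel"],
        ["dog", "camel", "cat"],
        ["cat", "horse"],
        ["cat", "horse", "pig"],
        ["camel"],
    ],
})

result = (
    df.groupby("keyword", sort=False)
      .agg(**{
          # output column name: (input column, aggregation)
          "frequency": ("weight", "size"),
          "avg weight": ("weight", "mean"),
          # 'sum' on a column of lists concatenates the lists
          "sum other keywords": ("other keywords", "sum"),
      })
      .reset_index()
)
print(result)
```

Here the frequency is taken as the group size of the weight column, so the grouping column itself does not have to appear in the aggregation spec.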

Source: https://habr.com/ru/post/1264976/
