Pandas: use groupby for each list item

Perhaps I am missing the obvious.

I have a pandas DataFrame that looks like this:

    id  product       categories
    0   Silmarillion  ['Book', 'Fantasy']
    1   Headphones    ['Electronic', 'Material']
    2   Dune          ['Book', 'Sci-Fi']

I would like to use groupby to count the number of occurrences of each item in the categories column, so here the result would be:

    Book          2
    Fantasy       1
    Electronic    1
    Material      1
    Sci-Fi        1

However, when I try to use the groupby function, pandas counts the occurrences of the entire list instead of separating its elements. I tried several different ways to handle this using tuples or partitions, but so far I have not been successful.

+5
3 answers

You can normalize records by stacking them, and then call value_counts() :

    pd.DataFrame(df['categories'].tolist()).stack().value_counts()
    Out:
    Book          2
    Fantasy       1
    Material      1
    Sci-Fi        1
    Electronic    1
    dtype: int64
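To make the intermediate steps of the stacking approach visible, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the names `wide`, `stacked`, and `counts` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [
        ["Book", "Fantasy"],
        ["Electronic", "Material"],
        ["Book", "Sci-Fi"],
    ],
})

# One row per product, one column per position in the list
wide = pd.DataFrame(df["categories"].tolist())

# stack() turns the wide table into a single Series of category labels
stacked = wide.stack()

# value_counts() tallies each distinct label (missing values are ignored by default)
counts = stacked.value_counts()
print(counts)
```

If the lists have different lengths, the shorter rows are padded with missing values in `wide`; `value_counts()` skips those, so the tallies should be unaffected.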
+5

You can also call pd.value_counts directly on the flattened list.
You can create an appropriate list via numpy.concatenate , itertools.chain or cytoolz.concat

    import numpy as np
    import pandas as pd
    from cytoolz import concat
    from itertools import chain

cytoolz.concat

 pd.value_counts(list(concat(df.categories.values.tolist()))) 

itertools.chain

 pd.value_counts(list(chain(*df.categories.values.tolist()))) 

numpy.unique + numpy.concatenate

    u, c = np.unique(np.concatenate(df.categories.values), return_counts=True)
    pd.Series(c, u)

All three yield

    Book          2
    Electronic    1
    Fantasy       1
    Material      1
    Sci-Fi        1
    dtype: int64
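As a runnable sketch of the flatten-then-count idea, the snippet below uses only the standard library, NumPy, and pandas. It swaps in `chain.from_iterable` for `chain(*...)` and `Series.value_counts()` for the top-level `pd.value_counts`; both substitutions are my assumptions about what is convenient here, not part of the answer above. The DataFrame is rebuilt from the question's sample:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [
        ["Book", "Fantasy"],
        ["Electronic", "Material"],
        ["Book", "Sci-Fi"],
    ],
})

# Flatten the list-of-lists lazily, then count with pandas
flat = list(chain.from_iterable(df["categories"]))
counts_chain = pd.Series(flat).value_counts()

# NumPy variant: concatenate the lists, then count the unique labels
labels, freqs = np.unique(
    np.concatenate(df["categories"].tolist()), return_counts=True
)
counts_unique = pd.Series(freqs, index=labels)

print(counts_chain)
print(counts_unique)
```

The two results hold the same tallies; `np.unique` returns labels in sorted order, while `value_counts()` orders by frequency.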

Timing

[Image: timing comparison of the approaches]

+5

Try the following:

    In [58]: df['categories'].apply(pd.Series).stack().value_counts()
    Out[58]:
    Book          2
    Fantasy       1
    Electronic    1
    Sci-Fi        1
    Material      1
    dtype: int64
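Since the question asks specifically about groupby, here is a hedged sketch of one way to get there. It relies on `DataFrame.explode`, which is available in pandas 0.25 and later and is not used in any of the answers above, so treat it as an alternative rather than the answerers' method. The DataFrame is rebuilt from the question's sample:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [
        ["Book", "Fantasy"],
        ["Electronic", "Material"],
        ["Book", "Sci-Fi"],
    ],
})

# explode() gives each list element its own row, repeating the other columns
exploded = df.explode("categories")

# With one category per row, an ordinary groupby counts the occurrences
by_category = exploded.groupby("categories").size()
print(by_category)
```

Because `exploded` keeps the `product` column, the same frame can also be grouped to list the products in each category.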
+4

Source: https://habr.com/ru/post/1263138/