Pandas: use groupby for each list item

Question

Perhaps I am missing the obvious.

I have a pandas DataFrame that looks like this:

 id  product       categories
 0   Silmarillion  ['Book', 'Fantasy']
 1   Headphones    ['Electronic', 'Material']
 2   Dune          ['Book', 'Sci-Fi']
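To reproduce the question, here is a minimal sketch that builds the sample frame. The column names come from the question. Treating id as the default integer index is an assumption.

```python
import pandas as pd

# Sample data from the question: each cell of "categories" holds a Python list.
# Assumption: "id" in the question is just the default RangeIndex (0, 1, 2).
df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [["Book", "Fantasy"], ["Electronic", "Material"], ["Book", "Sci-Fi"]],
})
print(df)
```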

I would like to use groupby to count how many times each individual item in the categories column occurs, so here the result would be:

 Book        2
 Fantasy     1
 Electronic  1
 Material    1
 Sci-Fi      1

However, when I try groupby, pandas counts the occurrences of each entire list instead of its individual elements. I have tried several approaches using tuples or partitions, but so far without success.
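The behaviour described above can be reproduced with a small sketch. It assumes the lists were converted to tuples so they can serve as group keys (lists are unhashable). Each whole tuple then forms its own group, so nothing gets counted per category.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [["Book", "Fantasy"], ["Electronic", "Material"], ["Book", "Sci-Fi"]],
})

# Lists are unhashable, so they cannot be group keys directly;
# converting each list to a tuple makes it hashable.
keys = df["categories"].apply(tuple)

# Each whole tuple becomes one group, so every group has size 1:
# "Book" is never counted on its own.
whole_lists = df.groupby(keys).size()
print(whole_lists)
```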

python python-3.x numpy pandas pandas-groupby

Skum Jan 21 '17 at 13:59

3 answers

You can also call pd.value_counts directly on a flat list of every category, which you can build with numpy.concatenate, itertools.chain or cytoolz.concat:

 import numpy as np
 import pandas as pd
 from cytoolz import concat
 from itertools import chain

cytoolz.concat

 pd.value_counts(list(concat(df.categories.values.tolist())))

itertools.chain

 pd.value_counts(list(chain(*df.categories.values.tolist())))

numpy.unique + numpy.concatenate

 u, c = np.unique(np.concatenate(df.categories.values), return_counts=True)
 pd.Series(c, u)

Output (all three give the same counts):

 Book          2
 Electronic    1
 Fantasy       1
 Material      1
 Sci-Fi        1
 dtype: int64
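Here is a sketch that checks the itertools and NumPy routes agree. It leaves out cytoolz because that is a third-party package. It also swaps in two assumptions of its own. First, it uses chain.from_iterable in place of chain(*...). Second, it uses pd.Series(...).value_counts() in place of the top-level pd.value_counts, which recent pandas releases deprecate.

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "categories": [["Book", "Fantasy"], ["Electronic", "Material"], ["Book", "Sci-Fi"]],
})

# Route 1: flatten with itertools.chain.from_iterable (no argument unpacking),
# then count; sort_index() makes the order alphabetical for easy comparison.
flat = list(chain.from_iterable(df["categories"]))
via_chain = pd.Series(flat).value_counts().sort_index()

# Route 2: np.unique on the concatenated lists returns sorted values and their counts.
u, c = np.unique(np.concatenate(df["categories"].tolist()), return_counts=True)
via_numpy = pd.Series(c, index=u)

print(via_chain)
print(via_numpy)
```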


piRSquared Jan 21 '17 at 14:25

Try the following:

 In [58]: df['categories'].apply(pd.Series).stack().value_counts()
 Out[58]:
 Book          2
 Fantasy       1
 Electronic    1
 Sci-Fi        1
 Material      1
 dtype: int64
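When the lists have different lengths, apply(pd.Series) pads the shorter rows with NaN. This sketch uses made-up data (the extra "Classic" category is hypothetical) to show that the final counts still ignore the padding. Depending on the pandas version, stack() may already drop NaN. In any case value_counts() excludes NaN by default. Building one Series per row with apply(pd.Series) also tends to be slow on large frames.

```python
import pandas as pd

# Hypothetical data: lists of unequal length, so apply(pd.Series) pads with NaN.
df = pd.DataFrame({
    "categories": [["Book", "Fantasy"], ["Electronic"], ["Book", "Sci-Fi", "Classic"]],
})

wide = df["categories"].apply(pd.Series)  # one column per list position
counts = wide.stack().value_counts()      # value_counts() skips NaN (dropna=True)
print(wide)
print(counts)
```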

Maxu Jan 21 '17 at 14:06

ayhan · Accepted Answer · 2017-01-21T14:06:52+0000

You can normalize the records into a DataFrame with one column per list position, stack it, and then call value_counts():

 pd.DataFrame(df['categories'].tolist()).stack().value_counts()
 Out:
 Book          2
 Fantasy       1
 Material      1
 Sci-Fi        1
 Electronic    1
 dtype: int64
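The question asked specifically about groupby, so here is a sketch of two groupby forms of the same count. The first groups the stacked Series from the answer above by its own values. Passing index=df.index is an added assumption that keeps the original row labels in level 0. The second uses DataFrame.explode, available in pandas 0.25 and later (newer than this 2017 thread). It turns each list element into its own row, after which an ordinary groupby works.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Silmarillion", "Headphones", "Dune"],
    "categories": [["Book", "Fantasy"], ["Electronic", "Material"], ["Book", "Sci-Fi"]],
})

# One column per list position, then stack into a single Series of categories.
stacked = pd.DataFrame(df["categories"].tolist(), index=df.index).stack()

# Grouping the stacked values by themselves is the groupby form of value_counts().
via_stack = stacked.groupby(stacked).size()

# pandas >= 0.25: explode() gives one row per list element,
# so grouping on the column now counts individual categories.
via_explode = df.explode("categories").groupby("categories").size()

print(via_stack)
print(via_explode)
```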
