Product of array elements by groups in numpy (Python)

Question

I am trying to build a function that returns the products of subsets of array elements. Basically I want to create a prod_by_group function that does this:

    values = np.array([1, 2, 3, 4, 5, 6])
    groups = np.array([1, 1, 1, 2, 3, 3])
    Vprods = prod_by_group(values, groups)

And the resulting Vprods should be:

    >>> Vprods
    array([ 6,  4, 30])

There is a great answer here for sums of elements, which I think should be similar: https://stackoverflow.com/a/464829/

I tried taking the log first, then sum_by_group, then exp, but ran into numerical problems.
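To illustrate the kind of numerical problem the log/sum/exp route can cause, here is a minimal sketch. The sum_by_group function from the linked answer is not shown in this post, so np.bincount with weights is used as a stand-in for it; that substitution is an assumption for illustration only.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])

# Map each group label to a dense index 0..n_groups-1.
_, inverse = np.unique(groups, return_inverse=True)

# Stand-in for sum_by_group: sum the logs within each group.
log_sums = np.bincount(inverse, weights=np.log(values))

# Exponentiate to recover the products, now as floats.
approx_prods = np.exp(log_sums)
print(approx_prods)
```

The result is a float array that is only approximately equal to the integer products, so it has to be rounded before use, and rounding stops being reliable once the products get large. The trick also breaks down for zero or negative values, where np.log yields -inf or nan.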

Here are a few other similar answers for min and max elements by group: https://stackoverflow.com/a/464829/

Edit: Thanks for the quick answers! I will try them. I should add that I want it to be as fast as possible (which is why I am trying to do it in numpy in some vectorized way, like the examples I linked).

Edit: I appreciate all the answers given so far; the best one is @seberg's, below. Here's the full function that I ended up with:

    def prod_by_group(values, groups):
        order = np.argsort(groups)
        groups = groups[order]
        values = values[order]
        group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
        return np.multiply.reduceat(values, group_changes)

+4

python numpy

Nate Nov 16 '12 at 19:48

5 answers

First build a boolean mask by comparing groups against its unique labels expanded along a second dimension, so that each row marks the members of one group:

    >>> mask = (groups == np.unique(groups).reshape(-1, 1))
    >>> mask
    array([[ True,  True,  True, False, False, False],
           [False, False, False,  True, False, False],
           [False, False, False, False,  True,  True]], dtype=bool)

Now multiply by values:

    >>> mask * values
    array([[1, 2, 3, 0, 0, 0],
           [0, 0, 0, 4, 0, 0],
           [0, 0, 0, 0, 5, 6]])

Now you can already take the product along axis 1, except for the zeros, which are easy to fix by replacing them with ones:

    >>> np.prod(np.where(mask * values, mask * values, 1), axis=1)
    array([ 6,  4, 30])
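The mask approach can be wrapped in a function along these lines. This is a sketch, not the answerer's exact code: the name prod_by_group_mask is hypothetical, and it selects with np.where(mask, values, 1) instead of testing mask * values, so that a genuine zero in values is kept rather than silently replaced by 1.

```python
import numpy as np

def prod_by_group_mask(values, groups):
    """Hypothetical helper: per-group products via a boolean mask."""
    # One row per unique group label, one column per element.
    mask = groups == np.unique(groups)[:, None]
    # Keep values inside the group, use the neutral element 1 elsewhere.
    return np.where(mask, values, 1).prod(axis=1)

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])
print(prod_by_group_mask(values, groups))

# A real zero in the data must give a zero product for its group.
values_with_zero = np.array([0, 2, 3, 4, 5, 6])
print(prod_by_group_mask(values_with_zero, groups))
```

Note that the mask has shape (number of groups, number of values), so memory use grows with the product of the two; this is only practical when there are few groups.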

+1

cronos Nov 16 '12 at 20:03

As stated in the comments, you can also use the pandas module. Using groupby(), this task becomes a one-liner:

    import numpy as np
    import pandas as pd

    values = np.array([1, 2, 3, 4, 5, 6])
    groups = np.array([1, 1, 1, 2, 3, 3])
    df = pd.DataFrame({'values': values, 'groups': groups})

So df looks like this:

       groups  values
    0       1       1
    1       1       2
    2       1       3
    3       2       4
    4       3       5
    5       3       6

Now you can groupby() the groups column and apply numpy prod() for each of the groups like this

  df.groupby(groups)['values'].apply(np.prod)

which gives the desired result:

    1     6
    2     4
    3    30
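As a variation on the above, pandas also has a built-in prod() aggregation on grouped data, which avoids calling np.prod through apply(). The following is a small sketch under the same setup; grouping by the column name 'groups' and converting back with to_numpy() are choices made here, not part of the original answer.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])
df = pd.DataFrame({'values': values, 'groups': groups})

# Group by the 'groups' column and multiply the 'values' within each group.
prods = df.groupby('groups')['values'].prod()

# Convert the resulting Series back to a plain numpy array if needed.
result = prods.to_numpy()
print(result)
```

The Series index holds the group labels, so prods.index tells you which product belongs to which group.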

+1

Cleb Jun 30 '16 at 23:11

Well, I doubt this is a great answer, but this is the best I can come up with:

    # np.prod is used here because the np.product alias was removed in NumPy 2.0
    np.array([np.prod(values[np.flatnonzero(groups == x)]) for x in np.unique(groups)])

0

mgilson Nov 16 '12 at 20:02

This is not a numpy solution, but it is fairly readable (I find that numpy solutions sometimes aren't!):

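    from functools import reduce  # reduce is not a builtin in Python 3
    from itertools import groupby
    from operator import itemgetter, mul

    # itertools.groupby only merges adjacent equal keys, so groups must be sorted.
    # tolist() turns the numpy arrays into plain Python ints.
    grouped = groupby(zip(groups.tolist(), values.tolist()), itemgetter(0))
    prods = [reduce(mul, map(itemgetter(1), vals), 1) for key, vals in grouped]
    print(prods)
    # [6, 4, 30]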

0

Jon clements Nov 16 '12 at 20:07

seberg · Accepted Answer · 2012-11-16T20:04:18+0000

If the groups are already sorted (if they are not, you can sort them using np.argsort), you can do this with the reduceat method of ufuncs. Sorting first is what makes this approach efficient:

    # you could do the group_changes somewhat faster if you care a lot
    group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
    Vprods = np.multiply.reduceat(values, group_changes)

Or use mgilson's answer if you have few groups. But if you have many groups, this is much more efficient, since you avoid boolean indexing over every element of the original array for every group. With reduceat you also avoid slicing in a Python loop.

Of course, pandas conveniently performs these operations.

Edit: Sorry, I had prod in there; the ufunc is multiply. You can use this method with any binary ufunc, which means it works for basically all numpy functions that operate element-wise on two input arrays (i.e. multiply normally multiplies two arrays element-wise, add adds them, and likewise maximum, minimum, etc.).
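To make the "any binary ufunc" point concrete, here is a sketch that generalizes the reduceat approach. The function name reduce_by_group and its ufunc parameter are hypothetical, introduced only for illustration. It also uses np.unique with return_index on the sorted labels to find the group starts, which is a substitute for the group_changes computation above, not the method from the answer.

```python
import numpy as np

def reduce_by_group(values, groups, ufunc=np.multiply):
    """Hypothetical helper: reduce values within each group using a binary ufunc."""
    # Sort so that members of the same group are contiguous.
    order = np.argsort(groups, kind="stable")
    sorted_groups = groups[order]
    sorted_values = values[order]
    # On a sorted array, the first occurrence of each label is the group start.
    labels, starts = np.unique(sorted_groups, return_index=True)
    return labels, ufunc.reduceat(sorted_values, starts)

# Deliberately unsorted input.
values = np.array([5, 1, 4, 2, 6, 3])
groups = np.array([3, 1, 2, 1, 3, 1])

labels, prods = reduce_by_group(values, groups)           # np.multiply
_, sums = reduce_by_group(values, groups, np.add)         # np.add
_, maxima = reduce_by_group(values, groups, np.maximum)   # np.maximum

print(labels, prods, sums, maxima)
```

Returning the labels alongside the results makes it explicit which entry belongs to which group, which is handy when the input is not sorted.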
