How does matplotlib histogramdd work?

Question

I find the output of histogramdd confusing. For instance:

    h, edges = histogramdd([[1,2,1],[4,2,1]], bins=2)
    h     -> [[ 1.  1.]
              [ 1.  0.]]
    edges -> [array([ 1. ,  1.5,  2. ]), array([ 1. ,  2.5,  4. ])]

Perhaps I misunderstand the documentation, but it seems the input should be an array with N rows, one per data point, and D columns, one per dimension (so in this case we have two data points in three dimensions). I assume each array in edges corresponds to a different dimension, but that doesn't seem to make sense given the output h.

How should this be interpreted?

Thanks.

+4

python numpy histogram

Robert Smith Nov 03 '12 at 1:49

1 answer

Robert Smith · Accepted Answer · 2012-11-06T04:06:35+0000

UPDATE

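My previous answer was mistaken; this is the correct interpretation of histogramdd. First of all, it matters whether you pass an array or a nested list. An array of shape (N, D) is read as N points in D dimensions, but a plain list of lists is read as a sequence of coordinate arrays, one per dimension, so the same numbers produce a different histogram: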

Compare this:

    from numpy import array, histogramdd  # names used unqualified in these sessions

    In [59]: h, edges = histogramdd([[1,2,4],[4,2,8],[3,2,1],[2,1,2],[2,1,3],[2,1,1],[2,1,4]], bins=3)
             h.shape
    Out[59]: (3, 3, 3, 3, 3, 3, 3)

with this:

    In [60]: h, edges = histogramdd(array([[1,2,4],[4,2,8],[3,2,1],[2,1,2],[2,1,3],[2,1,1],[2,1,4]]), bins=3)
             h.shape
    Out[60]: (3, 3, 3)
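The two shapes above can be reproduced and explained with a short script. This is only a sketch, assuming a current NumPy imported as `np`; the variable names are made up for illustration. It checks that the nested list behaves like the transposed array (three points in seven dimensions), and that a tuple of per-dimension coordinate arrays gives the same result as the (N, D) array.

```python
import numpy as np

points = [[1, 2, 4], [4, 2, 8], [3, 2, 1], [2, 1, 2],
          [2, 1, 3], [2, 1, 1], [2, 1, 4]]

# Nested list: each inner list is taken as the values of ONE dimension,
# so this is 3 points in 7 dimensions.
h_list, _ = np.histogramdd(points, bins=3)

# (N, D) array: each row is one point, so this is 7 points in 3 dimensions.
h_arr, _ = np.histogramdd(np.array(points), bins=3)

# The list call matches the array call on the transposed data.
h_transposed, _ = np.histogramdd(np.array(points).T, bins=3)

# Passing one coordinate array per dimension matches the (N, D) array call.
x, y, z = np.array(points).T
h_xyz, _ = np.histogramdd((x, y, z), bins=3)

print(h_list.shape)  # seven axes of length 3
print(h_arr.shape)   # three axes of length 3
```

If you already have separate x, y, z sequences, the tuple form is the convenient one; if you have one row per point, convert to an array first.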

Using the second approach, we get reasonable results:

    In [61]: h, edges = histogramdd(array([[1,2,4],[4,2,8],[3,2,1],[2,1,2],[2,1,3],[2,1,1],[2,1,4]]), bins=3)

    In [64]: h
    Out[64]:
    array([[[ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  1.,  0.]],

           [[ 3.,  1.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.]],

           [[ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 1.,  0.,  1.]]])

    In [65]: edges
    Out[65]:
    [array([ 1.,  2.,  3.,  4.]),
     array([ 1.        ,  1.33333333,  1.66666667,  2.        ]),
     array([ 1.        ,  3.33333333,  5.66666667,  8.        ])]

Our input points are [1,2,4], [4,2,8], etc., and edges holds the bin edges for each dimension. In this example, [1,2,4] is binned as follows: 1 falls in the first bin of array([ 1., 2., 3., 4.]) because it is between 1 and 2; 2 falls in the third bin of array([ 1. , 1.33333333, 1.66666667, 2. ]) because it is between 1.66666667 and 2; and 4 falls in the second bin of array([ 1. , 3.33333333, 5.66666667, 8. ]) because it is between 3.33333333 and 5.66666667. (Each bin includes its left edge but not its right edge, except the last bin, which includes both; that is why 2 lands in the third bin.) So the point [1,2,4] has the first, third and second bins as its coordinates. This means it is counted in the first array, third row, second column:

    [[ 0.,  0.,  0. ],
     [ 0.,  0.,  0. ],
     [ 0.,  1.*, 0. ]]
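The lookup for [1,2,4] can also be done in code. The sketch below assumes NumPy imported as `np`; `bin_index` is a hypothetical helper written for this answer, not a NumPy function, and it only handles values that lie inside the range of the edges.

```python
import numpy as np

points = np.array([[1, 2, 4], [4, 2, 8], [3, 2, 1], [2, 1, 2],
                   [2, 1, 3], [2, 1, 1], [2, 1, 4]])
h, edges = np.histogramdd(points, bins=3)

def bin_index(value, edge_array):
    """Zero-based bin of `value` (hypothetical helper, assumes value is in range)."""
    # Only the interior edges are compared, so a value equal to the
    # rightmost edge still ends up in the last bin.
    return int(np.searchsorted(edge_array[1:-1], value, side="right"))

point = [1, 2, 4]
idx = tuple(bin_index(v, e) for v, e in zip(point, edges))

print(idx)     # zero-based: (first bin, third bin, second bin)
print(h[idx])  # count stored in that cell
```

Note that the indices are zero-based, so "first array, third row, second column" corresponds to `h[0, 2, 1]`.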

I added * so you can identify it more easily. The second point [4,2,8] falls in the third bin along each of x, y and z (third array, third row, third column):

    [[ 0.,  0.,  0. ],
     [ 0.,  0.,  0. ],
     [ 1.,  0.,  1.*]]

As a final example, the third point [3,2,1] falls in the third, third and first bins for x, y, z, respectively (third array, third row, first column):

    [[ 0.,  0.,  0. ],
     [ 0.,  0.,  0. ],
     [ 1.*, 0.,  1. ]]
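To check the reading for all seven points at once, the whole histogram can be rebuilt by hand from the returned edges. This is a sketch assuming NumPy imported as `np`; it is not how histogramdd is implemented internally, only a way to confirm the interpretation above.

```python
import numpy as np

points = np.array([[1, 2, 4], [4, 2, 8], [3, 2, 1], [2, 1, 2],
                   [2, 1, 3], [2, 1, 1], [2, 1, 4]])
h, edges = np.histogramdd(points, bins=3)

# One column of zero-based bin indices per dimension (x, y, z).
idx = np.stack(
    [np.searchsorted(e[1:-1], points[:, d], side="right")
     for d, e in enumerate(edges)],
    axis=1,
)

# Tally each point into its cell; add.at handles repeated cells correctly.
counts = np.zeros(h.shape)
np.add.at(counts, tuple(idx.T), 1)

for point, cell in zip(points, idx):
    print(point, "-> h[%d, %d, %d]" % tuple(cell))

print(np.array_equal(counts, h))
```

The three points [2,1,2], [2,1,3] and [2,1,1] share one cell, which is where the 3. in the second array comes from.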
