Accounting for errors when creating a histogram

Question

Accounting for errors when creating a histogram

I have a set of observations N distributed as (x[i], y[i]), i=0..N points in a 2D space. Each point has associated errors in both coordinates ( e_x[i], e_y[i], i=0..N ), as well as the weight attached to it ( w[i], i=0..N ).

I would like to create a two-dimensional histogram of these N points, taking into account not only the weights, but also the errors that could cause the distribution of each point, possibly among many boxes, if the error values are large enough (assuming a standard Gaussian distribution for errors, although it would be possible consider other distributions).

I see that numpy.histogram2d has a weights parameter, so take care of that. The problem is how to account for errors at each of the observed points N

Is there a feature that would allow me to do this? I am open to anything in numpy and scipy .

+5

python numpy scipy histogram

Gabriel Oct 6 '14 at 13:54

source share

1 answer

Gabriel gleizer · Accepted Answer · 2014-11-29T22:19:05+0000

Using the user1415946 comment, you can assume that each point represents a bivariate normal distribution with covariance matrices given by [[e_x[i]**2,0][0,e_y[i]**2]] . However, the resulting distribution is not a normal distribution - you will see after starting the example how the histogram is not at all similar to the Gaussian one, but instead represents a group.

To create a histogram from this set of distributions, one of the ways I see is to create random samples from each point using numpy.random.multivariate_normal . See the sample code below with some artificial data.

 import numpy as np from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt # This is a function I like to use for plotting histograms def plotHistogram3d(hist, xedges, yedges): fig = plt.figure() ax = fig.add_subplot(111, projection='3d') hist = hist.transpose() # Transposing is done so that bar3d x and y match hist shape correctly dx = np.mean(np.diff(xedges)) dy = np.mean(np.diff(yedges)) # Computing the number of elements elements = (len(xedges) - 1) * (len(yedges) - 1) # Generating mesh grids. xpos, ypos = np.meshgrid(xedges[:-1]+dx/2.0, yedges[:-1]+dy/2.0) # Vectorizing matrices xpos = xpos.flatten() ypos = ypos.flatten() zpos = np.zeros(elements) dx = dx * np.ones_like(zpos) * 0.5 # 0.5 factor to give room between bars. # Use 1.0 if you want all bars 'glued' to each other dy = dy * np.ones_like(zpos) * 0.5 dz = hist.flatten() ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b') ax.set_xlabel('x') ax.set_ylabel('y') ax.set_zlabel('Count') return """ INPUT DATA """ # xy ex ey w data = np.array([[1, 2, 1, 1, 1], [3, 0, 1, 1, 2], [0, 1, 2, 1, 5], [7, 7, 1, 3, 1]]) """ Generate samples """ # Sample size (100 samples will be generated for each data point) SAMPLE_SIZE = 100 # I want to fill in a table with columns [x, y, w]. Each data point generates SAMPLE_SIZE # samples, so we have SAMPLE_SIZE * (number of data points) generated points points = np.zeros((SAMPLE_SIZE * data.shape[0], 3)) # Initializing this matrix for i, element in enumerate(data): # For each row in the data set meanVector = element[:2] covarianceMatrix = np.diag(element[2:4]**2) # Diagonal matrix with elements equal to error^2 # For columns 0 and 1, add generated x and y samples points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), :2] = \ np.random.multivariate_normal(meanVector, covarianceMatrix, SAMPLE_SIZE) # For column 2, simply copy original weight points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), 2] = element[4] # weights hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], weights=points[:, 2]) plotHistogram3d(hist, xedges, yedges) plt.show()

The results are presented below:

Accounting for errors when creating a histogram

More articles: