Accounting for errors when creating a histogram

I have a set of observations N distributed as (x[i], y[i]), i=0..N points in a 2D space. Each point has associated errors in both coordinates ( e_x[i], e_y[i], i=0..N ), as well as the weight attached to it ( w[i], i=0..N ).

I would like to create a two-dimensional histogram of these N points, taking into account not only the weights, but also the errors that could cause the distribution of each point, possibly among many boxes, if the error values ​​are large enough (assuming a standard Gaussian distribution for errors, although it would be possible consider other distributions).

I see that numpy.histogram2d has a weights parameter, so take care of that. The problem is how to account for errors at each of the observed points N

Is there a feature that would allow me to do this? I am open to anything in numpy and scipy .

+5
source share
1 answer

Using the user1415946 comment, you can assume that each point represents a bivariate normal distribution with covariance matrices given by [[e_x[i]**2,0][0,e_y[i]**2]] . However, the resulting distribution is not a normal distribution - you will see after starting the example how the histogram is not at all similar to the Gaussian one, but instead represents a group.

To create a histogram from this set of distributions, one of the ways I see is to create random samples from each point using numpy.random.multivariate_normal . See the sample code below with some artificial data.

 import numpy as np from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt # This is a function I like to use for plotting histograms def plotHistogram3d(hist, xedges, yedges): fig = plt.figure() ax = fig.add_subplot(111, projection='3d') hist = hist.transpose() # Transposing is done so that bar3d x and y match hist shape correctly dx = np.mean(np.diff(xedges)) dy = np.mean(np.diff(yedges)) # Computing the number of elements elements = (len(xedges) - 1) * (len(yedges) - 1) # Generating mesh grids. xpos, ypos = np.meshgrid(xedges[:-1]+dx/2.0, yedges[:-1]+dy/2.0) # Vectorizing matrices xpos = xpos.flatten() ypos = ypos.flatten() zpos = np.zeros(elements) dx = dx * np.ones_like(zpos) * 0.5 # 0.5 factor to give room between bars. # Use 1.0 if you want all bars 'glued' to each other dy = dy * np.ones_like(zpos) * 0.5 dz = hist.flatten() ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b') ax.set_xlabel('x') ax.set_ylabel('y') ax.set_zlabel('Count') return """ INPUT DATA """ # xy ex ey w data = np.array([[1, 2, 1, 1, 1], [3, 0, 1, 1, 2], [0, 1, 2, 1, 5], [7, 7, 1, 3, 1]]) """ Generate samples """ # Sample size (100 samples will be generated for each data point) SAMPLE_SIZE = 100 # I want to fill in a table with columns [x, y, w]. Each data point generates SAMPLE_SIZE # samples, so we have SAMPLE_SIZE * (number of data points) generated points points = np.zeros((SAMPLE_SIZE * data.shape[0], 3)) # Initializing this matrix for i, element in enumerate(data): # For each row in the data set meanVector = element[:2] covarianceMatrix = np.diag(element[2:4]**2) # Diagonal matrix with elements equal to error^2 # For columns 0 and 1, add generated x and y samples points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), :2] = \ np.random.multivariate_normal(meanVector, covarianceMatrix, SAMPLE_SIZE) # For column 2, simply copy original weight points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), 2] = element[4] # weights hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], weights=points[:, 2]) plotHistogram3d(hist, xedges, yedges) plt.show() 

The results are presented below:

enter image description here

+1
source

Source: https://habr.com/ru/post/1204109/


All Articles