I have some experimental data that looks like this:
x = np.array([1, 1.12, 1.109, 2.1, 3, 4.104, 3.1, ...])
y = np.array([-9, -0.1, -9.2, -8.7, -5, -4, -8.75, ...])
z = np.array([10, 4, 1, 4, 5, 0, 1, ...])
If it's convenient, we can assume the data exists as three columns of an N×3 array, or even as a pandas DataFrame:
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
The interpretation is that at each position x[i], y[i], the value of some variable is z[i]. The samples are not uniformly spaced, so some regions are densely sampled (for example, between 1 and 1.2 in x) and others are very sparse (for example, between 2 and 3 in x). Because of this, I can't just feed them into pcolormesh or contourf.
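If a runnable stand-in helps, data with the same dense/sparse character can be faked like this (random values, purely illustrative; the exact numbers don't matter):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

n = 20_000
# Fake measurements: many points crammed into x in [1, 1.2], far fewer per
# unit length spread over x in [2, 5], with an arbitrary z at each sample
x = np.concatenate([rng.uniform(1.0, 1.2, n // 2),
                    rng.uniform(2.0, 5.0, n // 2)])
y = rng.uniform(-10.0, 0.0, n)
z = rng.uniform(0.0, 10.0, n)

df = pd.DataFrame({'x': x, 'y': y, 'z': z})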
Instead, I would like to resample x and y onto a regular grid at some fixed interval and then aggregate the values of z in each cell. For my needs, z can be summed or averaged and still be meaningful, so this is not a problem. My naive attempt was this:
X = np.arange(min(x), max(x), 0.1)
Y = np.arange(min(y), max(y), 0.1)
x_g, y_g = np.meshgrid(X, Y)
nx, ny = x_g.shape
z_g = np.full(x_g.shape, np.nan)

# For every grid cell, pick the samples that fall inside it and sum their z
for ix in range(nx - 1):
    for jx in range(ny - 1):
        x_min = x_g[ix, jx]
        x_max = x_g[ix + 1, jx + 1]
        y_min = y_g[ix, jx]
        y_max = y_g[ix + 1, jx + 1]
        vals = df[(df.x >= x_min) & (df.x < x_max) &
                  (df.y >= y_min) & (df.y < y_max)].z.values
        if vals.size:  # vals.any() would skip cells where every z happens to be 0
            z_g[ix, jx] = vals.sum()
It works, and I get the desired result with plt.contourf(x_g, y_g, z_g), but it is SLOW! I have ~20k samples, which I bin into ~800 points in x and ~500 in y, so the double loop runs ~400k iterations.
Is there a way to vectorize / optimize this? Even better if there is some function that already does this!
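To make "some function that already does this" concrete, here is the kind of call I'm hoping for. As far as I can tell, np.histogram2d accepts the z values as weights and sums them per bin, and scipy.stats.binned_statistic_2d can compute a mean instead, but I haven't checked that either one reproduces my loop's bin edges exactly:

import numpy as np
from scipy.stats import binned_statistic_2d

# Same 0.1-spaced bin edges as in the loop version
X = np.arange(min(x), max(x), 0.1)
Y = np.arange(min(y), max(y), 0.1)

# Sum of z over each (x, y) bin; H has shape (len(X)-1, len(Y)-1), x along axis 0
H, xedges, yedges = np.histogram2d(x, y, bins=[X, Y], weights=z)

# Mean of z per bin instead of the sum
mean_z, _, _, _ = binned_statistic_2d(x, y, z, statistic='mean', bins=[X, Y])

# contourf wants y along the first axis, so something like:
# plt.contourf(X[:-1], Y[:-1], H.T)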
(I'm also tagging this as MATLAB, because the numpy/MATLAB syntax is very similar and I have access to both programs.)