This question was edited after the answers to show the final solution I used.
I have unstructured 2D datasets coming from different sources. These datasets are three `numpy.ndarray` arrays (X, Y and Z coordinates).
My ultimate goal is to interpolate these data on a grid to convert them into an image/matrix. Therefore, I need to find the "best grid" for interpolating these data, and for that I need to find the best X and Y steps between the pixels of this grid.
Determine the step based on the Euclidean distance between the points:
Use the average of the Euclidean distances between each point and its nearest neighbor:

- Use `KDTree`/`cKDTree` from `scipy.spatial` to build a tree of the X, Y data.
- Use the `query` method with `k=2` to get the distances (with `k=1` the distances are all zero, because each queried point finds itself as its nearest neighbor).
```python
import numpy as np
import scipy.spatial

# Generate KD Tree
xy = np.c_[x, y]  # X,Y data converted for use with KDTree
tree = scipy.spatial.cKDTree(xy)  # Create KDtree for X,Y coordinates

# Calculate step
distances, points = tree.query(xy, k=2)  # Query distances for X,Y points
distances = distances[:, 1:]  # Remove k=1 zero distances (each point to itself)
step = np.mean(distances)  # Result
```
Performance tuning:
- Use `scipy.spatial.cKDTree`, not `scipy.spatial.KDTree`, because it is much faster.
- Use `balanced_tree=False` with `scipy.spatial.cKDTree`: a large speed-up in my case, but this may not hold for all data.
- Use `n_jobs=-1` with `cKDTree.query` to use multithreading (the parameter is named `workers` in newer SciPy versions).
- Use `p=1` with `cKDTree.query` to use the Manhattan distance instead of the Euclidean distance (`p=2`): faster, but possibly less accurate.
- Query distances only for a random subsample of the points: much faster with large datasets, but possibly less accurate and less repeatable.
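The step estimation and the tuning options above can be combined into one function. This is a minimal sketch, not the exact code I used: the function name `estimate_step` and its `sample`/`seed` parameters are hypothetical, and it assumes SciPy 1.6 or later (where `query` takes `workers` instead of `n_jobs`).

```python
import numpy as np
from scipy.spatial import cKDTree


def estimate_step(x, y, sample=None, seed=None, p=2):
    """Hypothetical helper: mean nearest-neighbour distance of the X,Y points.

    sample: if given, only this many randomly chosen points are queried.
    p: Minkowski norm used by the query (2 = Euclidean, 1 = Manhattan).
    Returns (step, tree) so the tree can be reused later.
    """
    xy = np.c_[x, y]
    # balanced_tree=False builds faster; query results are the same
    tree = cKDTree(xy, balanced_tree=False)

    if sample is not None and sample < len(xy):
        # Query a random subsample only (faster, less repeatable unless seeded)
        rng = np.random.default_rng(seed)
        query_pts = xy[rng.choice(len(xy), size=sample, replace=False)]
    else:
        query_pts = xy

    # k=2: column 0 is the point itself (distance 0), column 1 its nearest neighbour
    distances, _ = tree.query(query_pts, k=2, p=p, workers=-1)
    return np.mean(distances[:, 1]), tree
```

Note that exact duplicate points give a nearest-neighbour distance of 0 and pull the mean down, so it may be worth removing duplicates first.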
Interpolate points on the grid:
Interpolate the dataset points on the grid using the calculated step.
```python
# Generate grid
def interval(axe):
    '''Return numpy.linspace Interval for specified axe'''
    cent = axe.min() + np.ptp(axe) / 2  # np.ptp: ndarray.ptp() was removed in NumPy 2.0
    ...
```
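The `interval` function above is cut off, so here is a self-contained sketch of the whole grid and interpolation step. It is an assumption about how the grid could be built, not the original code: the helpers `make_axis` and `interpolate_on_grid` are hypothetical, each axis is centred on the data range with a spacing of exactly `step`, and the interpolation uses `scipy.interpolate.griddata`.

```python
import numpy as np
from scipy.interpolate import griddata


def make_axis(values, step):
    """Hypothetical helper: axis with spacing `step`, centred on the data range."""
    cent = values.min() + np.ptp(values) / 2  # centre of the data range
    num = int(np.ceil(np.ptp(values) / step)) + 1  # pixels needed to cover the range
    half = (num - 1) * step / 2  # half-width of the axis, so spacing is exactly step
    return np.linspace(cent - half, cent + half, num)


def interpolate_on_grid(x, y, z, step, method="linear"):
    """Hypothetical helper: interpolate scattered X,Y,Z data on a regular grid."""
    xg, yg = np.meshgrid(make_axis(x, step), make_axis(y, step))
    # Pixels outside the convex hull of the data get NaN (griddata's default fill_value)
    zg = griddata(np.c_[x, y], z, (xg, yg), method=method)
    return xg, yg, zg
```

Centring the axis and rounding the pixel count up keeps the spacing equal to the computed step while still covering the full extent of the data.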
Set NaN if the pixel is too far from the original points:
Set to NaN the grid pixels that are too far (distance > step) from the original X, Y, Z data points. The KDTree created previously is reused.
```python
# Calculate pixel to X,Y,Z data distances
dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
dist = dist.reshape(xg.shape)
```
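The masking described above can be written as a small self-contained function. This is a sketch under assumptions: the name `mask_far_pixels` and the interpolated array `zg` are hypothetical, `tree` is the `cKDTree` built on the X,Y data, and `xg`, `yg` are the grid coordinate arrays.

```python
import numpy as np
from scipy.spatial import cKDTree


def mask_far_pixels(tree, xg, yg, zg, step):
    """Hypothetical helper: set to NaN pixels farther than `step` from any data point."""
    # Distance from each pixel centre to its nearest original X,Y point
    dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
    dist = dist.reshape(xg.shape)
    zg = np.array(zg, dtype=float, copy=True)  # float copy so NaN can be stored
    zg[dist > step] = np.nan
    return zg
```

Usage: `zg = mask_far_pixels(tree, xg, yg, zg, step)`, with `tree` and `step` from the step-estimation stage.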