Interpolate unstructured X, Y, Z data onto the best grid based on each point's nearest-neighbor distance

This question was edited after receiving answers, to show the final solution I used.

I have unstructured 2D datasets coming from different sources, for example: Example 1: 3D measurement samples; Example 2: 2D grid nodes. Each dataset consists of three numpy.ndarray objects (X, Y, and Z coordinates).

My ultimate goal is to interpolate this scattered data onto a grid for conversion to an image / matrix. Therefore, I need to find the "best grid" for the interpolation, and for that I need the best X and Y steps between the pixels of this grid.
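For the snippets below, assume the dataset is three flat arrays x, y, z. Here is a minimal, purely hypothetical stand-in (random points with a smooth z) so the steps can be run end to end, together with the imports the later code relies on:

    import numpy as np
    import scipy.interpolate
    import scipy.spatial

    # Hypothetical scattered dataset: random (x, y) points with a smooth z
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, 1000)
    y = rng.uniform(0.0, 5.0, 1000)
    z = np.sin(x) * np.cos(y)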

Determine the step based on the Euclidean distance between the points:

Use the average of the Euclidean distances between each point and its nearest neighbor.

  • Use KDTree / cKDTree from scipy.spatial to build a tree from the X, Y data.
  • Use the query method with k=2 to get the distances (with k=1 the distances are all zero, because each queried point finds itself at distance zero).
    # Generate KD-tree
    xy = np.c_[x, y]                           # X,Y data stacked for use with KDTree
    tree = scipy.spatial.cKDTree(xy)           # Build the KD-tree for the X,Y coordinates

    # Calculate the step
    distances, indices = tree.query(xy, k=2)   # Query distances for every X,Y point
    distances = distances[:, 1:]               # Drop the k=1 zero self-distances
    step = np.mean(distances)                  # Result

Performance tuning:

  • Use scipy.spatial.cKDTree, not scipy.spatial.KDTree, because it is significantly faster.
  • Use balanced_tree=False with scipy.spatial.cKDTree: a big speed-up in my case, but it may not be right for all data.
  • Use n_jobs=-1 with cKDTree.query to enable multithreading (the parameter is named workers in recent SciPy versions).
  • Use p=1 with cKDTree.query to use the Manhattan distance instead of the Euclidean distance (p=2): faster, but possibly less accurate.
  • Query distances only for a random subsample of the points: much faster on large datasets, but less accurate and less repeatable (a combined sketch follows this list).
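Put together, a sketch of the tuned query (the sample size of 1000 is a made-up value; recent SciPy versions name the thread count workers, older ones n_jobs, so adjust to your version):

    # Tuned variant: unbalanced tree, random subsample, Manhattan metric, all cores
    xy = np.c_[x, y]
    tree = scipy.spatial.cKDTree(xy, balanced_tree=False)      # faster build

    n_sample = min(1000, len(xy))                              # random subsample
    idx = np.random.default_rng(0).choice(len(xy), n_sample, replace=False)
    distances, _ = tree.query(xy[idx], k=2, p=1, workers=-1)   # Manhattan, multithreaded
    step = np.mean(distances[:, 1])                            # drop zero self-distances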

Interpolate points on the grid:

Interpolate the dataset points on the grid using the calculated step.

    # Generate grid
    def interval(axis):
        '''Return the numpy.linspace interval for the given axis.'''
        cent = axis.min() + np.ptp(axis) / 2        # Interval center
        nbs = int(np.ceil(np.ptp(axis) / step))     # Number of steps in the interval
        hwid = nbs * step / 2                       # Half interval width
        return np.linspace(cent - hwid, cent + hwid, nbs + 1)  # nbs steps -> nbs+1 points

    xg, yg = np.meshgrid(interval(x), interval(y))  # Generate the grid

    # Interpolate the X,Y,Z data on the grid
    zg = scipy.interpolate.griddata((x, y), z, (xg, yg))
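griddata defaults to linear interpolation; if that is too soft or too sharp for your data, method is the knob to try. A short sketch of the variants, using the same arrays as above:

    # Variants of the interpolation method (default is 'linear')
    zg_near = scipy.interpolate.griddata((x, y), z, (xg, yg), method='nearest')
    zg_cubic = scipy.interpolate.griddata((x, y), z, (xg, yg), method='cubic')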

Set NaN if the pixel is too far from the starting points:

Set NaN for grid pixels that are too far (distance > step) from the original X, Y, Z data points. The KDTree created earlier is reused.

    # Calculate pixel-to-data distances with the KDTree built earlier
    dist, _ = tree.query(np.c_[xg.ravel(), yg.ravel()])
    dist = dist.reshape(xg.shape)

    # Set NaN for pixels that are too far from any data point
    zg[dist > step] = np.nan
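Since the stated goal is conversion to an image / matrix, here is a minimal sketch (assuming matplotlib is available) to inspect the result; the NaN pixels are simply left blank:

    import matplotlib.pyplot as plt

    # zg is already a 2D matrix; render it as an image
    plt.imshow(zg, origin='lower', extent=(x.min(), x.max(), y.min(), y.max()))
    plt.colorbar(label='z')
    plt.show()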
2 answers

I suggest you go with KDTree.query.

You are looking for a characteristic distance of the scatter to scale your binning. I suggest you take only a random subset of your points and use the Manhattan distance, because KDTree.query is very slow (even though its complexity is O(n log n)).

Here is my code:

    import numpy
    import scipy.spatial

    # Create the tree
    tree = scipy.spatial.KDTree(numpy.array(points))  # better give it a copy?

    # Create a random subsample of the points
    n_repr = 1000
    shuffled_points = numpy.array(points)
    numpy.random.shuffle(shuffled_points)
    shuffled_points = shuffled_points[:n_repr]

    # Query the tree (k=2 puts the zero self-distance in column 0)
    dists, indices = tree.query(shuffled_points, k=2, p=1)

    # Get an _estimate_ of the average nearest-neighbour distance
    avg_dist = numpy.average(dists[:, 1])  # skip the zero self-distances
    print('average Manhattan distance to nearest neighbour is:', avg_dist)

I suggest using the Manhattan distance ( https://en.wikipedia.org/wiki/Taxicab_geometry ) because it is faster to compute than the Euclidean distance. And since you only need an estimate of the average distance, it should be sufficient.


The problem you want to solve is known as the all-nearest-neighbors problem. See this article, for example: http://link.springer.com/article/10.1007/BF02187718

I believe solutions to it are O(N log N), so of the same order as KDTree.query, but in practice much faster than a bunch of separate queries. Sorry, I don't know of a Python implementation.
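In practice, a single vectorized cKDTree.query over all points at once behaves like a batch all-nearest-neighbors query and avoids the per-call Python overhead; this is my own sketch, not the algorithm from the paper:

    import numpy as np
    import scipy.spatial

    # One batched query for all points instead of a Python loop of single queries
    pts = np.c_[x, y]
    tree = scipy.spatial.cKDTree(pts)
    dists, _ = tree.query(pts, k=2)    # column 0 is the zero self-distance
    nn_dists = dists[:, 1]             # nearest-neighbour distance for each point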

