Why is Scipy linear interpolation faster than nearest neighbor interpolation?

I wrote a routine that interpolates point data onto a regular grid. However, I find that SciPy's implementation of nearest neighbor interpolation is nearly twice as slow as the radial basis function I use for linear interpolation (scipy.interpolate.Rbf).

The relevant code constructs the interpolators like this:

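import scipy.interpolate

# point_array is assumed to have shape (n, 2); value_array has shape (n,)
if interpolation_mode == 'linear':
    # Rbf takes each coordinate as a separate argument
    interpolator = scipy.interpolate.Rbf(
        point_array[:, 0], point_array[:, 1], value_array,
        function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
    # NearestNDInterpolator takes all coordinates as a single (n, 2) array
    interpolator = scipy.interpolate.NearestNDInterpolator(
        point_array, value_array)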

The interpolator is then called like this:

result = interpolator(col_coords.ravel(), row_coords.ravel())

The sample I'm running has 27 input points, and I'm interpolating onto a grid of nearly 20,000 x 20,000. (I do this in memory-block-sized chunks, so I'm not blowing up the computer, by the way.)
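The block-wise evaluation described above might look roughly like the following. This is only a sketch, not the pygeoprocessing code: the helper name `interpolate_in_blocks`, the block size, and the small test grid are assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator


def interpolate_in_blocks(interpolator, n_rows, n_cols, block_size=256):
    """Yield (row_offset, col_offset, block) for each block of the grid.

    Hypothetical helper: evaluates the interpolator one block at a time so
    that only one block of coordinates is held in memory at once.
    """
    for r0 in range(0, n_rows, block_size):
        r1 = min(r0 + block_size, n_rows)
        for c0 in range(0, n_cols, block_size):
            c1 = min(c0 + block_size, n_cols)
            # Coordinates of every cell in this block (x = column, y = row)
            col_coords, row_coords = np.meshgrid(
                np.arange(c0, c1), np.arange(r0, r1))
            block = interpolator(col_coords.ravel(), row_coords.ravel())
            yield r0, c0, block.reshape(r1 - r0, c1 - c0)


# Small stand-in for the real data: 27 random points on a 50 x 70 grid
rng = np.random.RandomState(0)
n_rows, n_cols = 50, 70
point_array = rng.rand(27, 2) * [n_cols, n_rows]
value_array = rng.rand(27)
interpolator = NearestNDInterpolator(point_array, value_array)

out = np.empty((n_rows, n_cols))
for r0, c0, block in interpolate_in_blocks(interpolator, n_rows, n_cols, 16):
    out[r0:r0 + block.shape[0], c0:c0 + block.shape[1]] = block
```

The same loop works for the Rbf interpolator, since both objects accept the raveled column and row coordinates as two separate arguments.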

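Below are two cProfile runs for the respective interpolations. Note that the nearest neighbor scheme takes 406 seconds in total, while the linear scheme takes 256 seconds. The nearest scheme is dominated by calls to scipy's kdTree, which seems reasonable, except that rbf still outperforms it. Any idea why, or what I could do to make the nearest scheme faster?

Linear run: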

     25362 function calls in 225.886 seconds

   Ordered by: internal time
   List reduced from 328 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      253  169.302    0.669  207.516    0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(_euclidean_norm)
      258   38.211    0.148   38.211    0.148 {method 'reduce' of 'numpy.ufunc' objects}
      252    6.069    0.024    6.069    0.024 {numpy.core._dotblas.dot}
        1    5.077    5.077  225.332  225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    1.849    0.007    2.137    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      507    1.419    0.003    1.419    0.003 {method 'flatten' of 'numpy.ndarray' objects}
     1268    1.368    0.001    1.368    0.001 {numpy.core.multiarray.array}
      252    1.018    0.004    1.018    0.004 {_gdal_array.BandRasterIONumPy}
        1    0.533    0.533  225.886  225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.336    0.001  216.716    0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(__call__)

Nearest neighbor run:

     27539 function calls in 405.624 seconds

   Ordered by: internal time
   List reduced from 309 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      252  397.806    1.579  397.822    1.579 {method 'query' of 'ckdtree.cKDTree' objects}
      252    1.875    0.007    1.881    0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
      252    1.831    0.007    2.101    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      252    1.034    0.004  400.739    1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
        1    1.009    1.009  405.030  405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    0.719    0.003    0.719    0.003 {_gdal_array.BandRasterIONumPy}
        1    0.509    0.509  405.624  405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.261    0.001    0.261    0.001 {numpy.core.multiarray.copyto}
       27    0.125    0.005    0.125    0.005 {_ogr.Layer_CreateFeature}
        1    0.116    0.116    0.254    0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)

For reference, here are the visual results of the two test cases.

[Image: Nearest]

[Image: Linear]


Running the example from the griddata doc:

In [47]: def func(x, y):
          return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
   ....: 
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

So we are interpolating from 1000 scattered points onto a grid of 20,000 points.

In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='nearest')
10 loops, best of 3: 83.6 ms per loop

In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='linear')
1 loops, best of 3: 24.6 ms per loop

In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y), 
    method='cubic')
10 loops, best of 3: 42.7 ms per loop

Timing the interpolators directly:

In [55]: %%timeit 
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
   ....: 
1 loops, best of 3: 3.89 s per loop

In [56]: %%timeit 
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
   ....: 
10 loops, best of 3: 82.6 ms per loop

In [57]: %%timeit 
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
 ....
10 loops, best of 3: 25.1 ms per loop
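The IPython timings above can also be reproduced as a plain script with the standard `timeit` module. This is a minimal sketch: the helper name `time_method`, the fixed random seed, and the repeat count are assumptions, and the absolute numbers will depend on the machine and the SciPy version.

```python
import timeit

import numpy as np
from scipy import interpolate


def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2


rng = np.random.RandomState(0)
points = rng.rand(1000, 2)
values = func(points[:, 0], points[:, 1])
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]


def time_method(method, repeat=3):
    """Best-of-`repeat` wall time in seconds for one griddata call."""
    def run():
        interpolate.griddata(points, values, (grid_x, grid_y), method=method)
    return min(timeit.repeat(run, repeat=repeat, number=1))


timings = {m: time_method(m) for m in ('nearest', 'linear', 'cubic')}
for method, seconds in timings.items():
    print('%-8s %.1f ms per call' % (method, seconds * 1e3))
```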

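These timings are about the same as the corresponding griddata calls.

The griddata documentation describes the methods as follows: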

nearest
return the value at the data point closest to the point of
   interpolation. See NearestNDInterpolator for more details.
   Uses scipy.spatial.cKDTree

linear
tesselate the input point set to n-dimensional simplices, 
   and interpolate linearly on each simplex. 
   LinearNDInterpolator details are:
      The interpolant is constructed by triangulating the 
      input data with Qhull [R37], and on each triangle 
      performing linear barycentric interpolation.

cubic (2-D)
return the value determined from a piecewise cubic, continuously 
   differentiable (C1), and approximately curvature-minimizing 
   polynomial surface. See CloughTocher2DInterpolator for more details.
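To make the two descriptions above concrete, here is a rough sketch of what the nearest and linear methods amount to, written with scipy.spatial directly. It is an illustration under assumptions (random test data, variable names of my own choosing) and not the actual SciPy implementation, which does this work in compiled code.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree

rng = np.random.RandomState(0)
points = rng.rand(1000, 2)
values = rng.rand(1000)
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]
xi = np.column_stack([grid_x.ravel(), grid_y.ravel()])

# nearest: one k-d tree query per target point, then a value lookup
_, idx = cKDTree(points).query(xi)
nearest_manual = values[idx]

# linear: triangulate once, locate the triangle containing each target
# point, then weight the three corner values by barycentric coordinates
tri = Delaunay(points)
simplex = tri.find_simplex(xi)          # -1 for points outside the hull
inside = simplex >= 0
transform = tri.transform[simplex[inside]]            # shape (m, 3, 2)
delta = xi[inside] - transform[:, 2]
bary = np.einsum('mjk,mk->mj', transform[:, :2], delta)
weights = np.column_stack([bary, 1.0 - bary.sum(axis=1)])
corner_values = values[tri.simplices[simplex[inside]]]  # shape (m, 3)

linear_manual = np.full(len(xi), np.nan)  # NaN outside the convex hull
linear_manual[inside] = (weights * corner_values).sum(axis=1)
```

Both approaches pay a one-time setup cost (building the tree or the triangulation) and then a per-point lookup cost, so their relative speed depends on the number of input points and target points.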

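So nearest goes through cKDTree, while linear works from a triangulation of the input points; in this test the cKDTree query is the slower of the two.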
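With very few input points (27 in the question), one alternative to the tree query is a brute-force nearest-neighbor search in NumPy. This is a different technique from what NearestNDInterpolator does, the function name `brute_force_nearest` is hypothetical, and whether it is actually faster has to be measured on the data at hand.

```python
import numpy as np


def brute_force_nearest(points, values, xi):
    """Nearest-neighbor lookup without a k-d tree.

    points: (n, 2) input coordinates, values: (n,) input values,
    xi: (m, 2) target coordinates. Builds an (m, n) matrix of squared
    distances, so it is only sensible when n is small and m is chunked.
    """
    diff = xi[:, None, :] - points[None, :, :]       # shape (m, n, 2)
    dist_sq = np.einsum('mnk,mnk->mn', diff, diff)   # shape (m, n)
    return values[np.argmin(dist_sq, axis=1)]


# Example with 27 random input points and one 256 x 256 block of targets
rng = np.random.RandomState(0)
point_array = rng.rand(27, 2) * 256
value_array = rng.rand(27)
col_coords, row_coords = np.meshgrid(np.arange(256), np.arange(256))
targets = np.column_stack([col_coords.ravel(), row_coords.ravel()])
block = brute_force_nearest(point_array, value_array, targets)
```

The intermediate array holds m x n x 2 floats, which is why this sketch is applied to one block of the grid at a time.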

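Rbf evaluates the distance from every target point to every input point, so its cost grows with the number of input points. With 1000 input points it was far slower in this test (3.89 s per loop); with only 27 input points, as in the question, that work is much smaller.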
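The evaluation step that shows up as `_euclidean_norm` and `dot` in the question's linear profile can be sketched as follows. This assumes the legacy Rbf class keeps its fitted weights in the `nodes` attribute, which is an implementation detail rather than documented API, and it uses random stand-in data.

```python
import numpy as np
from scipy.interpolate import Rbf
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)
point_array = rng.rand(27, 2)
value_array = rng.rand(27)

rbfi = Rbf(point_array[:, 0], point_array[:, 1], value_array,
           function='linear', smooth=.01)

grid_x, grid_y = np.mgrid[0:1:50j, 0:1:60j]
xa = np.column_stack([grid_x.ravel(), grid_y.ravel()])

# Distance from every target point to every input point: shape (m, n)
r = cdist(xa, point_array)

# For function='linear' the basis function is phi(r) = r, so evaluation
# is a single matrix-vector product with the fitted weights
manual = r.dot(rbfi.nodes)
reference = rbfi(grid_x.ravel(), grid_y.ravel())
```

The distance matrix has one column per input point, so going from 27 to 1000 input points multiplies both the memory and the arithmetic by roughly 37.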


Source: https://habr.com/ru/post/1609531/