Find an easier way to group 2-D scatter data into grid array data

Question


I have written a method for grouping scattered point data into a structured 2-D array (in other words, a rasterization function). I hope there is a better way to achieve this.

My work

1. Introduction

Point data: 1000 data points with attributes (lon, lat, emission); each one represents a factory located at (x, y) that emits a certain amount of CO2 into the atmosphere
Mesh grid: a predefined 2-D array of shape 20x20

http://i4.tietuku.com/02fbaf32d2f09fff.png

The code to reproduce this:

#### imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

#### define the map area
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
# named bmap rather than map, so the builtin map() is not shadowed
bmap = Basemap(llcrnrlon=xc1, llcrnrlat=yc1, urcrnrlon=xc2, urcrnrlat=yc2)

#### read the point data and scatter-plot it by position
df = pd.read_csv("xxxxx.csv")
px, py = bmap(df.lon, df.lat)
bmap.scatter(px, py, color="red", s=5, zorder=3)

#### predefine the grid network (21 edges per axis enclose 20x20 cells)
lon_grid, lat_grid = np.linspace(xc1, xc2, 21), np.linspace(yc1, yc2, 21)
lon_x, lat_y = np.meshgrid(lon_grid, lat_grid)
grids = np.zeros(20*20).reshape(20, 20)
plt.pcolormesh(lon_x, lat_y, grids, cmap='gray', facecolor='none', edgecolor='k', zorder=3)

2. My goal

Find the closest grid point for each factory
Add the factory's emission to the value stored at that grid point
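For an evenly spaced, axis-aligned grid, these two steps can also be sketched without any spatial search: the nearest grid node follows from rounding a scaled coordinate, and `np.add.at` accumulates the emissions. This is only an illustration under stated assumptions, not the method used in this post: the five sample factories are made up, and the 20 nodes per axis are assumed to span the full map extent.

```python
import numpy as np

# Made-up stand-in data: five factories with (lon, lat, emission).
lon = np.array([113.50, 113.52, 114.50, 115.50, 114.50])
lat = np.array([37.40, 37.42, 38.00, 38.78, 38.00])
emission = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
n = 20  # grid nodes per axis (assumed to span the whole map area)

# Scale each coordinate to the range [0, n-1] and round to the nearest node index.
ix = np.rint((lon - xc1) / (xc2 - xc1) * (n - 1)).astype(int)
iy = np.rint((lat - yc1) / (yc2 - yc1) * (n - 1)).astype(int)

# Guard against points that lie slightly outside the map area.
ix = np.clip(ix, 0, n - 1)
iy = np.clip(iy, 0, n - 1)

# Accumulate emissions; np.add.at handles repeated indices correctly,
# whereas grid_em[iy, ix] += emission would count each node only once.
grid_em = np.zeros((n, n))
np.add.at(grid_em, (iy, ix), emission)
```

Rows of `grid_em` follow latitude and columns follow longitude, which matches the orientation `pcolormesh` expects.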

3. The implementation of the algorithm

3.1 Raster grid

Note: the 20x20 grid points distributed over this area are shown as blue dots.

http://i4.tietuku.com/8548554587b0cb3a.png

3.2 KD tree

Find the blue dot nearest to each red dot

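from scipy.spatial import KDTree

# coordinates of the 20x20 grid points, one (lon, lat) pair per row
sh = (20*20, 2)
grids = np.zeros(20*20*2).reshape(*sh)

# accumulated emission per grid point (flat, reshaped to 20x20 later)
sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)

k = 0
for j in range(20):
    for i in range(20):
        grids[k] = np.array([lon_grid[i], lat_grid[j]])
        k += 1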

T = KDTree(grids)

x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)

for i in range(len(df)):
    idx = T.query_ball_point([df.lon.iloc[i], df.lat.iloc[i]], r=R)
    if not idx:
        continue  # no grid point within radius R of this factory
    # Sometimes more than one blue dot is found,
    # so compute the distance between the factory (red point)
    # and every blue dot in the list, then keep the closest one
    if len(idx) > 1:
        distance = []
        for k in range(len(idx)):
            gx, gy = grids[idx[k]]
            distance.append(np.sqrt((df.lon.iloc[i] - gx)**2 + (df.lat.iloc[i] - gy)**2))
        pos_index = distance.index(min(distance))
        pos = idx[pos_index]

    # Only 1 point was found
    else:
        pos = idx[0]
    grids_em[pos] += df.so2.iloc[i]

4. Result

co2 = grids_em.reshape(20,20)
plt.pcolormesh(lon_x,lat_y,co2,cmap =plt.cm.Spectral_r,zorder=3)

http://i4.tietuku.com/6ded65c4ac301294.png

5. My question

Can someone point out any flaws or errors in this method?
Are there any algorithms better suited to my goal?

Thanks a lot!


python arrays numpy matplotlib matplotlib-basemap

Han zhengzu Jan 08 '16 at 2:50


1 answer

HYRY · Accepted Answer · 2016-01-08T11:31:38+0000

Your code contains many for-loops, which is not the NumPy way of doing things.

First, generate some sample data:

import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105       

N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)

df_points = pd.DataFrame({"x":x, "y":y, "v":value})

For an equally spaced grid, you can use hist2d():

pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");

Here is the result:
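If the gridded values are needed as an array rather than only as a plot, the same weighted binning is available through `np.histogram2d`. The following is a minimal sketch, assuming uniformly distributed stand-in points and bin edges that span the map area from the question; neither assumption comes from the answer above.

```python
import numpy as np

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N, GSIZE = 1000, 20

# Stand-in sample data (assumption): uniform points, each with weight 1.
rng = np.random.default_rng(0)
x = rng.uniform(xc1, xc2, N)
y = rng.uniform(yc1, yc2, N)
v = np.ones(N)

# GSIZE + 1 edges per axis enclose GSIZE cells.
x_edges = np.linspace(xc1, xc2, GSIZE + 1)
y_edges = np.linspace(yc1, yc2, GSIZE + 1)

# H[i, j] is the sum of the weights falling into x-bin i and y-bin j.
H, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=v)

# Transpose so rows follow y and columns follow x, as pcolormesh expects.
grid = H.T
```

`grid` can then be drawn with `pcolormesh(x_edges, y_edges, grid)`. Note that this bins points into cells, whereas the KDTree approach assigns each point to the nearest grid node.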

Using KDTree:

X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]

grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]

tree = KDTree(grid)
dist, indices = tree.query(points)

grid_values = df_points.groupby(indices).v.sum()

df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values

fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v, 
                    cmap="viridis", 
                    linewidths=0, 
                    s=100, marker="o")
pl.colorbar(mapper, ax=ax);
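With the groupby above, grid nodes that receive no points end up as NaN once the sums are aligned to `df_grid`, and the result stays in a flat column. If a dense 2-D array is wanted instead, one possible variation is to sum the weights with `np.bincount`. This is a sketch with uniform stand-in data and the map extent from the question, not part of the answer's original code.

```python
import numpy as np
from scipy.spatial import KDTree

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N, GSIZE = 1000, 20

# Stand-in sample data (assumption): uniform points, each with weight 1.
rng = np.random.default_rng(0)
x = rng.uniform(xc1, xc2, N)
y = rng.uniform(yc1, yc2, N)
v = np.ones(N)

# Grid nodes as an array of shape (GSIZE*GSIZE, 2).
X, Y = np.mgrid[xc1:xc2:GSIZE*1j, yc1:yc2:GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]

# Index of the nearest grid node for every point.
tree = KDTree(grid)
_, indices = tree.query(np.c_[x, y])

# Sum the weights per node; minlength keeps empty nodes as 0 instead of dropping them.
totals = np.bincount(indices, weights=v, minlength=GSIZE * GSIZE)

# grid_values[i, j] belongs to the node at (X[i, j], Y[i, j]).
grid_values = totals.reshape(GSIZE, GSIZE)
```

Because `X` and `Y` come from `np.mgrid`, the first axis of `grid_values` follows x; transpose it before passing it to `pcolormesh` or `imshow`.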
