Create a heatmap with the distribution of attribute values ​​in R (not a dense heatmap)

Some of you may have seen Beyond Soda, Pop, or Coke . I faced a similar problem and would like to create such a plot. In my case, I have a very large number of geocoded observations (more than 1 million) and the binary attribute x. I would like to show the distribution of x on a map with a color scale from 0 to 1 for p (x = 1).

I am open to other approaches, but Katz's approach for Beyond "Soda, Pop or Coke" is described here and uses these packages: fields, maps, mapproj, plyr, RANN, RColorBrewer, scales and zipcode. His approach is based on smoothing the core of the k-nearest neighbor with the Gaussian core. First, it determines the distance for each location t on the map to all observations and then uses the distance-weighted estimate for p (x = 1 | t) (the probability that x will be 1 conditional in place). The formula is here .

When I understand this correctly, creating such a map in R involves the following steps:

  • Create a mesh that covers the entire shapefile area (let me name the points in the mesh t). I tried this approach using polygrid but still failed. The code is below.
  • For each t, ​​calculate the distance to all observations (or just find the k closest points and calculate the distance for this subset)
  • calculate p (x = 1 | t) according to the formula defined here
  • display all t with the corresponding color scale, which is in the range from 0 to 1

Here are some sample data , and I have two specific questions. First, how to solve my problem from step 1? As the second map below shows, my current approach fails. This is a clear question about implementing R, and as soon as this is resolved, I will be able to complete the other steps. The second and broader is that the right approach or you propose another way to create a heat map with the distribution of attribute values?

load libraries and open shapefile and packages

 # set path path = PATH # CHANGE THIS!! # load libraries library("stringr") library("rgdal") library("maptools") library("maps") library("RANN") library("fields") library("plyr") library("geoR") library("ggplot2") # open shapefile map.proj = CRS(" +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0") proj4.longlat=CRS("+proj=longlat +ellps=GRS80") shape = readShapeSpatial(str_c(path,"test-shape"),proj4string=map.proj) shape = spTransform(shape, proj4.longlat) # open data df=readRDS(str_c(path,"df.rds")) 

chart data

 # plot shapefile with points par (mfrow=c(1,1),mar=c(0,0,0,0), cex=0.8, cex.lab=0.8, cex.main=0.8, mgp=c(1.2,0.15,0), cex.axis=0.7, tck=-0.02,bg = "white") plot( shape@bbox [1,], shape@bbox [2,],type='n',asp=1,axes=FALSE,xlab="",ylab="") with(subset(df,attr==0),points(lon,lat,pch=20,col="#303030",bg="#303030",cex=0.4)) with(subset(df,attr==1),points(lon,lat,pch=20,col="#E16A3F",bg="#E16A3F",cex=0.4)) plot(shape,add=TRUE,border="black",lwd=0.2) 

enter image description here

1) Create a grid that covers the entire shapefile area

 # get the bounding box for ROI an convert to a list bboxROI = apply(bbox(shape), 1, as.list) # create a sequence from min(x) to max(x) in each dimension seqs = lapply(bboxROI, function(x) seq(x$min, x$max, by= 0.001)) # rename to xgrid and ygrid names(seqs) <- c('xgrid','ygrid') # get borders of entire SpatialPolygonsDataFrame borders = rbind.fill.matrix(llply( shape@polygons ,function(p1) { rbind.fill.matrix(llply( p1@Polygons ,function(p2) p2@coords )) })) # create grid thegrid = do.call(polygrid,c(seqs, borders = list(borders))) # add grid points to previous plot points(thegrid[,1],thegrid[,2],pch=20,col="#33333333",bg="#33333333",cex=0.4) 

enter image description here

+4
source share

Source: https://habr.com/ru/post/976114/


All Articles