Part of a raster cell covered by one or more polygons: is there a faster way to do this (in R)?

Images are better than words, so please take a look at this example image.

I have

  • a RasterLayer object (filled with random values here for illustration purposes only; the actual values do not matter)
  • a SpatialPolygons object with lots and lots of polygons in it

You can recreate the example data that I used for the image with the following code:

library(sp)
library(raster)
library(rgeos)

# create example raster
r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
values(r) <- sample(x=1:1000, size=150)

# create example (Spatial)Polygons
p1 <- Polygon(coords=matrix(c(50, 100, 100, 50, 50, 15, 15, 35, 35, 15), nrow=5, ncol=2), hole=FALSE)
p2 <- Polygon(coords=matrix(c(77, 123, 111, 77, 43, 57, 66, 43), nrow=4, ncol=2), hole=FALSE)
p3 <- Polygon(coords=matrix(c(110, 125, 125, 110, 67, 75, 80, 67), nrow=4, ncol=2), hole=FALSE)
lots.of.polygons <- SpatialPolygons(list(Polygons(list(p1, p2, p3), 1)))
crs(lots.of.polygons) <- crs(r)  # copy crs from raster to polygons (please ignore any potential problems related to projections etc. for now)

# plot both
plot(r)  # values in this raster for illustration purposes only
plot(lots.of.polygons, add=TRUE)

For each raster cell, I want to know how much of it is covered by one or more polygons. Or, more precisely: the area of all polygons inside the raster cell, excluding whatever lies outside the cell in question. If several polygons overlap a cell, I need their combined area.

The following code does what I want, but takes more than a week to run on the actual datasets:

# empty the example raster (don't need the values):
values(r) <- NA

# copy of r that will hold the results
r.results <- r

for (i in 1:ncell(r)) {
  r.cell <- r                    # fresh copy of the empty raster
  r.cell[i] <- 1                 # set the i-th cell to 1
  p <- rasterToPolygons(r.cell)  # create a polygon that represents the i-th raster cell
  cropped.polygons <- gIntersection(p, lots.of.polygons)  # intersection of i-th raster cell and all SpatialPolygons

  if (is.null(cropped.polygons)) {
    r.results[i] <- NA           # if there is no polygon intersecting this raster cell, just return NA ...
  } else {
    r.results[i] <- gArea(cropped.polygons)  # ... otherwise return the area
  }
}

plot(r.results)
plot(lots.of.polygons, add=TRUE)

I can squeeze out a bit more speed by using sapply instead of the for loop, but the bottleneck seems to be somewhere else. The whole approach feels rather clumsy, and I wonder if I am missing something obvious. At first I thought that rasterize() should be able to do this easily, but I cannot figure out what to supply as the fun= argument. Any ideas?
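For reference, here is a minimal sketch of the sapply variant mentioned above. It reuses r (already emptied with values(r) <- NA), r.results and lots.of.polygons from the code above; cell.area is just a helper name made up for this illustration:

cell.area <- function(i) {
  r.cell <- r                                    # fresh copy of the empty raster
  r.cell[i] <- 1                                 # mark the i-th cell
  p <- rasterToPolygons(r.cell)                  # polygon representing the i-th cell
  cropped <- gIntersection(p, lots.of.polygons)  # clip the polygons to that cell
  if (is.null(cropped)) NA else gArea(cropped)   # NA if nothing intersects
}

values(r.results) <- sapply(1:ncell(r), cell.area)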

3 answers

Perhaps gIntersection(..., byid = TRUE) combined with gUnaryUnion(lots.of.polygons) is faster than the for loop: together they let you process all cells at once (though if gUnaryUnion() itself takes too long, this is a bad idea).

r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
set.seed(1)
values(r) <- sample(x=1:1000, size=150)
rr <- rasterToPolygons(r)

# join intersecting polys and put all polys into a single SpatialPolygons
lots.of.polygons <- gUnaryUnion(lots.of.polygons)  # in this example, it is unnecessary

gi <- gIntersection(rr, lots.of.polygons, byid = TRUE)
ind <- as.numeric(do.call(rbind, strsplit(names(gi), " "))[, 1])  # get the ids of the intersected rr cells

r[] <- NA
r[ind] <- sapply(gi@polygons, function(x) slot(x, 'area'))  # a bit faster than gArea(gi, byid = TRUE)

plot(r)
plot(lots.of.polygons, add=TRUE)

[resulting plot: raster cells colored by the computed intersection area, with the polygons overlaid]


You can parallelize your loop using the doSNOW and foreach packages. This speeds up the computation roughly in proportion to the number of processors you have.

library(doSNOW)
library(foreach)

cl <- makeCluster(4)  # 4 is the number of CPUs used. You can change that according
                      # to the number of processors you have
registerDoSNOW(cl)

values(r.results) <- foreach(i = 1:ncell(r), .packages = c("raster", "sp", "rgeos"),
                             .combine = c) %dopar% {
  r.cell <- r                    # fresh copy of the empty raster
  r.cell[i] <- 1                 # set the i-th cell to 1
  p <- rasterToPolygons(r.cell)  # create a polygon that represents the i-th raster cell
  cropped.polygons <- gIntersection(p, lots.of.polygons)  # intersection of i-th raster cell and all SpatialPolygons

  if (is.null(cropped.polygons)) {
    NA                           # if there is no polygon intersecting this raster cell, just return NA ...
  } else {
    gArea(cropped.polygons)      # ... otherwise return the area
  }
}

stopCluster(cl)  # release the workers when done (added for cleanup)

plot(r.results)
plot(lots.of.polygons, add=TRUE)
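If you do not want to hard-code the number of workers, one option (my own addition, not part of the original answer) is to ask the machine how many cores it has:

n_workers <- max(1, parallel::detectCores() - 1)  # leave one core free for the rest of the system
cl <- makeCluster(n_workers)
registerDoSNOW(cl)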

As you mentioned in your question, an alternative would be to exploit rasterization to speed things up. This involves creating two rasters: a "fishnet" raster whose values correspond to cell numbers, and another whose values correspond to polygon IDs. Both need to be "supersampled" to a higher resolution than the original raster. You can then count, for each cell number, how many supersampled fishnet pixels coincide with polygon-raster pixels carrying a valid (non-zero) ID. In practice, something like this works (note that I slightly changed the construction of the input polygons to a SpatialPolygonsDataFrame):

library(sp)
library(raster)
library(rgeos)
library(rgdal)      # for writeOGR (added; needed below)
library(magrittr)   # for the %>% pipe (added; needed below)
library(data.table)
library(gdalUtils)

# create example raster
r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
values(r) <- sample(x=1:1000, size=150)

# create example (Spatial)Polygons --> Note that I changed it slightly
# to have a SpatialPolygonsDataFrame with IDs for the different polys
p1 <- Polygons(list(Polygon(coords=matrix(c(50, 100, 100, 50, 50, 15, 15, 35, 35, 15), nrow=5, ncol=2), hole=FALSE)), "1")
p2 <- Polygons(list(Polygon(coords=matrix(c(77, 123, 111, 77, 43, 57, 66, 43), nrow=4, ncol=2), hole=FALSE)), "2")
p3 <- Polygons(list(Polygon(coords=matrix(c(110, 125, 125, 110, 67, 75, 80, 67), nrow=4, ncol=2), hole=FALSE)), "3")
lots.of.polygons <- SpatialPolygons(list(p1, p2, p3), 1:3)
lots.of.polygons <- SpatialPolygonsDataFrame(lots.of.polygons, data = data.frame(id = c(1, 2, 3)))
crs(lots.of.polygons) <- crs(r)  # copy crs from raster to polygons (please ignore any potential problems related to projections etc. for now)

# plot both
plot(r)  # values in this raster for illustration purposes only
plot(lots.of.polygons, add = TRUE)

# Create a spatial grid dataframe and convert it to a "raster fishnet"
# Consider also that creating a SpatialGridDataFrame could be faster
# than using "rasterToPolygons" in your original approach!
cs <- res(r)                                      # cell size
cc <- c(extent(r)@xmin, extent(r)@ymin) + (cs/2)  # corner of the grid
cd <- ceiling(c(((extent(r)@xmax - extent(r)@xmin)/cs[1]),   # construct grid topology
                ((extent(r)@ymax - extent(r)@ymin)/cs[2]))) - 1

# Define grd characteristics
grd <- GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)

# transform to spatial grid dataframe. each cell has a sequential numeric id
sp_grd <- SpatialGridDataFrame(grd,
                               data = data.frame(id = seq(1, prod(cd), 1)),  # ids are numbers between 1 and ns*nl
                               proj4string = crs(r))

# Save the "raster fishnet"
out_raster <- raster(sp_grd) %>% setValues(sp_grd@data$id)
temprast <- tempfile(tmpdir = tempdir(), fileext = ".tif")
writeRaster(out_raster, temprast, overwrite = TRUE)

# "supersample" the raster of the cell numbers
ss_factor <- 20  # this indicates how much you increase resolution of the "cells" raster
                 # the higher this is, the lower the error in computed percentages
temprast_hr <- tempfile(tmpdir = tempdir(), fileext = ".tif")
super_raster <- gdalwarp(temprast, temprast_hr, tr = res(r)/ss_factor,
                         output_Raster = TRUE, overwrite = TRUE)

# Now rasterize the input polygons with the same extent and resolution as super_raster
# (write them to a temporary shapefile first and keep the file path for gdal_rasterize)
writeOGR(obj = lots.of.polygons, dsn = tempdir(), layer = "tempshape", driver = "ESRI Shapefile")
tempshapefile <- file.path(tempdir(), "tempshape.shp")
temprastpoly <- tempfile(tmpdir = tempdir(), fileext = ".tif")
rastpoly <- gdal_rasterize(tempshapefile, temprastpoly,
                           tr = raster::res(super_raster),
                           te = extent(super_raster)[c(1, 3, 2, 4)],
                           a = 'id', output_Raster = TRUE)

# Compute zonal statistics: for each "value" of the supersampled fishnet raster,
# count the pixels which have a non-zero value in the supersampled polygons
# raster (i.e., they belong to a polygon), and divide by the maximum possible
# number of pixels (equal to ss_factor^2)
cell_nos <- getValues(super_raster)
polyid <- getValues(rastpoly)
rDT <- data.table(polyid_fc = as.numeric(polyid), cell_nos = as.numeric(cell_nos))
setkey(rDT, cell_nos)

# Use data.table to quickly summarize over cell numbers
count <- rDT[, lapply(.SD, FUN = function(x, na.rm = TRUE) {
                        100 * length(which(x > 0)) / (ss_factor^2)
                      }, na.rm = na.rm),
             by = cell_nos]

# Put the results back in the SpatialGridDataFrame and plot
sp_grd@data <- data.frame(count)
sp_grd$polyid_fc[sp_grd$polyid_fc == 0] <- NA
spplot(sp_grd, zcol = 'polyid_fc')

[resulting plot: spplot of the fishnet grid showing the percentage of each cell covered by the polygons]

This should be very fast and scale very well with the number of polygons.
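If you need covered areas rather than percentages (as in the original question), you can simply rescale the result by the area of one raster cell. A minimal sketch, assuming sp_grd$polyid_fc holds the percentages computed above and the raster uses planar (projected) coordinates, so that prod(res(r)) is the cell area:

cell_area <- prod(res(r))                                  # area of one original raster cell
sp_grd$covered_area <- sp_grd$polyid_fc / 100 * cell_area  # percentage -> absolute covered area
spplot(sp_grd, zcol = 'covered_area')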

The caveat is that you have to deal with some approximation in the computed percentages! The error depends on how much you "supersample" the raster (controlled here by the ss_factor variable, set to 20). With ss_factor = 20, each original cell is split into 20² = 400 sub-pixels, so the percentages are quantized in steps of 100/400 = 0.25%. Higher supersampling factors lead to lower errors, but also to larger memory requirements and longer processing times.

I also thought that a way to speed up the vector-based approaches could be an a priori analysis of the distances between raster cells and the different polygons, so that you only search for intersections between cells and "nearby" polygons. Perhaps you could use the bounding boxes of the polygons to identify the cells worth checking (see the sketch below).
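A rough sketch of that bounding-box idea, reusing the objects from the question (where r has been emptied with values(r) <- NA). cellsFromExtent() returns the raster cell numbers that fall inside a given extent, so the slow per-cell loop only visits cells that can possibly intersect a polygon:

candidate.cells <- cellsFromExtent(r, extent(lots.of.polygons))  # cells inside the polygons' overall bbox

values(r.results) <- NA
for (i in candidate.cells) {
  r.cell <- r
  r.cell[i] <- 1
  p <- rasterToPolygons(r.cell)
  cropped <- gIntersection(p, lots.of.polygons)
  if (!is.null(cropped)) r.results[i] <- gArea(cropped)
}

Filtering against each polygon's own bounding box would prune even more aggressively; this only illustrates the overall idea.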

Hth,

Lorenzo


Source: https://habr.com/ru/post/1259917/

