How to make beautiful borderless geographic thematic / heat maps with weighted (survey) data in R, probably using spatial smoothing on point observations

Ever since Joshua Katz published these dialect maps that can be found all over the Internet, built from the Harvard Dialect Survey, I have been trying to copy and generalize his methods.. but most of this is over my head. Josh revealed some of his methods in this poster, but (as far as I know) did not release any of his code.

My goal is to generalize these methods so that users of any of the major United States government survey datasets can pass their weighted data into a function and get a reasonable geographic map. Geography varies: some survey datasets have ZCTAs, some have counties, some have states, some have metro areas, etc. It is probably wise to start by plotting each point at the centroid of its geography - centroids are discussed here and are available for most geographies in the 2010 census gazetteer files. So, for each survey data point, you have a point on the map. But some survey responses have a weight of 10 while others have a weight of 100,000! Obviously, whatever "heat" or smoothing or coloring ultimately ends up on the map needs to account for those different weights.
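To make the merge step concrete, here is a minimal sketch of what I mean. Everything in it is hypothetical: `gazetteer.txt` stands in for a 2010 census gazetteer extract, and `svy` stands in for any weighted survey microdata file.

 # minimal sketch of the centroid merge - all file and column names hypothetical.
 # assume `gazetteer.txt` is a tab-delimited extract of a 2010 census gazetteer
 # file with columns GEOID, INTPTLAT, INTPTLONG (each geography's internal point)
 gaz <- read.delim( "gazetteer.txt" , colClasses = c( GEOID = "character" ) )
 # `svy` stands in for any survey microdata file:
 # one row per respondent, with a geography id, a weight, and a response
 svy <-
     data.frame(
         GEOID = c( "10100" , "10100" , "10420" ) ,
         weight = c( 10 , 100000 , 250 ) ,
         response = c( 0 , 2 , 1 )
     )
 # attach a lat/lon point to every survey record
 pts <- merge( svy , gaz[ , c( "GEOID" , "INTPTLAT" , "INTPTLONG" ) ] , by = "GEOID" )
 # shorter column names for later use
 names( pts )[ names( pts ) == "INTPTLONG" ] <- "lon"
 names( pts )[ names( pts ) == "INTPTLAT" ] <- "lat"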

I am good with survey data, but I know nothing about spatial smoothing or kernel estimation. The method Josh uses in his poster is k-nearest neighbor smoothing with a gaussian kernel, which is alien to me. I am a novice at mapping, but I can generally make things work if I know what the goal should be.
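For anyone else as lost as I am, here is my rough guess at the general shape of that technique as a toy function. To be clear, this is not Josh's code, just my reading of the poster: at each grid point, find the k nearest observations, weight each by a gaussian kernel on its distance times its survey weight, and take the weighted mean.

 # toy version (my guess, not josh's code) of k-nearest-neighbor smoothing
 # with a gaussian kernel, including survey weights. all names are made up.
 # `pts` needs lon, lat, weight, and response columns.
 knn.gauss <-
     function( grid.pt , pts , k = 50 , bw = 1 ){
         # squared distance from this grid point to every observation
         d2 <- ( pts$lon - grid.pt[ 1 ] )^2 + ( pts$lat - grid.pt[ 2 ] )^2
         # keep only the k nearest observations
         nn <- order( d2 )[ seq_len( min( k , nrow( pts ) ) ) ]
         # gaussian kernel weight on distance, times the survey weight
         w <- exp( -d2[ nn ] / ( 2 * bw^2 ) ) * pts$weight[ nn ]
         # weighted average of the response at this grid point
         sum( w * pts$response[ nn ] ) / sum( w )
     }

Running that function over a fine grid of lon/lat points and coloring each cell by the smoothed value is, as far as I can tell, the gist of what the poster describes.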

Note: this question is very similar to a question asked ten months ago that no longer has its data available. There are also tidbits of information in this thread, but if someone has a clever answer to my exact question, I would love to see it.

The R survey package has a svyplot function, and if you run these lines of code, you can view weighted data on cartesian coordinates. But for what I would like to do, the plot needs to be on a map.

 library(survey)
 data(api)
 dstrat <- svydesign( id = ~1 , strata = ~stype , weights = ~pw , data = apistrat , fpc = ~fpc )
 svyplot( api00 ~ api99 , design = dstrat , style = "bubble" )
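For comparison, here is roughly what I mean by "on a map": the same bubble idea drawn at geographic coordinates. This is only a sketch using the hypothetical `pts` data frame from above (lon, lat, weight, trinary response), and it follows the 0=red, 1=green, 2=blue scheme from my sample data further down.

 # same bubble idea, but at geographic coordinates (a sketch, using the
 # hypothetical `pts` data frame from above)
 plot( pts$lon , pts$lat , type = "n" , xlab = "longitude" , ylab = "latitude" )
 symbols(
     pts$lon , pts$lat ,
     circles = sqrt( pts$weight ) ,   # bubble area proportional to the weight
     inches = 0.2 , add = TRUE ,
     bg = c( "red" , "green" , "blue" )[ pts$response + 1 ]
 )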

In case it is useful, I have posted some sample code (at the bottom of this question) that will give anyone who wants to help me a quick start with some survey data at the core-based statistical area level (another type of geography).

Any ideas, advice, or guidance would be appreciated (and credited if I can get a formal tutorial / guide / how-to written up for http://asdfree.com/ )

thanks!!!!!!!!!!

 # load a few mapping libraries
 library(rgdal)
 library(maptools)
 library(PBSmapping)

 # specify some population data to download
 mydata <- "http://www.census.gov/popest/data/metro/totals/2012/tables/CBSA-EST2012-01.csv"

 # load mydata
 x <- read.csv( mydata , skip = 9 , h = F )

 # keep only the GEOID and the 2010 population estimate
 x <- x[ , c( 'V1' , 'V6' ) ]

 # name the GEOID column to match the CBSA shapefile
 # and name the weight column the weight column!
 names( x ) <- c( 'GEOID10' , "weight" )

 # throw out the bottom few rows
 x <- x[ 1:950 , ]

 # convert the weight column to numeric
 x$weight <- as.numeric( gsub( ',' , '' , as.character( x$weight ) ) )

 # now just make some fake trinary data
 x$trinary <- c( rep( 0:2 , 316 ) , 0:1 )

 # simple tabulation
 table( x$trinary )

 # so now the `x` data file looks like this:
 head( x )

 # and say we just wanted to map
 # something easy like
 # 0=red, 1=green, 2=blue,
 # weighted simply by the population of the cbsa

 # # # end of data read-in # # #


 # # # shapefile read-in? # # #

 # specify the tiger file to download
 tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

 # create a temporary file and a temporary directory
 tf <- tempfile() ; td <- tempdir()

 # download the tiger file to the local disk
 download.file( tiger , tf , mode = 'wb' )

 # unzip the tiger file into the temporary directory
 z <- unzip( tf , exdir = td )

 # isolate the file that ends with ".shp"
 shapefile <- z[ grep( 'shp$' , z ) ]

 # read the shapefile into working memory
 cbsa.map <- readShapeSpatial( shapefile )

 # remove CBSAs ending with alaska, hawaii, and puerto rico
 cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

 # cbsa.map$NAME10 now has a length of 933
 length( cbsa.map$NAME10 )

 # convert the cbsa.map shapefile into polygons..
 cbsa.ps <- SpatialPolygons2PolySet( cbsa.map )

 # but for some reason, cbsa.ps has 966 shapes??
 nrow( unique( cbsa.ps[ , 1:2 ] ) )
 # that seems wrong, but i'm not sure how to fix it?

 # calculate the centroids of each CBSA
 cbsa.centroids <- calcCentroid(cbsa.ps)
 # (ignoring the fact that i'm doing something else wrong..
 # because there are 966 shapes for 933 CBSAs?)

 # # # as far as i can get w/ mapping # # #

 # so now you've got
 # the weighted data file `x` with the `GEOID10` field
 # the shapefile with the matching `GEOID10` field
 # the centroids of each location on the map

 # can this be mapped nicely?
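One guess at the 933-versus-966 mismatch, which I have not confirmed: SpatialPolygons2PolySet seems to emit one PID/SID record per polygon part, and some CBSAs are multi-part shapes (coastlines plus islands), so a single GEOID10 can contribute several shapes. If that is right, counting the parts per feature should roughly reproduce the 966 figure:

 # count the polygon parts within each CBSA feature (my guess, unverified)
 parts <- sapply( cbsa.map@polygons , function( p ) length( p@Polygons ) )
 # if the guess is right, this should be about 966
 sum( parts )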
2 answers

I'm not sure how much I can help with the spatial smoothing, since it's a task I have little experience with, but I've spent some time making maps in R, so I hope what I add below helps with that part of your question.

I started by editing your code from # # # shapefile read-in # # # onward. You'll notice that I kept the map in a SpatialPolygonsDataFrame, and I relied on the raster and gstat packages to grid the data and run the spatial smoothing. The spatial smoothing model is the part I'm least comfortable with, but the process allowed me to produce a raster and demonstrate how to mask, project, and plot it.

 library(rgdal)
 library(raster)
 library(gstat)

 # read in a base map
 m <- getData("GADM", country="United States", level=1)
 m <- m[!m$NAME_1 %in% c("Alaska","Hawaii"),]

 # specify the tiger file to download
 tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

 # create a temporary file and a temporary directory
 tf <- tempfile() ; td <- tempdir()

 # download the tiger file to the local disk
 download.file( tiger , tf , mode = 'wb' )

 # unzip the tiger file into the temporary directory
 z <- unzip( tf , exdir = td )

 # isolate the file that ends with ".shp"
 shapefile <- z[ grep( 'shp$' , z ) ]

 # read the shapefile into working memory
 cbsa.map <- readOGR( shapefile, layer="tl_2010_us_cbsa10" )

 # remove CBSAs ending with alaska, hawaii, and puerto rico
 cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

 # cbsa.map$NAME10 now has a length of 933
 length( cbsa.map$NAME10 )

 # extract centroid for each CBSA
 cbsa.centroids <- data.frame(coordinates(cbsa.map), cbsa.map$GEOID10)
 names(cbsa.centroids) <- c("lon","lat","GEOID10")

 # add lat lon to population data
 nrow(x)
 x <- merge(x, cbsa.centroids, by="GEOID10")
 nrow(x) # centroids could not be assigned to all records for some reason

 # create a raster object
 r <- raster(nrow=500, ncol=500,
             xmn=bbox(m)["x","min"], xmx=bbox(m)["x","max"],
             ymn=bbox(m)["y","min"], ymx=bbox(m)["y","max"],
             crs=proj4string(m))

 # run inverse distance weighted model - modified code from ?interpolate...needs more research
 model <- gstat(id = "trinary", formula = trinary~1, weights = "weight",
                locations = ~lon+lat, data = x, nmax = 7, set=list(idp = 0.5))
 r <- interpolate(r, model, xyNames=c("lon","lat"))
 r <- mask(r, m) # discard interpolated values outside the states

 # project map for plotting (optional)
 # North America Lambert Conformal Conic
 nalcc <- CRS("+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
 m <- spTransform(m, nalcc)
 r <- projectRaster(r, crs=nalcc)

 # plot map
 par(mar=c(0,0,0,0), bty="n")
 cols <- c(rgb(0.9,0.8,0.8), rgb(0.9,0.4,0.3),
           rgb(0.8,0.8,0.9), rgb(0.4,0.6,0.9),
           rgb(0.8,0.9,0.8), rgb(0.4,0.9,0.6))
 col.ramp <- colorRampPalette(cols) # custom colour ramp
 plot(r, axes=FALSE, legend=FALSE, col=col.ramp(100))
 plot(m, add=TRUE) # overlay base map
 legend("right", pch=22, pt.bg=cols[c(2,4,6)], legend=c(0,1,2), bty="n")
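A quick note on the smoothing itself, since that's the part I flagged as needing more research: in the gstat call, idp is the inverse-distance power and nmax is the number of neighbors used at each cell, and both change the look of the surface considerably. Something like the following (the parameter values are only an illustration, not a recommendation) gives a much more local surface; after refitting, re-run the interpolate() and mask() steps above on a fresh raster template:

 # a more local fit: more neighbors, stronger distance decay (values illustrative)
 model2 <- gstat(id = "trinary", formula = trinary~1, weights = "weight",
                 locations = ~lon+lat, data = x, nmax = 25, set=list(idp = 2))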

(figure: the interpolated trinary surface, masked to the lower 48 states and overlaid with state borders)
