US county ggplot / mapping - problems with visualization forms in R

Question

So, I have a data frame in R called obesity_map, which basically gives me the state, county, and county obesity rate. It looks something like this:

obesity_map = data.frame(state, county, obesity_rate)

I am trying to visualize this on a map showing various rates of county obesity across the US:

    us.state.map <- map_data('state')
    head(us.state.map)
    states <- levels(as.factor(us.state.map$region))
    df <- data.frame(region = states,
                     value = runif(length(states), min=0, max=100),
                     stringsAsFactors = FALSE)
    map.data <- merge(us.state.map, df, by='region', all=T)
    map.data <- map.data[order(map.data$order),]
    head(map.data)

    map.county <- map_data('county')
    county.obesity <- data.frame(region = obesity_map$state,
                                 subregion = obesity_map$county,
                                 value = obesity_map$obesity_rate)
    map.county <- merge(county.obesity, map.county, all=TRUE)

    ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) +
      geom_polygon(colour = "white", size = 0.1)

And this basically creates an image that looks like this:

As you can see, the US is broken up into strange shapes, the fill is not a single color in graded shades, and you cannot make much sense of it. What I really want is something like the map below, but with every county filled in:

I am new to this, so I will be grateful for any help!

Edit:

Here is the output of dput:

    dput(obesity_map)
    structure(list(X = 1:3141, FIPS = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L,
    17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L, 43L,
    45L, 47L, 49L, 51L, 53L, 55L, 57L, 59L, 61L, 63L, 65L, 67L, 69L, 71L,
    73L, 75L, 77L, 79L, 81L, 83L, 85L, 87L, 89L, 91L, 93L, 95L, 97L, 99L,
    101L, 103L, 105L, 107L, 109L, 111L, 113L, 115L, 117L, 119L, 121L, 123L,
    125L, 127L, 129L, 131L, 133L, 13L, 16L, 20L, 50L, 60L, 68L, 70L, 90L,
    100L, 110L, 122L, 130L, 150L, 164L, 170L, 180L, 185L, 188L, 201L, 220L,
    232L, 240L, 261L, 270L, 280L, 282L, 290L, 1L, 3L, 5L, 7L, 9L, 11L, 12L,
    13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 1L, 3L, 5L, 7L, 9L, 11L, 13L,
    15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L,

The output is very long because it covers every US county, so I truncated it and show only the first few lines.

Basically, a data frame looks like this:

    print(head(obesity_map))
      X FIPS state_names county_names obesity
    1 1    1     Alabama      Autauga    24.5
    2 2    3     Alabama      Baldwin    23.6
    3 3    5     Alabama      Barbour    25.6
    4 4    7     Alabama         Bibb     0.0
    5 5    9     Alabama       Blount    24.2
    6 6   11     Alabama      Bullock     0.0

I also tried using ggcounty, following its example, but I keep getting an error. I'm not sure what I did wrong:

    library(ggcounty)

    # breaks
    obesity_map$obese <- cut(obesity_map$obesity,
                             breaks=c(0, 5, 10, 15, 20, 25, 30),
                             labels=c("1", "2", "3", "4", "5", "6"),
                             include.lowest=TRUE)

    # get the US counties map (lower 48)
    us <- ggcounty.us()

    # start the plot with our base map
    gg <- us$g

    # add a new geom with our population (choropleth)
    gg <- gg + geom_map(data=obesity_map, map=us$map,
                        aes(map_id=FIPS, fill=obesity_map$obese),
                        color="white", size=0.125)

But I always get the error message: "Error: argument must be coercible to non-negative integer"

Any idea? Thanks again for your help! I appreciate it very much.

+6

r ggplot2 tmap

user3648073 May 17, '14 at 17:10

4 answers

jlhoward · Answer 1 · 2014-05-18T01:17:17+0000

This is a similar example, adapted to the format of your obesity_map. It also uses a data.table join, which is much faster than merge(...), especially with large datasets like yours.

    library(ggplot2)
    # this creates an example formatted as your obesity_map - you have this already...
    set.seed(1)    # for reproducible example
    map.county <- map_data('county')
    counties   <- unique(map.county[,5:6])    # columns 5:6 are region, subregion
    obesity_map <- data.frame(state_names=counties$region,
                              county_names=counties$subregion,
                              obesity= runif(nrow(counties), min=0, max=100))

    # you start here...
    library(data.table)    # use data table merge - it is *much* faster
    map.county <- data.table(map_data('county'))
    setkey(map.county,region,subregion)
    obesity_map <- data.table(obesity_map)
    setkey(obesity_map,state_names,county_names)
    map.df <- map.county[obesity_map]    # keyed join on (state, county)

    ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) +
      geom_polygon()+coord_map()

In addition, if your dataset has FIPS codes, which I think it does, I highly recommend that you use the US Census TIGER/Line county shapefile (which also has these codes) and merge on those. It is much more reliable. For example, in your extract from the obesity_map data frame, states and counties are capitalized, while they are not in the built-in county dataset in R, so you would have to deal with that. Also, the TIGER file is kept up to date, while the built-in dataset is not.
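If you do stay with the built-in county map, the capitalization mismatch mentioned above can be handled by normalizing both name columns before the join. A minimal sketch, assuming the state_names and county_names columns shown in the question; normalize_name is a hypothetical helper, the stray whitespace is made up for illustration, and real data will probably need more special cases (abbreviations such as "St.", apostrophes, and so on):

```r
# Hypothetical helper: lower-case, trim, and collapse whitespace so names
# match the lower-case region/subregion columns of map_data('county').
normalize_name <- function(x) {
  x <- tolower(trimws(as.character(x)))
  gsub("\\s+", " ", x)
}

# Three rows taken from the question's head() output; the extra spaces
# are invented to show the trimming.
obesity_map <- data.frame(
  state_names  = c("Alabama", " Alabama", "Alabama "),
  county_names = c("Autauga", "Baldwin", "Barbour"),
  obesity      = c(24.5, 23.6, 25.6),
  stringsAsFactors = FALSE
)

obesity_map$region    <- normalize_name(obesity_map$state_names)
obesity_map$subregion <- normalize_name(obesity_map$county_names)
print(obesity_map[, c("region", "subregion", "obesity")])
```

The new region and subregion columns can then serve as the join keys in place of the original capitalized names.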

So this is an interesting question. It turns out that the actual obesity data is on the USDA website and can be downloaded here as an MS Excel file. There is also a shapefile of US counties on the Census Bureau website, here. Both the Excel file and the shapefile have FIPS information. In R, the two can be put together relatively simply:

    library(XLConnect)       # for loadWorkbook(...) and readWorksheet(...)
    library(rgdal)           # for readOGR(...)
    library(RColorBrewer)    # for brewer.pal(...)
    library(data.table)
    library(ggplot2)         # for fortify(...) and ggplot(...)

    setwd(" < directory with all your files > ")
    wb <- loadWorkbook("DataDownload.xls")    # from the USDA website
    df <- readWorksheet(wb,"HEALTH")          # this sheet has the obesity data

    US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m")
    # leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
    US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),]
    county.data <- US.counties@data
    county.data <- cbind(id=rownames(county.data),county.data)
    county.data <- data.table(county.data)
    county.data[,FIPS:=paste0(STATE,COUNTY)]  # this is the state + county FIPS code
    setkey(county.data,FIPS)

    obesity.data <- data.table(df)
    setkey(obesity.data,FIPS)
    county.data[obesity.data,obesity:=PCT_OBESE_ADULTS10]

    map.df <- data.table(fortify(US.counties))
    setkey(map.df,id)
    setkey(county.data,id)
    map.df[county.data,obesity:=obesity]

    ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) +
      scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+
      geom_polygon()+coord_map()+
      labs(title="2010 Adult Obesity by County, percent",x="",y="")+
      theme_bw()

to produce this:

Martijn Tennekes · Answer 2 · 2015-12-24T13:56:01+0000

Perhaps a little late for a different answer, but still worth sharing what I think.

Reading and preprocessing the data is similar to jlhoward's answer, with some differences:

    library(tmap)        # package for plotting
    library(readxl)      # for reading Excel
    library(maptools)    # for unionSpatialPolygons

    # download data
    download.file("http://www.ers.usda.gov/datafiles/Food_Environment_Atlas/Data_Access_and_Documentation_Downloads/Current_Version/DataDownload.xls",
                  destfile = "DataDownload.xls", mode="wb")
    df <- read_excel("DataDownload.xls", sheet = "HEALTH")

    # download shape (a little less detail than in the other scripts)
    f <- tempfile()
    download.file("http://www2.census.gov/geo/tiger/GENZ2010/gz_2010_us_050_00_20m.zip",
                  destfile = f)
    unzip(f, exdir = ".")
    US <- read_shape("gz_2010_us_050_00_20m.shp")

    # leave out AK, HI, and PR (state FIPS: 02, 15, and 72)
    US <- US[!(US$STATE %in% c("02","15","72")),]

    # append data to shape
    US$FIPS <- paste0(US$STATE, US$COUNTY)
    US <- append_data(US, df, key.shp = "FIPS", key.data = "FIPS")
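The key built above is a five-character string: a two-digit state code followed by a three-digit county code. If the FIPS column in your own table was read as a number, the leading zeros are gone (Autauga County, Alabama, 01001, becomes 1001) and the keys will not match. A small sketch of restoring the padding, assuming integer-like input; make_fips and pad_fips are hypothetical helpers, not part of any of the packages above:

```r
# Hypothetical helpers for building 5-character FIPS keys.

# From separate state and county codes (e.g. state 1, county 1).
make_fips <- function(state, county) {
  sprintf("%02d%03d", as.integer(state), as.integer(county))
}

# From a combined code that lost its leading zero (e.g. 1001).
pad_fips <- function(fips) {
  sprintf("%05d", as.integer(fips))
}

print(make_fips(1, 1))   # "01001"
print(pad_fips(1001))    # "01001"
```

Note that the FIPS column in the question's data frame appears to hold only the county part (1, 3, 5, ...), so a state code would have to be attached before either helper is of use there.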

Once the correct data is attached to the shape object, a choropleth can be drawn with one line of code:

 qtm(US, fill = "PCT_OBESE_ADULTS10")

This can be enhanced by adding state borders, a better projection and title:

    # create shape object with state polygons
    US_states <- unionSpatialPolygons(US, IDs=US$STATE)

    tm_shape(US, projection="+init=epsg:2163") +
      tm_polygons("PCT_OBESE_ADULTS10", border.col = "grey30", title="") +
      tm_shape(US_states) +
      tm_borders(lwd=2, col = "black", alpha = .5) +
      tm_layout(title="2010 Adult Obesity by County, percent",
                title.position = c("center", "top"),
                legend.text.size=1)

Paulo E. Cardoso · Answer 3 · 2014-05-17T23:46:46+0000

This is what I could get by working with the map ID variable, renaming it to "region".

    library(ggplot2)
    library(maps)
    m.usa <- map_data("county")
    m.usa$id <- m.usa$subregion
    m.usa <- m.usa[ ,-5]
    names(m.usa)[5] <- 'region'

    df <- data.frame(region = unique(m.usa$region),
                     obesity = rnorm(length(unique(m.usa$region)), 50, 10),
                     stringsAsFactors = F)

    head(df)
       region  obesity
    1 autauga 44.54833
    2 baldwin 68.61470
    3 barbour 52.19718
    4    bibb 50.88948
    5  blount 42.73134
    6 bullock 59.93515

    ggplot(df, aes(map_id = region)) +
      geom_map(aes(fill = obesity), map = m.usa) +
      expand_limits(x = m.usa$long, y = m.usa$lat) +
      coord_map()
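One thing to watch when keying on the county name alone: names repeat across states, so a single name can stand for several counties and they would all receive the same fill. A possible workaround is a composite "state,county" key. This is only a sketch with toy rows, assuming the region and subregion columns of map_data("county"):

```r
# Toy rows: "washington" is a county name in both Alabama and Georgia.
counties <- data.frame(
  region    = c("alabama", "georgia", "alabama"),
  subregion = c("washington", "washington", "autauga"),
  stringsAsFactors = FALSE
)

# The county name alone is ambiguous...
name_is_ambiguous <- anyDuplicated(counties$subregion) > 0

# ...while a composite key identifies each row uniquely.
counties$id <- paste(counties$region, counties$subregion, sep = ",")
id_is_unique <- anyDuplicated(counties$id) == 0

print(counties)
```

The same key would need to be built on both the map data frame and the obesity data frame before passing it as map_id.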

Scott Worland · Answer 4 · 2015-09-22T19:50:26+0000

I think all you had to do was reorder the map.county variable after the merge, just as you did earlier for the map.data variable.

    ....
    map.county <- merge(county.obesity, map.county, all=TRUE)

    ## reorder the map before plotting (sort by the vertex order column)
    map.county <- map.county[order(map.county$order),]

    ## plot
    ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) +
      geom_polygon(colour = "white", size = 0.1)
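To see why the ordering matters: by default merge() sorts its result on the key columns, so the rows may no longer follow the drawing sequence stored in the order column, and geom_polygon connects vertices in row order. A toy sketch with made-up coordinates and keys (deliberately chosen to force a reshuffle, not real map data):

```r
# Toy vertex table in the layout of map_data(): the order column records
# the sequence in which the vertices must be connected.
poly <- data.frame(
  long      = c(0, 1, 1, 0),
  lat       = c(0, 0, 1, 1),
  order     = 1:4,
  subregion = c("b", "a", "b", "a"),  # made-up keys
  stringsAsFactors = FALSE
)
vals <- data.frame(subregion = c("a", "b"), value = c(10, 20),
                   stringsAsFactors = FALSE)

merged <- merge(vals, poly, by = "subregion")
print(merged$order)   # no longer in drawing sequence

# Sorting on the order column restores the drawing sequence.
fixed <- merged[order(merged$order), ]
print(fixed$order)
```

With the real county data the reshuffling is less obvious than in this toy case, but the fix is the same single order() call.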
