I am doing data cleansing. I use mutate in dplyr a lot, because it creates new columns step by step and makes it easy to follow what happens.
Here are two examples where I get this error:
Error: incompatible size (%d), expecting %d (the group size) or 1
Example 1: Get the name of a city from a zip code. The data is just like this:
        Zip
    1 02345
    2 02201
I noticed that it does not work when Zip contains NA.
Without NA, this works:
    library(dplyr)
    library(zipcode)
    data(zipcode)
    test = data.frame(Zip=c('02345','02201'), stringsAsFactors=FALSE)
    test %>%
      rowwise() %>%
      mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
This gives:

    Source: local data frame [2 x 2]
    Groups: <by row>

        Zip   Town1
    1 02345 Manomet
    2 02201  Boston
With NA, this does not work:
    library(dplyr)
    library(zipcode)
    data(zipcode)
    test = data.frame(Zip=c('02345','02201',NA), stringsAsFactors=FALSE)
    test %>%
      rowwise() %>%
      mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
This gives:
Error: incompatible size (%d), expecting %d (the group size) or 1
Example 2. I want to get rid of the redundant state that occurs in the Town column in the following data.
             Town State
    1   BOSTON MA    MA
    2 NORTH AMAMS    MA
    3  CHICAGO IL    IL
Here's how I do it:
(1) split the string in Town into words, for example "BOSTON" and "MA" for row 1;
(2) check whether any of these words matches the State of that row;
(3) delete the matching words.
    library(dplyr)
    test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)
    test %>%
      mutate(Town.word = strsplit(Town, split=' ')) %>%
      rowwise() %>%
This leads to:
             Town State Town.word is.state   Town1
    1   BOSTON MA    MA  <chr[2]>        2  BOSTON
    2 NORTH AMAMS    MA  <chr[2]>       NA      NA
    3  CHICAGO IL    IL  <chr[2]>        2 CHICAGO
Meaning: in row 1, for example, is.state == 2, that is, the second word in Town is the name of the state. After removing that word, Town1 holds the correct city name.
Now I want to fix the NA in row 2, but adding na.omit results in an error:
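For reference, the three steps described above can also be sketched in base R, outside the dplyr pipeline. This is only a minimal illustration, not the pipeline used in this question: drop_state is a hypothetical helper name introduced here, and unlike the output shown above it leaves a Town with no matching word unchanged instead of returning NA.

```r
# Hypothetical helper (not part of the original pipeline):
# remove every word in `town` that equals `state`.
drop_state <- function(town, state) {
  words <- strsplit(town, split = " ", fixed = TRUE)[[1]]  # step 1: split into words
  keep  <- words != state                                   # step 2: compare with the state
  paste(words[keep], collapse = " ")                        # step 3: drop matches, re-join
}

test <- data.frame(
  Town  = c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL"),
  State = c("MA", "MA", "IL"),
  stringsAsFactors = FALSE
)

# mapply calls drop_state once per (Town, State) pair
test$Town1 <- mapply(drop_state, test$Town, test$State, USE.NAMES = FALSE)
print(test)
```

Because each call returns exactly one string, the new column always has one value per row, whether or not a word matched.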
    test %>%
      mutate(Town.word = strsplit(Town, split=' ')) %>%
      rowwise() %>%
leads to:
Error: incompatible size (%d), expecting %d (the group size) or 1
I checked the type and size of the data:
    test %>%
      mutate(Town.word = strsplit(Town, split=' ')) %>%
      rowwise() %>%
leads to:
             Town State Town.word is.state length(is.state) class(na.omit(is.state))
    1   BOSTON MA    MA  <chr[2]>        2                1                  integer
    2 NORTH AMAMS    MA  <chr[2]>       NA                1                  integer
    3  CHICAGO IL    IL  <chr[2]>        2                1                  integer
So the size (%d) here is 1 in every row. Can anyone tell me what is going wrong? Thanks.