Merge unequal data frames and replace the missing rows with 0

I have two data.frames, one with only characters and the other with characters and values.

    df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
    df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
    merge(df1, df2)
      x y
    1 a 0
    2 b 1
    3 c 0

I want to merge df1 and df2. The characters a, b, and c merge fine and get the values 0, 1, 0, but d and e are dropped. I want d and e to appear in the merged table as well, each with the value 0. In other words, for every row of df1 that has no match in df2, y should be set to 0, for example:

      x y
    1 a 0
    2 b 1
    3 c 0
    4 d 0
    5 e 0
+45
merge r dataframe
May 11 '11 at 14:15
4 answers

Take a look at the help page for merge. The all parameter lets you specify different types of merges. Here we want to set all = TRUE. This makes merge return NA for the values that have no match, which we can then replace with 0 using is.na():

    zz <- merge(df1, df2, all = TRUE)
    zz[is.na(zz)] <- 0

    > zz
      x y
    1 a 0
    2 b 1
    3 c 0
    4 d 0
    5 e 0
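Note that all = TRUE is a full outer join: it would also keep rows that exist only in df2. If only the rows of df1 should be preserved, a left join is closer to what the question asks for. The following is a minimal sketch using the question's example data; the explicit by = "x" and the column-wise NA replacement are my own choices, not part of the answer above.

```r
# Same example data as in the question.
df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0))

# all.x = TRUE keeps every row of df1 (a left join). Rows of df2 with no
# match in df1 would be dropped, unlike with all = TRUE.
zz <- merge(df1, df2, by = "x", all.x = TRUE)

# Replace NA only in the value column, leaving x untouched.
zz$y[is.na(zz$y)] <- 0
```

Restricting the replacement to zz$y avoids touching other columns, which matters once the merged frame contains columns where 0 is not a sensible fill value.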
+69
May 11 '11 at 14:21

Or, as an alternative to @Chase's code, being a recent plyr fan with a background in databases:

    require(plyr)
    zz <- join(df1, df2, type="left")
    zz[is.na(zz)] <- 0
+7
May 11 '11 at 14:52

Another alternative, using data.table.

DATA EXAMPLE

    library(data.table)
    dt1 <- data.table(df1)
    dt2 <- data.table(df2)
    setkey(dt1,x)
    setkey(dt2,x)

THE CODE

 dt2[dt1,list(y=ifelse(is.na(y),0,y))] 
+2
May 11 '11 at 20:11

I used the answer given by Chase (answered May 11, 2011 at 14:21), but added some code to apply that solution to my particular problem.

I had a data frame of rates (user, download) and a data frame of totals (user, download) to be merged by user, and I wanted to include every rate, even if there was no corresponding total. However, it is also possible that no totals are missing, and in that case the row selection used to replace NA with zero fails.

The first line of code performs the merge. The next two lines rename the columns in the merged frame. The if statement replaces NA with zero, but only if there are rows containing NA.

    # merge rates and totals, replacing absent totals by zero
    graphdata <- merge(rates, totals, by=c("user"), all.x=TRUE)
    colnames(graphdata)[colnames(graphdata)=="download.x"] = "download.rate"
    colnames(graphdata)[colnames(graphdata)=="download.y"] = "download.total"
    if(any(is.na(graphdata$download.total))) {
        graphdata[is.na(graphdata$download.total),]$download.total <- 0
    }
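As a possible simplification, the guard can be avoided by indexing the column vector instead of selecting rows of the data frame first: assigning into an empty selection of a vector is simply a no-op. The sketch below uses made-up rates and totals frames (hypothetical data, chosen so that no total is missing) and uses merge's suffixes argument in place of the two renaming lines.

```r
# Hypothetical data: every user in `rates` also appears in `totals`,
# so the merge produces no NA at all.
rates  <- data.frame(user = c("u1", "u2"), download = c(1.5, 2.0))
totals <- data.frame(user = c("u1", "u2"), download = c(10, 20))

# suffixes names the clashing `download` columns directly.
graphdata <- merge(rates, totals, by = "user", all.x = TRUE,
                   suffixes = c(".rate", ".total"))

# Indexing the column vector does nothing when no value is NA,
# so no if() guard is needed.
graphdata$download.total[is.na(graphdata$download.total)] <- 0
```

The same two lines also handle the case where some totals are missing, so one code path covers both situations.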
+2
Mar 27 '14 at 4:36