Replace a value in a data frame with a value from another data frame based on a set of conditions

In df1, I need to replace the values for msec with the corresponding values in df2.

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'),
                  cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4),
                  correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'),
                  cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4),
                  mean = c(545, 664, 703, 765))
```

In df1, if correct == 0, find the row in df2 with the same ID, cond and block values, and replace the value of msec in df1 with that row's mean.

For example, the second row of df1 has correct == 0. So in df2, find the corresponding row, where ID == 'rs', cond == 1 and block == 2, and use its mean (mean = 545) to replace msec (msec = 678). Note that in df1 combinations of ID, block and cond can repeat, but each combination occurs only once in df2.

3 answers

Using the data.table package:

```r
# load the 'data.table' package
library(data.table)

# convert the data.frames to data.tables (by reference)
setDT(df1)
setDT(df2)

# update df1 by reference with a join with df2;
# note: df2[, correct := 0] also adds a 'correct' column (all 0) to df2 itself
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]
```

which gives:

```
> df1
   ID cond block correct msec
1: rs    1     2       1  456
2: rs    1     2       0  545
3: rs    2     4       1  756
4: tr    1     2       1  654
5: tr    1     2       1  625
6: tr    2     4       0  765
```

Note: the code above updates df1 by reference instead of creating a new data frame, which is more memory efficient.
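A variant that leaves df2 untouched can be sketched as below; this is my own assumption, not part of the answer above. Instead of adding a `correct` column to df2 and joining on it, it joins on the three key columns only and applies the correct == 0 condition inside `j`, keeping the old msec everywhere else.

```r
library(data.table)

df1 <- data.table(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'),
                  cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4),
                  correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.table(ID = c('rs', 'rs', 'tr', 'tr'),
                  cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4),
                  mean = c(545, 664, 703, 765))

# join on the key columns only; for each matched row of df1, take df2's mean
# (i.mean) when correct == 0, otherwise keep df1's current msec.
# df2 is only read here, so it gains no extra column.
df1[df2, on = .(ID, cond, block), msec := ifelse(correct == 0, i.mean, msec)]
df1
```

Rows of df1 without a match in df2 are simply not touched by the update join.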


One option is to use base R with interaction() and match(). What about:

```r
df1[which(df1$correct == 0), "msec"] <-
  df2[match(interaction(df1[which(df1$correct == 0), c("ID", "cond", "block")]),
            interaction(df2[, c("ID", "cond", "block")])), "mean"]
df1
#   ID cond block correct msec
# 1 rs    1     2       1  456
# 2 rs    1     2       0  545
# 3 rs    2     4       1  756
# 4 tr    1     2       1  654
# 5 tr    1     2       1  625
# 6 tr    2     4       0  765
```

This overwrites msec in the rows where correct == 0 with the matching values from df2$mean.
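The match() step works because each row is collapsed into a single key. interaction() joins the values with ".", so in principle two different rows could produce the same key if the values themselves contain dots (for example a cond of 1.5). A minimal sketch of the same idea with an explicit key, assuming a separator ("\r") that never occurs in the data; `make_key` is a hypothetical helper name, not from the answer:

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'),
                  cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4),
                  correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'),
                  cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4),
                  mean = c(545, 664, 703, 765))

# hypothetical helper: paste the key columns of each row into one string,
# using a separator assumed not to appear in the data
make_key <- function(d, cols) do.call(paste, c(d[cols], sep = "\r"))

keys <- c("ID", "cond", "block")
rows <- which(df1$correct == 0)                           # rows to update
pos <- match(make_key(df1[rows, ], keys), make_key(df2, keys))  # row in df2
df1$msec[rows] <- df2$mean[pos]
df1
```

If a combination is missing from df2, match() returns NA and that row's msec becomes NA, so checking `anyNA(pos)` before assigning may be worthwhile.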

Edit: Another option is an SQL merge with the sqldf package, which might look like this:

```r
library(sqldf)
merged <- sqldf('SELECT l.ID, l.cond, l.block, l.correct,
                        case when l.correct == 0 then r.mean else l.msec end as msec
                 FROM df1 as l
                 LEFT JOIN df2 as r
                   ON l.ID = r.ID AND l.cond = r.cond AND l.block = r.block')
merged
  ID cond block correct msec
1 rs    1     2       1  456
2 rs    1     2       0  545
3 rs    2     4       1  756
4 tr    1     2       1  654
5 tr    1     2       1  625
6 tr    2     4       0  765
```
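If you'd rather not depend on sqldf, the same LEFT JOIN plus CASE logic can be sketched in base R with merge(); this is a swapped-in approach, not the answerer's. merge() sorts its result by the key columns, so the sketch stores a temporary `row_id` column (a hypothetical helper name) to restore df1's original row order.

```r
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'),
                  cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4),
                  correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'),
                  cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4),
                  mean = c(545, 664, 703, 765))

d <- df1
d$row_id <- seq_len(nrow(d))      # remember the original order

# LEFT JOIN: keep every row of d, attach df2$mean where the keys match
m <- merge(d, df2, by = c("ID", "cond", "block"), all.x = TRUE)
m <- m[order(m$row_id), ]         # undo merge()'s sorting by key

# CASE WHEN correct == 0 THEN mean ELSE msec END
m$msec <- ifelse(m$correct == 0, m$mean, m$msec)

result <- m[, c("ID", "cond", "block", "correct", "msec")]
rownames(result) <- NULL
result
```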

With dplyr: left_join() on all common columns, then mutate() to replace msec with mean where correct is 0.

```r
library(dplyr)
left_join(df1, df2) %>%
  mutate(msec = ifelse(correct == 0, mean, msec)) %>%
  select(-mean)
  ID cond block correct msec
1 rs    1     2       1  456
2 rs    1     2       0  545
3 rs    2     4       1  756
4 tr    1     2       1  654
5 tr    1     2       1  625
6 tr    2     4       0  765
```
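A slightly more defensive version of the same pipeline, as a sketch: naming the join columns in `by` avoids the "Joining with ..." message and prevents an accidental join on other columns the two tables might share later, and dplyr's if_else() is stricter than base ifelse() about the types of its true and false arguments.

```r
library(dplyr)

df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'),
                  cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4),
                  correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'),
                  cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4),
                  mean = c(545, 664, 703, 765))

result <- df1 %>%
  # join only on the three key columns, keeping all rows of df1 in order
  left_join(df2, by = c("ID", "cond", "block")) %>%
  # use df2's mean where correct == 0, otherwise keep the original msec
  mutate(msec = if_else(correct == 0, mean, msec)) %>%
  select(-mean)
result
```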

Source: https://habr.com/ru/post/1267024/

