How to perform group matching in R?

Suppose I have the data.frame below, where treat == 1 means that the id received treatment and prob is the calculated probability that treat == 1.

    set.seed(1)
    df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = T))
    df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))
    df
       id treat      prob
    1   1     0 0.3820266
    2   2     0 0.3935239
    3   3     1 0.8738325
    4   4     1 0.8575781
    5   5     0 0.6375605
    6   6     1 0.9511781
    7   7     1 0.8389843
    8   8     1 0.7378759
    9   9     1 0.5785300
    10 10     0 0.6479303

To minimize selection bias, I now want to create pseudo treatment and control groups based on the values of treat and prob:

  • If any id with treat == 1 is within 0.1 prob of any id with treat == 0, I want the value of group to be "treated".

  • If any id with treat == 0 is within 0.1 prob of any id with treat == 1, I want the value of group to be "control".

Below is an example of what I would like to receive.

    df$group <- c(NA, NA, NA, NA, 'control', NA, NA, 'treated', 'treated', 'control')
    df
       id treat      prob   group
    1   1     0 0.3820266    <NA>
    2   2     0 0.3935239    <NA>
    3   3     1 0.8738325    <NA>
    4   4     1 0.8575781    <NA>
    5   5     0 0.6375605 control
    6   6     1 0.9511781    <NA>
    7   7     1 0.8389843    <NA>
    8   8     1 0.7378759 treated
    9   9     1 0.5785300 treated
    10 10     0 0.6479303 control

How can I do it? In the above example, matching is done with replacement, but a solution without replacement is also welcome.
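For the without-replacement variant, one possible reading is greedy nearest-neighbour matching: each treated id takes the closest control that is still unused, provided it lies within 0.1. The sketch below is an assumption about what "without replacement" should mean here, not a definitive method. The function name match_greedy, the caliper argument and the pair column are hypothetical, and the data are hardcoded from the printed output above so the example does not depend on the random number generator.

```r
# Data copied from the printed data.frame in the question.
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

# Hypothetical helper: greedy 1:1 matching without replacement.
match_greedy <- function(df, caliper = 0.1) {
  treated  <- which(df$treat == 1)
  controls <- which(df$treat == 0)          # controls still available
  group <- rep(NA_character_, nrow(df))
  pair  <- rep(NA_integer_, nrow(df))       # id of the matched partner

  for (i in treated) {
    if (length(controls) == 0) break        # nothing left to match against
    d <- abs(df$prob[controls] - df$prob[i])
    j <- which.min(d)                       # closest remaining control
    if (d[j] <= caliper) {
      group[i]           <- "treated"
      group[controls[j]] <- "control"
      pair[i]            <- df$id[controls[j]]
      pair[controls[j]]  <- df$id[i]
      controls <- controls[-j]              # each control is used at most once
    }
  }
  df$group <- group
  df$pair  <- pair
  df
}

result <- match_greedy(df)
result
```

The outcome of a greedy pass depends on the order in which treated ids are visited, so sorting or shuffling the rows first can change which pairs are formed.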

4 answers

I think this problem is well suited for cut in base R. Here's how you can do it in vectorized form:

    # Return the ids in subset r whose prob falls inside the breaks built
    # from the other subset (each prob - 0.1 and + 0.1).
    # Caveat: cut() sorts its breaks, so this only checks that prob lies within
    # (min - 0.1, max + 0.1] of the other group, not within 0.1 of a specific id.
    f <- function(r) {
      x <- cut(df[r,]$prob, breaks = c(df[!r,]$prob - 0.1, df[!r,]$prob + 0.1))
      df[r,][!is.na(x),]$id
    }

    ones <- df$treat == 1
    df$group <- NA
    df[df$id %in% f(ones),]$group <- "treated"
    df[df$id %in% f(!ones),]$group <- "control"
    df
    #    id treat      prob   group
    # 1   1     0 0.3820266    <NA>
    # 2   2     0 0.3935239    <NA>
    # 3   3     1 0.8738325    <NA>
    # 4   4     1 0.8575781    <NA>
    # 5   5     0 0.6375605 control
    # 6   6     1 0.9511781    <NA>
    # 7   7     1 0.8389843    <NA>
    # 8   8     1 0.7378759 treated
    # 9   9     1 0.5785300 treated
    # 10 10     0 0.6479303 control
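If the probabilities in one group are not contiguous, a range-based check can label an id that has no partner within 0.1. A pairwise comparison with outer() tests every treated id against every control id instead. This is a minimal sketch, not part of the answer above: the names caliper, is_treated and close are illustrative, and the data are hardcoded from the question's printed output.

```r
# Data copied from the printed data.frame in the question.
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

caliper    <- 0.1
is_treated <- df$treat == 1

# Rows = treated ids, columns = control ids;
# TRUE where the two probabilities differ by at most the caliper.
close <- abs(outer(df$prob[is_treated], df$prob[!is_treated], "-")) <= caliper

group <- rep(NA_character_, nrow(df))
group[which(is_treated)[rowSums(close) > 0]]  <- "treated"  # has a close control
group[which(!is_treated)[colSums(close) > 0]] <- "control"  # has a close treated id
df$group <- group
df
```

The matrix has one cell per treated/control pair, so memory grows with the product of the two group sizes; for very large data a sorted search may be preferable.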

You can try:

    foo <- function(x) {
      ctrl_prob  <- x$prob[x$treat == 0]   # probabilities of the untreated ids
      treat_prob <- x$prob[x$treat == 1]   # probabilities of the treated ids
      tmp <- sapply(seq_len(nrow(x)), function(y) {
        if (x$treat[y] == 1) {
          # treated id: is any untreated id within 0.1?
          ifelse(any(abs(x$prob[y] - ctrl_prob) <= 0.1), "treated", NA_character_)
        } else {
          # untreated id: is any treated id within 0.1?
          ifelse(any(abs(x$prob[y] - treat_prob) <= 0.1), "control", NA_character_)
        }
      })
      cbind(x, group = tmp)
    }

    foo(df)
       id treat      prob   group
    1   1     0 0.3820266    <NA>
    2   2     0 0.3935239    <NA>
    3   3     1 0.8738325    <NA>
    4   4     1 0.8575781    <NA>
    5   5     0 0.6375605 control
    6   6     1 0.9511781    <NA>
    7   7     1 0.8389843    <NA>
    8   8     1 0.7378759 treated
    9   9     1 0.5785300 treated
    10 10     0 0.6479303 control

Perhaps not the most elegant, but it seems to work for me:

    library(dplyr)

    df %>%
      group_by(id, treat) %>%   # id is unique, so each group holds a single row
      mutate(group2 = ifelse(treat == 1,
                             # treat == 1
                             ifelse(any(abs(prob - df$prob[df$treat == 0]) < 0.1), "treated", NA_character_),
                             # treat == 0
                             ifelse(any(abs(prob - df$prob[df$treat == 1]) < 0.1), "control", NA_character_)))

Is this what you want?

    # Base R:
    # Keep only the numeric columns so apply() does not coerce rows to character.
    apply(df[df$treat == 1, c("id", "treat", "prob")], 1, function(x) {
      ifelse(any(df[df$treat == 0, "prob"] - .1 < x["prob"] &
                 x["prob"] < df[df$treat == 0, "prob"] + .1),
             "treated", NA)
    })

You can invert the treat condition to get the control group, then bind the results to your df.
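As a sketch of that last step, the same strict-interval check can be run once per side and written into a single column. The loop, the use of vapply, and the names own, other, hit and label are illustrative assumptions rather than part of the answer, and the data are hardcoded from the question's printed output.

```r
# Data copied from the printed data.frame in the question.
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

df$group <- NA_character_

for (side in c(1, 0)) {
  own   <- df$treat == side            # rows being labelled in this pass
  other <- df$prob[!own]               # probabilities of the opposite side
  label <- if (side == 1) "treated" else "control"

  # TRUE when at least one opposite-side prob is strictly within 0.1.
  hit <- vapply(df$prob[own],
                function(p) any(other - 0.1 < p & p < other + 0.1),
                logical(1))

  df$group[which(own)[hit]] <- label
}

df
```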


Source: https://habr.com/ru/post/1266969/

