Suppose I have the data frame below, where treat == 1 means that the given id was treated, and prob is the calculated probability that treat == 1.
    set.seed(1)
    df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = TRUE))
    df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))
    df
       id treat      prob
    1   1     0 0.3820266
    2   2     0 0.3935239
    3   3     1 0.8738325
    4   4     1 0.8575781
    5   5     0 0.6375605
    6   6     1 0.9511781
    7   7     1 0.8389843
    8   8     1 0.7378759
    9   9     1 0.5785300
    10 10     0 0.6479303
To minimize selection bias, I now want to create pseudo treatment and control groups based on the values of treat and prob:
If any id with treat == 1 is within 0.1 prob of any id with treat == 0, I want the group value to be "treated".
If any id with treat == 0 is within 0.1 prob of any id with treat == 1, I want the group value to be "control".
Below is an example of the output I would like to get.
    df$group <- c(NA, NA, NA, NA, 'control', NA, NA, 'treated', 'treated', 'control')
    df
       id treat      prob   group
    1   1     0 0.3820266    <NA>
    2   2     0 0.3935239    <NA>
    3   3     1 0.8738325    <NA>
    4   4     1 0.8575781    <NA>
    5   5     0 0.6375605 control
    6   6     1 0.9511781    <NA>
    7   7     1 0.8389843    <NA>
    8   8     1 0.7378759 treated
    9   9     1 0.5785300 treated
    10 10     0 0.6479303 control
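To make the two rules concrete, here is a minimal base R sketch of the with-replacement case. It assumes "within 0.1" means an absolute difference in prob of at most 0.1. The names caliper and close_pairs are only illustrative, and the data are hard-coded from the printed values so the result does not depend on the R version's random number generator.

```r
# Data copied from the printed data frame in the question.
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

caliper <- 0.1  # assumed meaning of "within 0.1 prob"

treated_rows <- which(df$treat == 1)
control_rows <- which(df$treat == 0)

# Rows correspond to treated ids, columns to control ids.
# An entry is TRUE when that treated/control pair is within the caliper.
close_pairs <- abs(outer(df$prob[treated_rows],
                         df$prob[control_rows], "-")) <= caliper

df$group <- NA_character_
# A treated id is kept if it is close to at least one control id ...
df$group[treated_rows[rowSums(close_pairs) > 0]] <- "treated"
# ... and a control id is kept if it is close to at least one treated id.
df$group[control_rows[colSums(close_pairs) > 0]] <- "control"

df
```

On this data the sketch labels ids 8 and 9 as "treated" and ids 5 and 10 as "control", matching the desired output. Because outer() builds the full treated-by-control matrix, memory use grows with the product of the two group sizes.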
How can I do it? In the above example, matching is done with replacement, but a solution without replacement is also welcome.
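For the without-replacement case, one possible approach is greedy nearest-pair matching: list all treated/control pairs within the caliper, sort them by distance, and accept a pair only if neither id has been used yet. This is a sketch under the same 0.1 assumption, not an optimal matching. The match_id column and the variable names are hypothetical additions for illustration.

```r
# Data copied from the printed data frame in the question.
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

caliper <- 0.1  # assumed meaning of "within 0.1 prob"

# Every treated/control row pair with its absolute difference in prob.
pairs <- expand.grid(trt = which(df$treat == 1),
                     ctl = which(df$treat == 0))
pairs$dist <- abs(df$prob[pairs$trt] - df$prob[pairs$ctl])

# Keep only pairs inside the caliper, closest pairs first.
pairs <- pairs[pairs$dist <= caliper, ]
pairs <- pairs[order(pairs$dist), ]

df$group    <- NA_character_
df$match_id <- NA_integer_   # hypothetical column: id of the matched partner
used <- integer(0)           # row indices that already have a partner

for (i in seq_len(nrow(pairs))) {
  trt <- pairs$trt[i]
  ctl <- pairs$ctl[i]
  if (trt %in% used || ctl %in% used) next  # each id is matched at most once
  df$group[trt]    <- "treated"
  df$group[ctl]    <- "control"
  df$match_id[trt] <- df$id[ctl]
  df$match_id[ctl] <- df$id[trt]
  used <- c(used, trt, ctl)
}

df
```

Here id 9 is paired with id 5 and id 8 with id 10, so the group column happens to equal the with-replacement result. In general a greedy pass can leave ids unmatched that an optimal assignment would have paired.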