Using the splitting function in R

I am trying to simulate three small datasets that contain x1 , x2 , x3 , x4 , trt and IND .

However, when I try to split simulated data using IND using "split" in R, I get warning messages and exits. Can someone please give me a hint what I did wrong in my R code?

 # Step 2: simulate data Alpha = 0.05 S = 3 # number of replicates x = 8 # number of covariates G = 3 # number of treatment groups N = 50 # number of subjects per dataset tot = S*N # total subjects for a simulation run # True parameters alpha = c(0.5, 0.8) # intercepts b1 = c(0.1,0.2,0.3,0.4) # for pi_1 of trt A b2 = c(0.15,0.25,0.35,0.45) # for pi_2 of trt B b = c(1.1,1.2,1.3,1.4); ############################################################################## # Scenario 1: all covariates are independent standard normally distributed # ############################################################################## set.seed(12) x1 = rnorm(n=tot, mean=0, sd=1);x2 = rnorm(n=tot, mean=0, sd=1); x3 = rnorm(n=tot, mean=0, sd=1);x4 = rnorm(n=tot, mean=0, sd=1); ############################################################################### p1 = exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4)/ (1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) + exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4)) p2 = exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4)/ (1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) + exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4)) p3 = 1/(1+exp(alpha[1]+b1[1]*x1+b1[2]*x2+b1[3]*x3+b1[4]*x4) + exp(alpha[2]+b2[1]*x1+b2[2]*x2+b2[3]*x3+b2[4]*x4)) # To assign subjects to one of treatment groups based on response probabilities tmp = function(x){sample(c("A","B","C"), 1, prob=x, replace=TRUE)} trt = apply(cbind(p1,p2,p3),1,tmp) IND=rep(1:S,each=N) #create an indicator for split simulated data sim=data.frame(x1,x2,x3,x4,trt, IND) Aset = subset(sim, trt=="A") Bset = subset(sim, trt=="B") Cset = subset(sim, trt=="C") Anew = split(Aset, f = IND) Bnew = split(Bset, f = IND) Cnew = split(Cset, f = IND) 

Warning message:

 > Anew = split(Aset, f = IND) Warning message: In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : data length is not a multiple of split variable 

and the output will be

 $`2` x1 x2 x3 x4 trt IND 141 1.0894068 0.09765185 -0.46702047 0.4049424 A 3 145 -1.2953113 -1.94291045 0.09926239 -0.5338715 A 3 148 0.0274979 0.72971804 0.47194731 -0.1963896 A 3 $`3` [1] x1 x2 x3 x4 trt IND <0 rows> (or 0-length row.names) 

I checked my R-code several times, but I can’t understand what I did wrong. Thank you very much in advance

+4
source share
3 answers

IND is a global variable for full sim data. Do you want to use specific for a subset like

 Anew <- split(Aset, f = Aset$IND) 
+5
source

This is a warning, not an error, indicating split successfully executed, but might not have done what you wanted to do. In the "Details" section of the help file:

f is recycled as necessary and if the length of x is not a multiple of the length of fa warning is printed. Any missing values in f are dropped together with the corresponding values of x.

Try checking the length of your IND against the size of your data frame, possibly.

+4
source

Not sure what your goal is when you share your data, but it seems like a good candidate for the plyr package.

 > library(plyr) > ddply(sim, .(trt,IND), summarise, x1mean=mean(x1), x2sum=sum(x2), x3min=min(x3), x4max=max(x4)) trt IND x1mean x2sum x3min x4max 1 A 1 -0.49356448 -1.5650528 -1.016615 2.0027822 2 A 2 0.05908053 5.1680463 -1.514854 0.8184445 3 A 3 0.22898716 1.8584443 -1.934188 1.6326763 4 B 1 0.01531230 1.1005720 -2.002830 2.6674931 5 B 2 0.17875088 0.2526760 -1.546043 1.2021935 6 B 3 0.13398967 -4.8739380 -1.565945 1.7887837 7 C 1 -0.16993037 -0.5445507 -1.954848 0.6222546 8 C 2 -0.04581149 -6.3230167 -1.491114 0.8714535 9 C 3 -0.41610973 0.9085831 -1.797661 2.1174894 > 

Where you can substitute summarise and its following arguments for any function that returns data.frame or something that can be forced to it. If listings are the goal, ldply is your friend.

+1
source

Source: https://habr.com/ru/post/1394215/


All Articles