Error in boot() related to replacement length and data or data types? - R

boot() fails with one data set and succeeds with another, so is there a problem with the data? I just can't see the difference, but at least the failure is reproducible for me. In both cases, a numeric dependent variable is regressed (with lm) on the interaction between an integer variable and a factor. boot() fails with this error:

Error in boot(data = data, statistic = bs_p, R = 1000) : number of items to replace is not a multiple of replacement length 

My statistic function, which returns the p-values:

  bs_p <- function(data, i) {
    d <- data[i, ]
    fit <- lm(y ~ x * fac, data = d)
    return(summary(fit)$coefficients[, 4])
  }

When I generate random data to experiment with and to post with this question, for example:

  L3 <- LETTERS[1:3]
  data <- data.frame(x = 1:50, y = rnorm(1:50), fac = as.factor(sample(L3, 50, replace = TRUE)))

and then bootstrap:

  results <- boot(data=data, statistic=bs_p, R=1000) 

boot() works: there is no error, and I get the statistics. But with my own data (below), which has the same types, boot() returns the error.

  y <- c(17.820, 13.764, 18.880, 25.830, 26.576, 29.832, 22.610, 24.180, 26.572,
         26.030, 29.200, 28.560, 28.600, 16.614, 16.302, 18.080, 22.704, 28.101,
         38.280, 17.100, 19.292, 33.165, 18.395, 19.434, 27.544, 17.010, 21.560,
         28.120, 17.513, 21.646, 24.060, 27.984, 20.830, 21.588, 26.280, 29.640,
         17.313, 16.344, 16.362, 34.496, 22.785, 20.203, 29.040, 19.092, 20.890,
         20.739, 17.700, 17.424, 28.737, 18.318, 39.470, 28.072, 17.176, 28.098)
  x <- as.integer(c(9, 5, 0, 8, 3, 4, 9, 6, 9, 2, 15, 10, 5, 1, 11, 11, 4, 8,
                    13, 1, 2, 4, 7, 7, 12, 1, 6, 6, 4, 3, 5, 5, 7, 9, 8, 3,
                    3, 14, 6, 4, 3, 6, 17, 3, 6, 6, 7, 1, 6, 10, 2, 14, 5, 8))
  fac <- as.factor(c("F", "F", "F", "F", "F", "Ds", "F", "Ds", "F",
                     "F", "F", "E", "Ds", "F", "F", "E", "Ds", "F",
                     "Ds", "F", "Ds", "E", "F", "E", "F", "Ds", "E",
                     "Ds", "F", "F", "F", "Ds", "Ds", "F", "Ds", "F",
                     "F", "E", "F", "F", "F", "F", "F", "Ds", "F",
                     "F", "F", "F", "Ds", "E", "F", "F", "F", "E"))
  data <- data.frame(x = x, y = y, fac = fac)

The linear model fits this data fine on its own. traceback() shows nothing but the boot() call. Any thoughts are most welcome. I'm on R 3.0.1 on Mac OS X. Thanks!

1 answer

Some (or at least one) of the bootstrap resamples do not contain all factor levels. Those resamples yield fewer coefficients (and fewer corresponding p-values), which causes the error when the bootstrap results are combined. I think you need a stratified bootstrap or to bootstrap the residuals (assuming that bootstrapping p-values is sensible, which I doubt).
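As a rough sketch of the stratified option: boot() accepts a strata argument, so resampling can be done within each factor level and every level stays present in every resample. The data below is synthetic and only mimics the imbalance in the question (a rare level "E"); the names dat, coef_names and bs_p_fixed are made up for illustration. As an extra precaution that goes beyond stratification, the statistic pads its result to a fixed length, so a coefficient that cannot be estimated in some resample shows up as NA instead of shortening the vector.

```r
library(boot)  # recommended package that ships with R

set.seed(1)

# Synthetic, unbalanced data (an assumption, not the question's data):
# level "E" is rare, so plain resampling can occasionally miss it.
dat <- data.frame(
  x   = sample(0:15, 54, replace = TRUE),
  y   = rnorm(54, mean = 23, sd = 6),
  fac = factor(rep(c("Ds", "E", "F"), times = c(14, 8, 32)))
)

# Coefficient names of the full model, fixed once up front.
coef_names <- names(coef(lm(y ~ x * fac, data = dat)))

# Statistic that always returns one p-value per full-model coefficient.
# Coefficients that are not estimable in a resample are left as NA.
bs_p_fixed <- function(data, i) {
  d <- data[i, ]
  coefs <- summary(lm(y ~ x * fac, data = d))$coefficients
  out <- setNames(rep(NA_real_, length(coef_names)), coef_names)
  out[rownames(coefs)] <- coefs[, "Pr(>|t|)"]
  out
}

# Resample within each level of fac, so group sizes are preserved.
results <- boot(data = dat, statistic = bs_p_fixed, R = 1000,
                strata = dat$fac)
```

With the question's data, the equivalent call would be boot(data = data, statistic = bs_p, R = 1000, strata = data$fac). Whether the resulting distribution of p-values is meaningful is a separate question.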


Source: https://habr.com/ru/post/1502302/

