How do I get bootstrapped t-values and bootstrapped p-values, and how does the boot() function work?

I would like to get the bootstrapped t-value and the bootstrapped p-value of an lm fit. I have the following code (mostly copied from a paper) that works.

    # First of all you need the following packages
    install.packages("car")
    install.packages("MASS")
    install.packages("boot")
    library("car")
    library("MASS")
    library("boot")

    boot.function <- function(data, indices){
      data <- data[indices,]
      mod <- lm(prestige ~ income + education, data=data)  # the linear model
      # the first element of the following vector contains the t-value
      # and the second element is the p-value
      c(summary(mod)[["coefficients"]][2,3], summary(mod)[["coefficients"]][2,4])
    }
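
For reference, calling the function with the full, unshuffled index vector simply reproduces the t-value and p-value from the plain lm summary — a minimal check, assuming the Duncan data set is available after loading car:

    # Quick check: with the identity index vector the function just returns the
    # income coefficient's t-value and p-value from summary(lm(...))
    boot.function(Duncan, 1:nrow(Duncan))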

Now I run the bootstrap, which gives me the following:

    duncan.boot <- boot(Duncan, boot.function, 1999)
    duncan.boot

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = Duncan, statistic = boot.function, R = 1999)

    Bootstrap Statistics :
            original        bias    std. error
    t1* 5.003310e+00 0.288746545  1.71684664
    t2* 1.053184e-05 0.002701685  0.01642399

I have two questions:

  • I understand that the bootstrapped value is the original value plus the bias, which means that both bootstrapped values (the bootstrapped t-value and the bootstrapped p-value) are larger than the original values. That seems impossible, because if the t-value rises (i.e., gets larger), the p-value MUST get smaller, right? So I think I still do not understand the output of the boot function (here: duncan.boot). How do I calculate the bootstrapped values?

  • I do not understand how boot() works. If you look at duncan.boot <- boot(Duncan, boot.function, 1999), you will see that I did not pass any arguments to boot.function. I suppose R sets data <- Duncan, but since I did not pass anything for the "indices" argument, I do not understand how the line data <- data[indices,] inside boot.function works.

I hope the questions make sense!

1 answer

The boot() function "expects" to be given a function with two arguments: the first is a data.frame, and the second is a vector of "indices" (possibly containing duplicated entries and probably not using all of the row numbers) to be used in selecting rows. boot() then samples row numbers with replacement from the original data frame, R times with a different "choice set" each time (so the pattern of duplicates and triplicates differs from replicate to replicate), passes each set of row numbers to the indices argument of boot.function, and collects the results of the R function applications.
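
To make that concrete, here is a minimal sketch of what a single replicate amounts to (an illustration, not boot()'s actual internals):

    # One hand-rolled "replicate" -- a sketch of what boot() does R times internally
    set.seed(1)                                      # only so the illustration is reproducible
    indices <- sample(nrow(Duncan), replace = TRUE)  # row numbers drawn with replacement:
                                                     # some rows appear twice or more, some not at all
    boot.function(Duncan, indices)                   # the statistic for this resample
    # boot() repeats this R = 1999 times, stacks the results row-wise in duncan.boot$t,
    # and stores the statistic computed on the original data in duncan.boot$t0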

As for what the print method reports for boot objects, take a look at this (after inspecting the returned object with str()):

    > duncan.boot$t0
    [1] 5.003310e+00 1.053184e-05
    > apply(duncan.boot$t, 2, mean)
    [1] 5.342895220 0.002607943
    > apply(duncan.boot$t, 2, mean) - duncan.boot$t0
    [1] 0.339585441 0.002597411

It becomes clearer that the t0 value comes from the original data, and the bias is the difference between the mean of the boot() replicates and the t0 values. I do not think it makes sense to ask why p-values based on parametric considerations should rise along with an increase in the estimated t-statistic: when you do that, you are really mixing two separate realms of statistical thinking. I would interpret the increase in the p-values as an effect of a resampling process that does not rely on the assumption of a normal distribution. It simply says something about the sampling distribution of the p-value (which is really just another sample statistic).
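
If you want to look at those resampling distributions yourself rather than only at the printed bias, the replicates are stored column-wise in duncan.boot$t (column 1 is the t-value, column 2 the p-value), so something along these lines works:

    # Inspect the bootstrap distributions directly
    colMeans(duncan.boot$t)                       # means of the replicates (what the bias is computed from)
    colMeans(duncan.boot$t) - duncan.boot$t0      # the bias, as printed by boot
    quantile(duncan.boot$t[, 2], c(0.025, 0.975)) # spread of the resampled p-values
    hist(duncan.boot$t[, 2], main = "Bootstrap distribution of the p-value")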

(Comment: The reference used during the development of the R boot package was Davison and Hinkley, "Bootstrap Methods and Their Application." I am not claiming that it supports my answer above, but I decided to add it as a reference after Hagen Brenner asked about sampling with two indices in the comments below. There are many unexpected aspects of bootstrapping that come up once you go beyond simple parametric estimation, and that is where I would go first if I were dealing with more complex sampling situations.)


Source: https://habr.com/ru/post/1383122/

