Using experimental uncertainty when performing linear regression in R

I have an experimental dataset that I would like to fit with a polynomial. The data contain an independent variable, a dependent variable, and the uncertainty in the measurement of the latter, for example:

    2000  0.208427200  0.002067834
    2500  0.207078125  0.001037248
    3000  0.205420200  0.001959138
    3500  0.203488075  0.000328942
    4000  0.201315200  0.000646088
    4500  0.198933825  0.001375657
    5000  0.196375000  0.000908696
    5500  0.193668575  0.000147210
    6000  0.190843200  0.000526976
    6500  0.187926325  0.001217318
    7000  0.184944200  0.000556495
    7500  0.181921875  0.000401883
    8000  0.178883200  0.001446992
    8500  0.175850825  0.001235017
    9000  0.172846200  0.001676249
    9500  0.169889575  0.001011735
   10000  0.167000000  0.000326678

(columns: x, y, and the ± uncertainty in y).

I can do a polynomial fit using the above, e.g.

    mydata <- read.table("example.txt")
    model <- lm(V2 ~ V1 + I(V1^2) + I(V1^3) + I(V1^4), data = mydata)

but it does not use the uncertainty values. How do I tell R that the third column of the dataset is the uncertainty, and that it should be used in the regression analysis?
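(As an aside, read.table() can assign column names on input, which makes the role of the third column explicit; a minimal sketch, with names of my own choosing rather than anything from the original file:

    mydata <- read.table("example.txt", col.names = c("x", "y", "y.err"))
    model <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = mydata)

)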

1 answer

With measurement error in the dependent variable that is uncorrelated with the independent variables, the estimated coefficients are unbiased, but the reported standard errors are too small. Here is the reference I used (pages 1 and 2): http://people.stfx.ca/tleo/econ370term2lec4.pdf
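To illustrate the unbiasedness part of that claim, here is a minimal simulation sketch (all numbers are hypothetical, chosen to resemble the question's data): repeatedly add noise to a known line, fit with lm(), and average the fitted slopes.

    # Simulation sketch: noise added to y does not bias the fitted slope
    set.seed(1)
    x <- seq(2000, 10000, by = 500)   # 17 points, like the question's data
    true.slope <- -5e-06
    slopes <- replicate(2000, {
      y <- 0.22 + true.slope * x + rnorm(length(x), sd = 0.001)  # noisy y
      coef(lm(y ~ x))[["x"]]
    })
    mean(slopes)  # comes out close to true.slope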

I think you just need to adjust the standard errors computed by lm(), and that is what I tried to do in the code below. I am not a statistician, though, so you may want to post this on Cross Validated to vet the approach and get better intuition.

In the example below, I assumed that the "uncertainty" column represents a standard deviation (or standard error). For simplicity, I changed the model to a simple linear one: y ~ x.

    # initialize dataset
    df <- data.frame(
      x = c(2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500,
            6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000),
      y = c(0.2084272, 0.207078125, 0.2054202, 0.203488075, 0.2013152,
            0.198933825, 0.196375, 0.193668575, 0.1908432, 0.187926325,
            0.1849442, 0.181921875, 0.1788832, 0.175850825, 0.1728462,
            0.169889575, 0.167),
      y.err = c(0.002067834, 0.001037248, 0.001959138, 0.000328942,
                0.000646088, 0.001375657, 0.000908696, 0.00014721,
                0.000526976, 0.001217318, 0.000556495, 0.000401883,
                0.001446992, 0.001235017, 0.001676249, 0.001011735,
                0.000326678)
    )
    df

    # fit the regression model
    model <- lm(y ~ x, data = df)
    summary(model)

    # pool the variance of the measurement error across observations
    # (law of total variance; thanks to:
    #  http://schools-wikipedia.org/wp/v/Variance.htm)
    pooled.var.y.err <- mean((df$y.err)^2) + var((df$y.err)^2)

    # get the variance of beta from the model; thanks to:
    # http://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression
    X <- cbind(1, df$x)
    # (if you add more variables to the model, you need to modify the next line)
    var.betaHat <- anova(model)[[3]][2] * solve(t(X) %*% X)

    # add the measurement-error variance to the variance of betaHat
    var.betaHat.adj <- var.betaHat + pooled.var.y.err

    # calculate adjusted standard errors
    sqrt(diag(var.betaHat.adj))

    # compare to the unadjusted standard errors
    sqrt(diag(var.betaHat))
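For completeness, weighted least squares is another common way to use known measurement uncertainties, and is a different technique from the adjustment above: lm() accepts a weights argument, and the usual convention is to weight each observation by the inverse of its error variance. A minimal sketch, assuming df$y.err holds standard deviations:

    # weighted least squares: down-weight points with larger measurement error
    model.wls <- lm(y ~ x, data = df, weights = 1/df$y.err^2)
    summary(model.wls)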
