predict.glm does not predict missing values in the response

For some reason, when I fit glms (the same happens with lm), R does not predict values for the missing data. Here is an example:

 # logistic regression: 50 observed responses, 50 missing
 y = round(runif(50))
 y = c(y,rep(NA,50))
 x = rnorm(100)
 m = glm(y~x, family=binomial(link="logit"))
 p = predict(m,na.action=na.pass)
 length(p)

 # the same with lm
 y = round(runif(50))
 y = c(y,rep(NA,50))
 x = rnorm(100)
 m = lm(y~x)
 p = predict(m)
 length(p)

The length of p should be 100, but it's 50. Strangely, I have other predictions in the same script that do seem to cover the missing data.

EDIT: It turns out these other predictions were completely wrong. I did imputed.value = rnorm(N,mean.from.predict,var.of.prediction.interval) . This recycles the mean and sd vectors from the lm or glm predict functions when length(predict)<N , which is very different from what I was after.
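The recycling described in the EDIT can be reproduced in isolation. This is a minimal sketch, not the original script: the names N and means and the values in them are made up for illustration. It also relies on the fact that the third argument of rnorm is a standard deviation, not a variance.

```r
set.seed(1)                      # assumption: seed added only for reproducibility
N <- 6
means <- c(0, 1000)              # hypothetical: only 2 means supplied for 6 draws

# rnorm silently recycles 'mean' (and 'sd') to length N, with no warning
draws <- rnorm(N, mean = means, sd = 1)
length(draws)                    # 6, even though only 2 means were given
round(draws)                     # draws alternate between values near 0 and near 1000
```

So a mean vector of length 50 passed with N = 100 still returns 100 draws, with the second 50 reusing the means of the first 50.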

So my question is: what in my code stops glm and lm from predicting the missing values?

Thanks!

3 answers

When glm fits the model, it uses only the cases that have no missing values. You can still get predictions for the cases where y is missing by building a data frame and passing it to predict.glm as newdata.

 predict(m, newdata=data.frame(y, x)) 
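A slightly fuller sketch of the same idea, using the data from the question. The seed and the names p_fit and p_new are additions for illustration. The response column is not needed in newdata, since only the predictors are used for prediction, and type = "response" is added here to get probabilities rather than values on the logit scale.

```r
set.seed(1)                                   # assumption: seed added for reproducibility
y <- c(round(runif(50)), rep(NA, 50))         # 50 observed responses, 50 missing
x <- rnorm(100)
m <- glm(y ~ x, family = binomial(link = "logit"))

p_fit <- predict(m)                           # covers only the 50 rows used in the fit
p_new <- predict(m, newdata = data.frame(x = x), type = "response")

length(p_fit)                                 # 50
length(p_new)                                 # 100
anyNA(p_new)                                  # FALSE: every row gets a prediction
```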

The cause is in your glm call: its na.action argument defaults to na.omit.

The rows with missing values are therefore dropped when the model is fitted (and when predict.glm is called, they are still omitted).

From ?glm

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

From ?na.exclude (the shared help page for the NA-handling functions)

na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude". This gives different behaviour in functions making use of naresid and napredict: when na.exclude is used the residuals and predictions are padded to the correct length by inserting NAs for cases omitted by na.exclude.
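The difference can be checked with the lm example from the question. This is a minimal sketch; the seed and the names m_omit, m_exclude and p are additions for illustration. Note that the padded entries are NA: na.exclude restores the length, but it does not produce predictions for the rows with a missing response.

```r
set.seed(1)                                     # assumption: seed added for reproducibility
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)

m_omit    <- lm(y ~ x, na.action = na.omit)     # the factory-fresh default, spelled out
m_exclude <- lm(y ~ x, na.action = na.exclude)

length(predict(m_omit))                         # 50
p <- predict(m_exclude)
length(p)                                       # 100
sum(is.na(p))                                   # 50: the excluded rows are NA
```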


I don't know where you got the idea that R's regression functions should automatically impute missing values; that is a misreading of the glm help page. If you have predictions for things that you "think" are missing, in data that you have not shown us, I suspect they are not actually missing but are perhaps levels labeled "NA". That is not a missing value in R. Show us str(chr.imp) for the dataset you are working with. The "imp" part of that name makes me think that you (or someone before you) built in some conventions.

If you want to impute data, you first need to read up on the problems involved, and then choose a package to do it. To find such packages, try the following:

 install.packages("sos")
 require(sos)
 findFn("impute")
 #---------
 # found 834 matches; retrieving 20 pages, 400 matches.
 # 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 # Downloaded 383 links in 118 packages.

Source: https://habr.com/ru/post/943829/

