I had a problem using a function predict()for a model mgcv::gam(training) in a new (test) dataset. The problem arises from the mrfsmooth one that I integrated to account for the spatial nature of my data.
I use the following call to create my GAM model
## Run GAM with MRF
m <- gam(crime ~ s(district,k=nrow(traindata),
bs ='mrf',xt=list(nb=nbtrain)), #define MRF smooth
data = traindata,
method = 'REML',
family = scat(), #fit scaled t distribution
gamma = 1.4
)
where I predict the dependent variable crimeusing the neighborhood structure, is analyzed in the model in an argument with smooth terms xt. The neighborhood structure comes as an object nbthat I created using a function poly2nb().
Now, if I want to use predict()testing in a new dataset, I don’t know how to pass the appropriate neighborhood structure to the call. Providing only new data
pred <- predict.gam(m,newdata=testdata)
produces the following error:
Error in predict.gam(m, newdata = testdata) :
7, 16, 20, 28, 35, 36, 37, 43 not in original fit
Here's a complete error reproduction using a Columbus dataset directly called from inside R:
require(mgcv)
require(spdep)
require(dplyr)
data(columb.polys)
columb.polys <- lapply(columb.polys,na.omit)
data(columb)
df <- data.frame(district=numeric(0),x=numeric(0),y= numeric(0))
for (i in 1:length(columb.polys)) {
district <- i-1
x <- columb.polys[[i]][,1]
y <- columb.polys[[i]][,2]
df <- rbind(df,cbind(district,x,y))
}
sp <- df %>%
group_by(district) %>%
do(poly=select(., x, y) %>%Polygon()) %>%
rowwise() %>%
do(polys=Polygons(list(.$poly),.$district)) %>%
{SpatialPolygons(.$polys)}
spdf <- SpatialPolygonsDataFrame(sp,columb)
splt <- sample(1:2,size=nrow(spdf),replace=TRUE,prob=c(0.8,0.2))
train <- spdf[splt==1,]
test <- spdf[splt==2,]
traindata <- train@data
testdata <- test@data
traindata <- droplevels(as(train, 'data.frame'))
testdata <- droplevels(as(test, 'data.frame'))
traindata$district <- as.factor(traindata$district)
testdata$district <- as.factor(testdata$district)
nbtrain <- poly2nb(train, row.names=train$Precinct, queen=FALSE)
nbtest <- poly2nb(test, row.names=test$Precinct, queen=FALSE)
names(nbtrain) <- attr(nbtrain, "region.id")
names(nbtest) <- attr(nbtest, "region.id")
m <- gam(crime ~ s(district, k=nrow(traindata), bs = 'mrf',xt = list(nb = nbtrain)),
data = traindata,
method = 'REML',
family = scat(),
gamma = 1.4
)
pred <- predict.gam(m,newdata=testdata)