Using the glmulti package in R to exhaustively search for multiple regression for akaike weights

Question

Using the glmulti package in R to exhaustively search for multiple regression for akaike weights

I was wondering if anyone could help me understand why I am getting an error when I enter a script in R. For abit background information, I study the effect of 6 different variables (which I think is 63 combinations or models) (X ) provide gross primary and clean ecosystem production (Y) separately at different spatial scales for my environmental project. I decided to use an exhaustive search in several regression analyzes using the akaikes criterion (AIC) to try to find a group of models for the best fit. (and hierarchical separation for comparing variance attributed to different X-variables). I want to get weights so that I can rank which models “best fit” the criterion, see if there is one or a group of them that equips the others and, therefore, will be more likely to fit the data.

I recently posted a similar question about the hier.part package on Cross Validated, which got a great answer, and I was told to come here if I had any similar questions in the future.

The package I use for R is glmulti. which can be found here

script i use this

require(glmulti) GPPANDDRIVER<-read.table("C:\\Databases at different scales for R\\River Rhine and Netherlands\\GPP and drivers rhineland (comma delimited).csv",header=T,sep=",") GPP<-GPPANDDRIVER$GPP IND_VARS<-subset(GPPANDDRIVER,select=-GPP) # glmulti S4 generic glmulti(y=GPP, xr=IND_VARS, data, exclude = c(), name = "glmulti.analysis", intercept = TRUE, marginality = FALSE, bunch=30, chunk = 1, chunks = 1, level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1, method = "h", crit = "aic", confsetsize = 63, popsize = 40, mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm", resumefile = "id", includeobjects=TRUE,)

Here is a link for the .csv data for the rinland sites mentioned in the example, http://www.filedropper.com/gppanddriversrhinelandandmadelimited

I am extremely new to R, so I suggested that popsize means the number of replicas, which for this scale is 40, so I used 40, I also suggested that confsetsize means the number of possible models, which I think 63 due to 6 variables ?

If anyone can help, we will be very grateful

Thank you for your patience and apologies for the main question.

Richard

edit I just tried running the script this morning, and now it crashes R.

+6

r regression

Neil Apr 16 '12 at 22:50

source share

1 answer

Ben bolker · Accepted Answer · 2012-04-17T14:30:30+0000

It worked for me. I think the main thing is not to blindly include all the parameters in the model call. Most of them have default values, so (if the package writer has done his job) you should leave them as they are and not worry too much (although, of course, you should RTFM and (try) to understand what they mean .. .)

 dat <- read.csv("GPPdriversRhineland.csv") library(glmulti)

I decided to rename the predictor with shorter tags:

 prednames <- c("NDVI","solar.rad","avg.temp","precip", "nutr.avail","water.cap") names(dat)[1:6] <- prednames

This is all you need to match all the combinations of the main effects: since you have six predictors, there are 64 level 1 models (including the null model).

 g1 <- glmulti("GPP",xr=prednames,data=dat,level=1)

For a larger computational task:

 g2 <- glmulti("GPP",xr=prednames,data=dat,level=2)

I believe that there is 2^(choose(6,2)+6) = 2.1 million possible models. I did not look at ?glmulti close enough to say how to stop the installation of models. I just started (so far he has evaluated 66,000 models), but he found a two-level model with an AIC of about 500.5, which is much better than min-AIC 518 in a set of 1-level models ..

PS I played with the settings a little more, trying to use a genetic algorithm, rather than an exhaustive approach (I see no obvious way to tell glmulti "use an exhaustive approach, but stop after N attempts"). Even with slightly more permissive than the default settings for the genetic algorithm, it seems to be stuck on the AIC by about 504, higher than the value found in the (partial) comprehensive screening I tried first.

eg:.

 g2 <- glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE, method="g",conseq=25,popsize=500,mutrate=1e-2)

PPS : the reason I got the best results in the exhaustive case was because I had marginality=FALSE , i.e. the model was allowed to leave the main effect parameters that were involved in the interactions included in the model. This is not necessarily reasonable. If I turn off the marginality restriction, then the genetic algorithm can go down to AIC = 499 without any problems ...

 glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE, method="d")

also useful: it prints the number of candidate models defined for this specification.

Using the glmulti package in R to exhaustively search for multiple regression for akaike weights

More articles: