Effects in a multinomial logistic model in mlogit

Question


I got some good help here (Data formatting for mlogit) with formatting my data correctly to fit a multinomial logistic model with mlogit.

However, I am now trying to analyze the effects of covariates in my model. I find the help file for mlogit.effects() not very informative. One of the problems is that the model appears to produce a lot of rows of NAs (see below, index(mod1)).

1. Can someone explain why my data produces these NAs?
2. Can someone help me get mlogit.effects to work with the data below?
3. I would consider moving the analysis to multinom(). However, I cannot figure out how to shape the data to fit the formula that multinom() expects. My data is a series of rankings of seven different items (Accessible, Information, Trade.Offs, Debate, Social and Responsive). Should I just model whatever they chose as their first rank and ignore what they chose in the other ranks? I can get that information.

Reproducible code below:

    # Load packages
    library(RCurl)
    library(mlogit)
    library(tidyr)
    library(dplyr)
    # URL where data is stored
    dat.url <- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
    # Get data
    dat <- read.csv(dat.url)
    # Complete cases only, as it seems mlogit cannot handle missing values or tied data,
    # which in this case you might get because of median imputation
    dat <- dat[complete.cases(dat),]
    # Change the choice index variable (X) to have no interruptions,
    # as a result of removing some incomplete cases
    dat$X <- seq(1,nrow(dat),1)
    # Tidy data to get it into long format
    dat.out <- dat %>%
      gather(Open, Rank, -c(1,9:12)) %>%
      arrange(X, Open, Rank)
    # Create mlogit object
    mlogit.out <- mlogit.data(dat.out, shape='long', alt.var='Open', choice='Rank',
                              ranked=TRUE, chid.var='X')
    # Fit model
    mod1 <- mlogit(Rank~1|gender+age+economic+Job, data=mlogit.out)

Here is my attempt to set up a data frame similar to the one shown in the help file. It does not work. I confess that, although I know the apply family pretty well, tapply is murky to me.

 with(mlogit.out, data.frame(economic=tapply(economic, index(mod1)$alt, mean)))

Compare with:

    data("Fishing", package = "mlogit")
    Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
    m <- mlogit(mode ~ price | income | catch, data = Fish)
    # compute a data.frame containing the mean value of the covariates in
    # the sample data in the help file for effects
    z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
                               catch = tapply(catch, index(m)$alt, mean),
                               income = mean(income)))
    # compute the marginal effects (the second one is an elasticity)
    effects(m, covariate = "income", data = z)

+6

r mlogit

spindoctor Jun 16 '15 at 19:00


2 answers

I will try option 3 and switch to multinom(). This code models the log-odds of an item being ranked 1st, compared to a reference item (for example, "Debate" in the code below). With K = 7 items, if we call the reference item_K, then we model

$$\log\left[\frac{\Pr(\text{item}_k \text{ is 1st})}{\Pr(\text{item}_K \text{ is 1st})}\right] = \alpha_k + x^T \beta_k$$

for k = 1, ..., K-1, where item_k is one of the other (that is, non-reference) items. The choice of reference level affects the coefficients and their interpretation, but it does not affect the predicted probabilities. (The same is true of reference levels for categorical predictor variables.)
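As a rough illustration of that last point, here is a minimal sketch in base R. The linear predictor values and the item names A, B, C and Ref are invented for the example; they are not estimates from the survey data. It converts log-odds into probabilities and checks that switching the reference item changes the linear predictors but not the probabilities.

```r
# Hypothetical linear predictors alpha_k + x^T beta_k for one respondent,
# for three non-reference items (values invented for illustration)
eta <- c(A = 0.4, B = -0.2, C = 1.1)

# Probabilities implied by the log-odds; the reference item has
# its linear predictor fixed at 0
probs_from_logodds <- function(eta, ref_name = "Ref") {
  full <- c(setNames(0, ref_name), eta)
  exp(full) / sum(exp(full))
}

p <- probs_from_logodds(eta)
print(round(p, 3))

# Re-express the same model with item "C" as the reference:
# every linear predictor shifts by -eta["C"], so the coefficients change...
full <- c(Ref = 0, eta)
eta_c <- full - full["C"]
p_c <- exp(eta_c) / sum(exp(eta_c))

# ...but the implied probabilities stay the same
print(all.equal(p, p_c))
```

The shift by a constant cancels in the ratio, which is why only the interpretation of the coefficients depends on the reference item.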

I should also mention that I handle missing data a little differently than the original code does. Since my model only needs to know which item is ranked 1st, I only need to throw out records where that information is missing. (For example, in record #43 of the original dataset, "Information" is ranked 1st, so we can use this record even though 3 other items are NA.)

    # Get data
    dat.url <- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
    dat <- read.csv(dat.url)
    # dataframe showing which item is ranked #1
    ranks <- (dat[,2:8] == 1)
    # for each combination of predictor variable values, count
    # how many times each item was ranked #1
    dat2 <- aggregate(ranks, by=dat[,9:12], sum, na.rm=TRUE)
    # remove cases that didn't rank anything as #1 (due to NAs in original data)
    dat3 <- dat2[rowSums(dat2[,5:11])>0,]
    # (optional) set the reference levels for the categorical predictors;
    # factor() is needed because relevel() only accepts factors and
    # read.csv() returns character columns by default in R >= 4.0
    dat3$gender <- relevel(factor(dat3$gender), ref="Female")
    dat3$Job <- relevel(factor(dat3$Job), ref="Government backbencher")
    # response matrix in format needed for multinom()
    response <- as.matrix(dat3[,5:11])
    # (optional) set the reference level for the response by changing
    # the column order
    ref <- "Debate"
    ref.index <- match(ref, colnames(response))
    response <- response[,c(ref.index,(1:ncol(response))[-ref.index])]
    # fit model (note that age & economic are continuous, while gender &
    # Job are categorical)
    library(nnet)
    fit1 <- multinom(response ~ economic + gender + age + Job, data=dat3)
    # print some results
    summary(fit1)
    coef(fit1)
    cbind(dat3[,1:4], round(fitted(fit1),3)) # predicted probabilities
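To see why a partially missing record can still be used, here is a toy version of the counting step. The data frame is invented for illustration (three items, five respondents, one grouping variable) and only mimics the layout of the survey data. Respondent 3 ranks "Information" 1st and leaves the rest NA, much like record #43, and still contributes a count; the respondents with no first-place ranking at all drop out.

```r
# Toy data (invented): 5 respondents ranking 3 items; NA = item not ranked
toy <- data.frame(
  Accessible  = c(1, 2, NA, NA, NA),
  Debate      = c(2, 1, NA, NA, NA),
  Information = c(3, NA, 1, NA, NA),
  gender      = c("Female", "Male", "Female", "Other", "Other")
)

# TRUE where an item is ranked 1st; NA where the rank is missing
first <- toy[, 1:3] == 1

# Count first-place rankings per group, ignoring NAs
counts <- aggregate(first, by = toy["gender"], sum, na.rm = TRUE)

# Drop groups in which nothing was ranked 1st
counts <- counts[rowSums(counts[, -1]) > 0, ]
print(counts)
```

Each remaining row is one covariate pattern with its first-place counts per item, which is the shape of the response matrix passed to multinom() above.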

I have not run any diagnostics, so I make no claim that the model used here is a good fit.

+2

Dagremu Jun 21 '15 at 2:25


Jarh · Accepted Answer · 2016-06-23T21:43:05+0000

You are working with ranked data, not just multinomial choice data. The structure for ranked data in mlogit is that the first set of records for a person contains all options, the second contains all options except the one ranked first, and so on. But the index assumes an equal number of options each time. Hence the bunch of NAs. We just need to get rid of them.

    > with(mlogit.out, data.frame(economic=tapply(economic, index(mod1)$alt[complete.cases(index(mod1)$alt)], mean)))
                economic
    Accessible      5.13
    Debate          4.97
    Information     5.08
    Officials       4.92
    Responsive      5.09
    Social          4.91
    Trade.Offs      4.91
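The structure described in this answer can be mimicked in base R. This is only a sketch of the idea on an invented three-item ranking, using a hypothetical helper explode_ranking(); it is not mlogit's internal code. Each round, the best remaining item counts as chosen and is then removed, so later rounds have fewer alternatives than a rectangular index would expect.

```r
# Toy ranking (invented): one person ranks three items, 1 = most preferred
ranking <- c(Accessible = 2, Debate = 1, Information = 3)

# Explode the ranking into successive choice situations: in each round the
# best remaining item is marked as chosen, then dropped from the choice set
explode_ranking <- function(ranking) {
  remaining <- names(sort(ranking))
  rounds <- list()
  round_id <- 1
  while (length(remaining) > 1) {
    rounds[[round_id]] <- data.frame(
      round  = round_id,
      alt    = remaining,
      chosen = remaining == remaining[1]
    )
    remaining <- remaining[-1]
    round_id <- round_id + 1
  }
  do.call(rbind, rounds)
}

long <- explode_ranking(ranking)
print(long)

# Alternatives per round shrink (3, then 2), so the result has 5 rows,
# not the 6 that a full 2 x 3 grid of rounds and alternatives would have
print(table(long$round))
```

The gap between the 5 rows that exist and the 6 a rectangular layout would need corresponds to the kind of NA entries filtered out with complete.cases() above.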
