Caterpillar plot of only “significant” random effects from the mixed effects model

I was very impressed asking for help here earlier, and I hope to get help again.

I rate a rather large mixed effect model in which one of the random effects has over 150 different levels. This would make the standard crawler plot quite unreadable.

I would like, if at all possible, to get a caterpillar plot of only levels of random effect, which are the result of the lack of a better term, "significant". That is: I need a caterpillar plot in which either a random interception or a random slope for a changing coefficient has a “confidence interval” (I know that it’s not quite what it is) that does not include zero.

Consider this standard model from sleepstudy data, which is standard with lme4 .

 library(lme4) fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]] 

I would get this tracked plot.

a caterpillar plot

The track section that I use comes from this code . Note that I tend to use less conservative estimates for intervals (i.e. 1.645 * se, not 1.96 * se).

Basically, I need a caterpillar plot that will include levels 308, 309, 310, 330, 331, 335, 337, 349, 350, 352 and 370, because these levels either intercept or whose intervals did not include zero. I ask because my crawler plot from more than 150 different levels is unreadable, and I think this could be a useful solution.

The following is reproducible code. I sincerely appreciate any help.

 # /questions/1238577/how-can-i-sort-random-effects-by-value-of-the-random-effect-not-the-intercept-in-dotplot-or-ggplot2 ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE, reorder=TRUE) { require(ggplot2) f <- function(x) { pv <- attr(x, "postVar") cols <- 1:(dim(pv)[1]) se <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ]))) if (reorder) { ord <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x)) pDf <- data.frame(y=unlist(x)[ord], ci=1.645*se[ord], nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)), ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]), ind=gl(ncol(x), nrow(x), labels=names(x))) } else { pDf <- data.frame(y=unlist(x), ci=1.645*se, nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)), ID=factor(rep(rownames(x), ncol(x)), levels=rownames(x)), ind=gl(ncol(x), nrow(x), labels=names(x))) } if(QQ) { ## normal QQ-plot p <- ggplot(pDf, aes(nQQ, y)) p <- p + facet_wrap(~ ind, scales="free") p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles") } else { ## caterpillar dotplot p <- ggplot(pDf, aes(ID, y)) + coord_flip() if(likeDotplot) { ## imitate dotplot() -> same scales for random effects p <- p + facet_wrap(~ ind) } else { ## different scales for random effects p <- p + facet_grid(ind ~ ., scales="free_y") } p <- p + xlab("Levels of the Random Effect") + ylab("Random Effect") } p <- p + theme(legend.position="none") p <- p + geom_hline(yintercept=0) p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black") p <- p + geom_point(aes(size=1.2), colour="blue") return(p) } lapply(re, f) } library(lme4) fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]] ggsave(file="sleepstudy.png") 
+5
source share
1 answer

Firstly, thanks for quoting the "significant" ones ... anyone reading this should remember that the value has no statistical value in this context (it would be better to use the Z statistics (value / std. Error ), for example, | Z |> 1.5 or | Z |> 1.75, just to emphasize that this is not a withdrawal threshold ...)

In the end, I got a little carried away ... I decided that it would be better to refactor / modulate things a bit, so I wrote the augment method (designed to work with the broom package) that creates useful data frames from ranef.mer objects ... as soon as this will be done, the required manipulations are quite simple.

I put the augment.ranef.mer code at the end of my answer - it is a bit long (you will need to specify it before you can run the code here).

 library(broom) library(reshape2) library(plyr) 

Apply the augment method to the RE object:

 rr <- ranef(fit,condVar=TRUE) aa <- augment(rr) names(aa) ## [1] "grp" "variable" "level" "estimate" "qq" "std.error" ## [7] "p" "lb" "ub" 

Now the ggplot code ggplot pretty simple. I use geom_errorbarh(height=0) rather than geom_pointrange()+coord_flip() because ggplot2 cannot use coord_flip with facet_wrap(...,scales="free") ...

 ## QQ plot: g0 <- ggplot(aa,aes(estimate,qq,xmin=lb,xmax=ub))+ geom_errorbarh(height=0)+ geom_point()+facet_wrap(~variable,scale="free_x") ## regular caterpillar plot: g1 <- ggplot(aa,aes(estimate,level,xmin=lb,xmax=ub))+ geom_errorbarh(height=0)+ geom_vline(xintercept=0,lty=2)+ geom_point()+facet_wrap(~variable,scale="free_x") 

Now find the levels you want to keep:

 aa2 <- ddply(aa,c("grp","level"), transform, keep=any(p<0.05)) aa3 <- subset(aa2,keep) 

Refresh the track pattern only with levels of “significant” slopes or intercepts:

 g1 %+% aa3 

If you want to highlight only "significant" levels, and not completely remove the "non-essential" levels

 ggplot(aa2,aes(estimate,level,xmin=lb,xmax=ub,colour=factor(keep)))+ geom_errorbarh(height=0)+ geom_vline(xintercept=0,lty=2)+ geom_point()+facet_wrap(~variable,scale="free_x")+ scale_colour_manual(values=c("black","red"),guide=FALSE) 

 ##' @importFrom reshape2 melt ##' @importFrom plyr ldply name_rows augment.ranef.mer <- function(x, ci.level=0.9, reorder=TRUE, order.var=1) { tmpf <- function(z) { if (is.character(order.var) && !order.var %in% names(z)) { order.var <- 1 warning("order.var not found, resetting to 1") } ## would use plyr::name_rows, but want levels first zz <- data.frame(level=rownames(z),z,check.names=FALSE) if (reorder) { ## if numeric order var, add 1 to account for level column ov <- if (is.numeric(order.var)) order.var+1 else order.var zz$level <- reorder(zz$level, zz[,order.var+1], FUN=identity) } ## QQ values, for each column separately qq <- c(apply(z,2,function(y) { qnorm(ppoints(nrow(z)))[order(order(y))] })) rownames(zz) <- NULL pv <- attr(z, "postVar") cols <- 1:(dim(pv)[1]) se <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ]))) ## nb: depends on explicit column-major ordering of se/melt zzz <- cbind(melt(zz,id.vars="level",value.name="estimate"), qq=qq,std.error=se) ## reorder columns: subset(zzz,select=c(variable, level, estimate, qq, std.error)) } dd <- ldply(x,tmpf,.id="grp") ci.val <- -qnorm((1-ci.level)/2) transform(dd, p=2*pnorm(-abs(estimate/std.error)), ## 2-tailed p-val lb=estimate-ci.val*std.error, ub=estimate+ci.val*std.error) } 
+8
source

Source: https://habr.com/ru/post/1238574/


All Articles