lm() called inside dlply() throws "0 (non-NA) cases" error [r]

Question

I'm using dlply() with a custom function that averages the slopes of lm() fits on data that contain some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases".

The error only occurs when dlply is called with two splitting variables; splitting on a single variable works fine.

Annoyingly, I can't reproduce the error with a simple data set, so I've posted the problem data set to my Dropbox.

Here is the code, minimized as far as possible while still producing the error:

    library(plyr)  # provides dlply() and .()

    masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv",
                           na.strings = "#N/A")
    workingData <- data.frame(sample    = masterData$sample,
                              substrate = masterData$substrate,
                              el1       = masterData$elapsedHr1,
                              F1        = masterData$r1 - masterData$rK)

    # This function is trivial as written; in reality it takes the average of many slopes
    meanSlope <- function(df) {
      lm1 <- lm(df$F1 ~ df$el1, na.action = na.omit)  # changing to na.exclude doesn't help
      slope1 <- lm1$coefficients[2]
      meanSlope <- mean(c(slope1))
    }

    lsGOOD <- dlply(workingData, .(sample), meanSlope)             # works fine
    lsBAD  <- dlply(workingData, .(sample, substrate), meanSlope)  # throws error

Thanks in advance for your understanding.

+6

r plyr lm

Drew steen Mar 01 '12 at 16:34


2 answers

As per my comment:

    my.func <- function(df) {
      data.frame(el1 = all(is.na(df$el1)),
                 F1  = all(is.na(df$F1)))
    }
    ddply(workingData, .(sample, substrate), my.func)

This shows that you have many subsets where both F1 and el1 are entirely NA. (In fact, every time one is all NA, the other is as well!)
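To see why such a subset matters, here is a minimal sketch with a made-up three-row data frame (the name allNA is hypothetical, not from the posted data). Assuming a subset where every value is NA, na.omit leaves lm() nothing to fit, and it stops with the same message as in the question:

```r
# Made-up subset in which both columns are entirely NA (illustration only)
allNA <- data.frame(el1 = c(NA_real_, NA_real_, NA_real_),
                    F1  = c(NA_real_, NA_real_, NA_real_))

# na.omit drops every row, so lm.fit() has zero cases and signals an error;
# try(silent = TRUE) captures it instead of stopping the script
res <- try(lm(F1 ~ el1, data = allNA, na.action = na.omit), silent = TRUE)

failed <- inherits(res, "try-error")
msg <- if (failed) conditionMessage(attr(res, "condition")) else ""
print(msg)
```

Splitting by sample alone presumably never produces a piece like this, which would explain why only the two-variable split fails.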

+2

Justin Mar 01 '12 at 16:46


42- · Accepted Answer · 2012-03-01T16:46:17+0000

For several of your cross-classifications you do not have covariates:

    with(masterData, table(sample, substrate, r1mis = is.na(r1) ) )
    # snipped the nonmissing reports
    , , r1mis = TRUE

          substrate
    sample 1 2 3 4 5 6 7 8
        3  0 0 0 0 0 0 0 0
        4  0 0 0 0 0 0 0 0
        5  0 0 0 0 0 0 0 0
        6  0 0 0 0 0 0 0 0
        7  0 0 0 0 0 0 3 3
        8  0 0 0 0 0 0 0 3
        9  0 0 0 0 0 0 0 3
        10 0 0 0 0 0 0 0 3
        11 0 0 0 0 0 0 0 3
        12 0 0 0 0 0 0 0 3
        13 0 0 0 0 0 0 0 3
        14 0 0 0 0 0 0 0 3

This version of the function lets you skip subsets with insufficient data in this particular data set:

    meanSlope <- function(df) {
      if ( sum(!is.na(df$el1)) < 2 ) {
        return(NA)
      } else {
        lm1 <- lm(df$F1 ~ df$el1, na.action = na.omit)  # changing to na.exclude doesn't help
        slope1 <- lm1$coefficients[2]
        meanSlope <- mean(c(slope1))
      }
    }

This depends on missingness in one particular covariate, though. A more robust solution would be to use try() to catch errors and convert them to NA.

 ?try
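A minimal sketch of that idea, using tryCatch() (the more flexible relative of try()). The wrapper name safeSlope and the toy data are assumptions made up for illustration; base R's split() and sapply() stand in for dlply() so the example runs without plyr:

```r
# Hypothetical wrapper: returns NA instead of failing when lm() cannot fit a subset
safeSlope <- function(df) {
  tryCatch(
    unname(coef(lm(F1 ~ el1, data = df, na.action = na.omit))[2]),
    error = function(e) NA_real_  # any lm() error becomes NA
  )
}

# Toy data (made up): group "b" has no non-NA cases at all
toy <- data.frame(
  grp = rep(c("a", "b"), each = 3),
  el1 = c(1, 2, 3, NA, NA, NA),
  F1  = c(2, 4, 6, NA, NA, NA)
)

# One slope per group; the failing group yields NA rather than an error
slopes <- sapply(split(toy, toy$grp), safeSlope)
print(slopes)
```

The same function should be usable as the third argument to dlply(workingData, .(sample, substrate), ...). One caveat of this design: it turns every error into NA, not just the "0 (non-NA) cases" one, so other problems can go unnoticed.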
