I had this weird issue with `apply` recently. Consider the following example:
```r
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
head(df)
  speed dist foo
1     4    2   E
2     4   10   E
3     7    4   B
4     7   22   E
5     8   16   D
6     9   10   C
```
I want to use `apply` to apply a function (say `mean`) to each column of this `data.frame`. If the `data.frame` contains only numeric values, I have no problem:
```r
apply(cars, 2, mean)
speed  dist 
15.40 42.98 
```
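For reference, `?apply` documents that a data frame passed to `apply()` is first coerced to a matrix with `as.matrix()`. For the all-numeric `cars` data that matrix stays numeric, which is why the call above works. A minimal check (the name `m_cars` is just for illustration):

```r
# apply() coerces a data.frame with as.matrix() before looping over a margin.
# cars has only numeric columns, so the coerced matrix keeps a numeric type.
m_cars <- as.matrix(cars)
typeof(m_cars)   # "double"
```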
But when I try it on my `data.frame`, which contains both numeric and character (factor) columns, it fails:
```r
apply(df, 2, mean)
speed  dist   foo 
   NA    NA    NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
```
Of course, I expected to get `NA` for the character column, but I would like to get values for the numeric columns.
```r
sapply(df, class)
    speed      dist       foo 
"numeric" "numeric"  "factor" 
```
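One way to check what `apply()` actually receives: per `?as.matrix`, converting a data frame that has any column which is not numeric, logical or complex yields a character matrix, so every column handed to `mean()` (including `speed` and `dist`) is a character vector. The sketch below confirms that and shows one possible workaround, looping over the columns with `sapply()`, which treats the data frame as a list and leaves each column's own type intact. The `is.numeric()` guard and the names `m_df`/`col_means` are illustrative choices, not the only approach; `stringsAsFactors = TRUE` is set explicitly only to reproduce the factor column from R 2.14.1, since newer R versions default to `FALSE` (the matrix is character either way).

```r
set.seed(42)
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

# The matrix apply() iterates over: one factor column makes it all character
m_df <- as.matrix(df)
typeof(m_df)                   # "character"
is.character(m_df[, "speed"])  # TRUE, so mean() warns and returns NA

# Iterate over the data frame's columns directly instead of a coerced matrix;
# non-numeric columns get NA explicitly rather than via a warning.
col_means <- sapply(df, function(x) if (is.numeric(x)) mean(x) else NA_real_)
col_means
```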
Any pointers would be appreciated as I feel like I'm missing something very obvious here!
```r
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
```