apply() returns NA for every column

I had this weird issue with apply recently. Consider the following example:

    set.seed(42)
    df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
    head(df)
      speed dist foo
    1     4    2   E
    2     4   10   E
    3     7    4   B
    4     7   22   E
    5     8   16   D
    6     9   10   C

I want to use apply to call a function (say, mean) on each column of this data.frame. If the data.frame contains only numeric values, there is no problem:

    apply(cars, 2, mean)
    speed  dist 
    15.40 42.98 

But when I try it on my data.frame, which contains both numeric and character data, it seems to fail:

    apply(df, 2, mean)
    speed  dist   foo 
       NA    NA    NA 
    Warning messages:
    1: In mean.default(newX[, i], ...) :
      argument is not numeric or logical: returning NA
    2: In mean.default(newX[, i], ...) :
      argument is not numeric or logical: returning NA
    3: In mean.default(newX[, i], ...) :
      argument is not numeric or logical: returning NA

Of course, I expected to get NA for the character column, but I would like to get values for the numeric columns.

    sapply(df, class)
        speed      dist       foo 
    "numeric" "numeric"  "factor" 

Any pointers would be appreciated as I feel like I'm missing something very obvious here!

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-unknown-linux-gnu (64-bit)
    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base
3 answers

The first sentence of the description for ?apply says:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

A matrix in R can hold only a single type. When a data frame is coerced to a matrix, everything ends up as character if there is even one character (or factor) column.
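The coercion can be inspected directly. Here is a minimal sketch that rebuilds the df from the question; stringsAsFactors = TRUE is spelled out only so that foo is a factor on newer R versions too, which is an assumption about the reader's setup and not part of the original post:

```r
set.seed(42)
# stringsAsFactors = TRUE is an assumption to reproduce the factor column on R >= 4.0
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

m <- as.matrix(df)              # the conversion apply() performs on a data frame
type_all <- typeof(m)           # "character": the numbers have become strings
speed_is_numeric <- is.numeric(m[, "speed"])  # FALSE, hence the warning from mean()

# Restricting the data frame to its numeric columns keeps the matrix numeric
type_num <- typeof(as.matrix(df[, c("speed", "dist")]))  # "double"
```

This also explains why all three columns, not just foo, come back as NA: by the time mean sees a column, it is already a character vector.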

I suppose I owe you an alternative, so here it is: data frames are really just lists, so if you want to apply a function to each column, use lapply or sapply.


apply works on a matrix, and all elements of a matrix must be of the same type. So df is converted to a matrix first, and since it contains a non-numeric column, every column becomes character.

    > apply(df, 2, class)
          speed        dist         foo 
    "character" "character" "character" 

To get what you want, have a look at colwise and numcolwise in the plyr package.

    > numcolwise(mean)(df)
      speed  dist
    1  15.4 42.98
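If installing plyr is not an option, a similar result can probably be had in base R by selecting the numeric columns first. This is only a sketch of one way to do it, not what numcolwise does internally; the name num_cols is made up for the illustration:

```r
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# num_cols is a hypothetical helper name: TRUE for numeric columns, FALSE otherwise
num_cols <- vapply(df, is.numeric, logical(1))

# Keep only the numeric columns, then take column means
col_means <- colMeans(df[num_cols])
col_means
```

Unlike numcolwise(mean)(df), which returns a one-row data frame, colMeans returns a named numeric vector.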

You are applying a function over the columns of a data.frame. Since a data.frame is a list, you can use lapply or sapply instead of apply:

    sapply(df, mean)
    speed  dist   foo 
    15.40 42.98    NA 
    Warning message:
    In mean.default(X[[3L]], ...) :
      argument is not numeric or logical: returning NA

You can also get rid of the warning with an anonymous function that checks whether the column is numeric before computing the mean:

    sapply(df, function(x) ifelse(is.numeric(x), mean(x), NA))
    speed  dist   foo 
    15.40 42.98    NA 
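A slightly stricter variant of the same idea, offered as a sketch and not as the answerer's own code, uses vapply with a plain if/else so that the result is guaranteed to be one numeric value per column:

```r
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# numeric(1) makes vapply fail loudly if any column yields something
# other than a single numeric value; NA_real_ keeps the NA numeric
safe_means <- vapply(df,
                     function(x) if (is.numeric(x)) mean(x) else NA_real_,
                     numeric(1))
safe_means
```

The if/else form evaluates mean(x) only for numeric columns, so no warning is raised for foo.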

Source: https://habr.com/ru/post/1401477/

