colMeans in R: problems with a single column

I have a question about the colMeans function. Is there a version of it that does not return an error when applied to a single column? For instance:

    temp <- cbind(c(2, 2), c(3, 4))
    colMeans(temp)
    [1] 2.0 3.5

But for this

    temp2 <- c(2, 2)
    colMeans(temp2)
    Error in colMeans(temp2) :
      'x' must be an array of at least two dimensions

But if I apply the mean function to each column, it correctly returns the values 2 and 2.

I wrote a function for this:

    testfun <- function(i, x) {
      mean(x[, i])
    }
    sapply(1:ncol(x), testfun, x)

which gives the same results as colMeans.
I have heard that colMeans should be much faster than this approach. So, is there a version of colMeans that works when I have only one column?

+6
3 answers

As @Paul points out, colMeans expects an "array of two or more dimensions" for its argument x (from ?colMeans). But temp2 is not an array:

 is.array(temp2) # [1] FALSE 

temp2 can be converted to an array:

    (tempArray <- array(temp2, dim = c(1, 2)))
    #      [,1] [,2]
    # [1,]    2    2

    colMeans(tempArray)
    # [1] 2 2

Perhaps temp2 came from subsetting an array, e.g.

 array(temp2, dim = c(2, 2))[1, ] 

But the result of this is not an array, because subsetting drops the dimensions. To keep it as an array, add drop = FALSE inside the brackets:

    array(temp2, dim = c(2, 2))[1, , drop = FALSE]
    #      [,1] [,2]
    # [1,]    2    2

Then you can use colMeans on the subset.
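
The same applies when a single column is selected from a matrix, which is probably how the one-column case usually arises. A minimal sketch, where m and the variable names are made up for illustration and do not come from the question:

```r
# m is a hypothetical example matrix, not taken from the question
m <- cbind(c(2, 2), c(3, 4))

one_col_dropped <- m[, 1]                # dimensions dropped: a plain vector
one_col_kept    <- m[, 1, drop = FALSE]  # dimensions kept: a 2 x 1 matrix

is.array(one_col_dropped)  # FALSE, so colMeans() would raise the error above
is.array(one_col_kept)     # TRUE

res <- colMeans(one_col_kept)  # works on the one-column matrix
res
```

With drop = FALSE the same code path works whether one column or several are selected.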

+7

The colMeans function is intended for n-dimensional arrays. When you have just one column (or one row?), you actually have a vector. On a vector, simply using mean works fine. In terms of speed, calculating the mean of a million numbers is very fast:

    > system.time(mean(runif(10e5)))
       user  system elapsed
      0.038   0.000   0.038
+4

@PaulHiemstra and @BenBarnes provide the correct answers. I just want to add to their explanations.

Vectors versus Arrays

Vectors are the fundamental data structure in R. Almost everything is internally represented as a vector, even lists (with the exception of a special kind of list, the dotted-pair list, see ?list). Arrays are simply vectors with an attached dim attribute that describes the dimensions of the object. Consider the following:

    v <- c(1:10)
    a <- array(v, dim = c(5, 2))

    length(v)      # 10
    length(a)      # 10
    attributes(v)  # NULL
    attributes(a)  # $dim is 5 2

    is.vector(v)   # TRUE
    is.array(v)    # FALSE
    is.vector(a)   # FALSE
    is.array(a)    # TRUE

Both v and a have length 10. The only difference is that a has a dim attribute attached to it. Because of this added attribute, R treats a as an array instead of a vector. Changing only the dim attribute switches how R sees the object, from array to vector and vice versa:

    attr(a, "dim") <- NULL
    is.vector(a)   # TRUE
    is.array(a)    # FALSE

    attr(v, "dim") <- c(5, 2)
    is.vector(v)   # FALSE
    is.array(v)    # TRUE

In your example, temp2 is a vector object that does not have the dim attribute. colMeans expects an array object with a dim attribute of length at least 2 (two-dimensional). You can easily convert temp2 to a two-dimensional array with one column:

    temp3 <- array(temp2, dim = c(length(temp2), 1))
    # or:
    temp4 <- temp2
    attr(temp4, "dim") <- c(length(temp2), 1)

    is.array(temp2)  # FALSE
    is.array(temp3)  # TRUE
    is.array(temp4)  # TRUE
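
If the input is sometimes a vector and sometimes a matrix, the conversion can be wrapped in a small helper. This is only a sketch: safe_col_means is a hypothetical name (not a base R function), and it assumes that a bare vector should be treated as a single column.

```r
# safe_col_means is a hypothetical helper, not part of base R.
# Assumption: an input with fewer than two dimensions is one column.
safe_col_means <- function(x) {
  if (length(dim(x)) < 2L) {
    x <- as.matrix(x)  # a vector becomes a length(x) x 1 matrix
  }
  colMeans(x)
}

one_col <- safe_col_means(c(2, 2))                   # vector input
two_col <- safe_col_means(cbind(c(2, 2), c(3, 4)))   # matrix input
one_col
two_col
```

Matrices and higher-dimensional arrays pass through unchanged, so the helper behaves like colMeans everywhere colMeans already works.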

colMeans() vs mean()

@PaulHiemstra is right: rather than converting a vector into a single-column array for colMeans(), it is far more common to simply call mean() on the vector. However, you are correct that colMeans() is faster. I believe this is because it does a little less checking of its input, but we would need to look at the internal code to be sure. Consider this example:

    # Create vector "v" and array "a"
    n <- 10e7
    set.seed(123)  # Set random number seed to ensure "v" and "a[,1]" are equal
    v <- runif(n)
    set.seed(123)  # Same seed again, so "a[,1]" gets the same values as "v"
    a <- array(runif(n), dim = c(n, 1))

    # Test that "v" and "a[,1]" are equal
    all.equal(v, a[, 1])  # TRUE

    # Functions to compare
    f1 <- function(x = v) {mean(x)}      # Using mean on vector
    f2 <- function(x = a) {mean(x)}      # Using mean on array
    f3 <- function(x = a) {colMeans(x)}  # Using colMeans on array

    # Compare elapsed time
    system.time(f1())  # elapsed time = 0.344
    system.time(f2())  # elapsed time = 0.366
    system.time(f3())  # elapsed time = 0.166

colMeans() on an array is faster than mean() on a vector or array. However, most of the time this speedup will be negligible. I find it more natural to use mean() for a vector or single-column array. But if you are a true speed demon, you can sleep better at night knowing that you save a few hundred milliseconds of processing time by using colMeans() on single columns.
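
To check this on your own machine, a smaller-scale sketch may help. The size n = 1e6 and the 50 repetitions are arbitrary choices made here to keep the run short, and timings depend on hardware and R version, so no expected timings are shown:

```r
n <- 1e6  # arbitrary, much smaller than the 10e7 used above
set.seed(123)
v <- runif(n)
a <- array(v, dim = c(n, 1))  # same values as a one-column array

# Both approaches should agree up to floating-point tolerance
m_vec <- mean(v)
m_col <- colMeans(a)
all.equal(m_vec, m_col)

# Repeat each call so the elapsed time is large enough to measure
t_mean <- system.time(for (i in 1:50) mean(v))["elapsed"]
t_col  <- system.time(for (i in 1:50) colMeans(a))["elapsed"]
c(mean = unname(t_mean), colMeans = unname(t_col))
```

Whichever one comes out ahead, the two results are numerically equivalent, so the choice can be made on readability.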

+2

Source: https://habr.com/ru/post/915889/

