@PaulHiemstra and @BenBarnes provide the correct answers. I just want to add to their explanations.
Vectors versus Arrays
Vectors are the fundamental data structure in R. Almost everything is internally represented as a vector, even lists (with the exception of one special kind of list, the dotted pair list or pairlist; see ?list). Arrays are simply vectors with an attached dim attribute that describes the dimensions of the object. Consider the following:
    v <- c(1:10)
    a <- array(v, dim = c(5, 2))
    length(v)  # 10
    length(a)  # 10
Both v and a have length 10. The only difference is that a has a dim attribute attached to it. Because of this added attribute, R treats a as an array rather than a vector. Changing only the dim attribute turns the object from an array into a vector, and vice versa:
    attr(a, "dim") <- NULL
    is.vector(a)  # TRUE
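To make the point about the dim attribute concrete, here is a minimal sketch going in the other direction. The names x and m are hypothetical and used only for illustration; everything else is base R.

```r
# Hypothetical names (x, m), used only for illustration.
x <- 1:10

# A plain vector carries no attributes at all.
is.null(attributes(x))

# Attaching a dim attribute is all it takes to get a 5 x 2 array.
m <- x
dim(m) <- c(5, 2)
is.array(m)
identical(m, array(x, dim = c(5, 2)))

# Changing dim only changes how the same 10 values are indexed
# (R fills column by column); the underlying data is untouched.
dim(m) <- c(2, 5)
m[2, 1]                      # the second value of x
identical(as.vector(m), x)   # dropping dim gives back the original vector
```

Nothing is copied into a different kind of object here; the same values are simply viewed through a different dim.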
In your example, temp2 is a vector that does not have a dim attribute. colMeans expects an array with a dim attribute of length at least 2 (that is, at least two-dimensional). You can easily convert temp2 to a two-dimensional array with one column:
    temp3 <- array(temp2, dim = c(length(temp2), 1))
    # or:
    temp4 <- temp2
    attr(temp4, "dim") <- c(length(temp2), 1)

    is.array(temp2)  # FALSE
    is.array(temp3)  # TRUE
    is.array(temp4)  # TRUE
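As a small self-contained illustration of the same idea, the sketch below uses a made-up vector x as a stand-in for temp2 (the values are an assumption, not taken from the question). It shows colMeans() refusing a plain vector and as.matrix() as one more way to get a single-column matrix.

```r
# Hypothetical stand-in for the asker's vector; the values are made up.
x <- c(2, 4, 6, 8)

# colMeans() on a plain vector signals an error because there is no
# dim attribute; capture the message instead of stopping the script.
res <- tryCatch(colMeans(x), error = function(e) conditionMessage(e))
res

# as.matrix() turns a vector into a one-column matrix.
x_col <- as.matrix(x)
dim(x_col)

# With a dim attribute in place, colMeans() works and agrees with mean().
col_mean <- colMeans(x_col)
all.equal(col_mean, mean(x))
```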
colMeans() vs mean()
@PaulHiemstra is right: rather than converting a vector into a single column for colMeans(), it is much more common to just use mean() on the vector. However, you are correct that colMeans() is faster. I believe this is because it does a little less checking of its input, but we would need to look at the internal code to be sure. Consider this example:
    # Create vector "v" and array "a"
    n <- 10e7
    set.seed(123)  # Set random number seed to ensure "v" and "a[,1]" are equal
    v <- runif(n)
    set.seed(123)  # Set random number seed to ensure "v" and "a[,1]" are equal
    a <- array(runif(n), dim = c(n, 1))

    # Test that "v" and "a[,1]" are equal
    all.equal(v, a[,1])  # TRUE

    # Functions to compare
    f1 <- function(x = v){mean(x)}      # Using mean on vector
    f2 <- function(x = a){mean(x)}      # Using mean on array
    f3 <- function(x = a){colMeans(x)}  # Using colMeans on array

    # Compare elapsed time
    system.time(f1())  # elapsed time = 0.344
    system.time(f2())  # elapsed time = 0.366
    system.time(f3())  # elapsed time = 0.166
colMeans() on an array is faster than mean() on a vector or an array. However, most of the time this speed-up will be negligible. I find it more natural to use mean() for a vector or a single-column array. But if you are a true speed demon, you can sleep better at night knowing that colMeans() saves you a couple of hundred milliseconds on a single column of 100 million values.
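A single system.time() call can be noisy, so if you want to check the comparison on your own machine, something like the following sketch may help. The helper time_it, the repetition count and the smaller n are my own assumptions, chosen so that it runs quickly; the absolute numbers will differ from machine to machine.

```r
set.seed(123)
n <- 1e6                        # smaller than above so the sketch runs quickly
v <- runif(n)
a <- array(v, dim = c(n, 1))    # same values as a one-column array

# Hypothetical helper: median elapsed time of f() over several repetitions.
time_it <- function(f, reps = 20) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

timings <- c(
  mean_vector    = time_it(function() mean(v)),
  mean_array     = time_it(function() mean(a)),
  colMeans_array = time_it(function() colMeans(a))
)
timings

# Whichever is faster, the results should agree up to floating-point tolerance.
all.equal(mean(v), colMeans(a))
```

Taking the median over repetitions is just one simple way to damp the noise; a dedicated benchmarking package would give more detail.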