I use by() to apply a function to a range of columns of a data frame, grouped by a factor. Everything works fine if I use mean() as the function, but if I use median(), I get an error like "Error in median.default(x): need numeric data", even though there are no NA values in the data frame.
A line that works with mean():
    > by(iris[,1:3], iris$Species, function(x) mean(x,na.rm=T))
    iris$Species: setosa
    Sepal.Length  Sepal.Width Petal.Length
           5.006        3.428        1.462
    ------------------------------------------------------------
    iris$Species: versicolor
    Sepal.Length  Sepal.Width Petal.Length
           5.936        2.770        4.260
    ------------------------------------------------------------
    iris$Species: virginica
    Sepal.Length  Sepal.Width Petal.Length
           6.588        2.974        5.552
    Warning messages:
    1: mean(<data.frame>) is deprecated. Use colMeans() or sapply(*, mean) instead.
    2: mean(<data.frame>) is deprecated. Use colMeans() or sapply(*, mean) instead.
    3: mean(<data.frame>) is deprecated. Use colMeans() or sapply(*, mean) instead.
But if I use median() (note the na.rm=T option):
    > by(iris[,1:3], iris$Species, function(x) median(x,na.rm=T))
    Error in median.default(x, na.rm = T) : need numeric data
However, if instead of selecting a range of columns with [,1:3] I select only one of the columns, it works:
    > by(iris[,1], iris$Species, function(x) median(x,na.rm=T))
    iris$Species: setosa
    [1] 5
    ------------------------------------------------------------
    iris$Species: versicolor
    [1] 5.9
    ------------------------------------------------------------
    iris$Species: virginica
    [1] 6.5
How can I achieve this behavior when choosing a range of columns?
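
For reference, here is a minimal sketch of the result being asked for, assuming the goal is one median per column within each species. It follows the hint in the warning above (sapply(*, mean)): by() seems to hand the function a data-frame subset rather than a vector, so the sketch applies median() to each column in turn. The variable names res and agg are only illustrative, and aggregate() is shown as one possible alternative, not necessarily the best one.

```r
# by() passes each group's rows as a data frame; median() wants a vector,
# so sapply() runs median() over the columns one at a time.
res <- by(iris[, 1:3], iris$Species,
          function(x) sapply(x, median, na.rm = TRUE))
print(res)

# Possible alternative: aggregate() applies FUN to every column per group
# and returns an ordinary data frame (extra arguments are passed on to FUN).
agg <- aggregate(iris[, 1:3],
                 by = list(Species = iris$Species),
                 FUN = median, na.rm = TRUE)
print(agg)
```

The Sepal.Length values from either call should match the single-column output above (5, 5.9 and 6.5).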