Why does sd in R return a vector when given a matrix, and what can I do about it?

I am a bit puzzled that the sd function in R returns a vector when given a matrix (I assume this is kept for backward compatibility and will always stay that way). This behavior seems very strange to me:

    # 3d input: same
    print(length(mean(array(rnorm(60),dim=c(3,4,5)))))
    print(length(sd(array(rnorm(60),dim=c(3,4,5)))))
    # 1d input: same
    print(length(mean(array(rnorm(60),dim=c(60)))))
    print(length(sd(array(rnorm(60),dim=c(60)))))
    # 2d input: different!
    print(length(mean(array(rnorm(60),dim=c(12,5)))))
    print(length(sd(array(rnorm(60),dim=c(12,5)))))

I get

    [1] 1
    [1] 1
    [1] 1
    [1] 1
    [1] 1
    [1] 5

So sd behaves differently from mean when the input is a 2-dimensional array (and, apparently, only in that case!). Consider this failed attempt at a function that rescales each column of a k-dimensional array by its standard deviation:

    re.scale <- function(x) {
      # rescale by the standard deviation of each column
      scales <- apply(x,2,sd)
      ret.val <- sweep(x,2,scales,"/")
    }
    # this works just fine
    x <- array(rnorm(60),dim=c(12,5))
    y <- re.scale(x)
    # this throws a warning
    x <- array(rnorm(60),dim=c(3,4,5))
    y <- re.scale(x)

Is there another function I can use instead of sd that does not have this weird behavior? How should re.scale be written correctly? Or, more generally, how do I write a column-wise z-score function?

2 answers

It behaves as documented on the sd help page. Near the very top, it says:

"If x is a matrix or a data frame, a vector of the standard deviation of the columns is returned."

Note that it does not mention arrays in general, so only two-dimensional arrays (which R treats as matrices) get this treatment. If you want to avoid this behavior, just turn the input into a plain vector with c():

  sd( c(array(rnorm(60),dim=c(12,5))) ) # [1] 0.9505643 
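A minimal sketch of the c() approach, mirroring the experiment in the question: because c() drops the dim attribute, sd always sees a plain vector and returns a single value whatever the original shape. The helper name sd_all is hypothetical, not part of base R.

```r
# Hypothetical helper: standard deviation over *all* elements of x,
# whether x is a vector, a matrix or a higher-dimensional array.
sd_all <- function(x, na.rm = FALSE) {
  sd(c(x), na.rm = na.rm)  # c() drops the dim attribute first
}

set.seed(1)
shapes <- list(c(60), c(12, 5), c(3, 4, 5))
# Length of the result for 1d, 2d and 3d input
lens <- sapply(shapes, function(d) length(sd_all(array(rnorm(60), dim = d))))
print(lens)
```

This gives the same answer in old and new versions of R, since sd is never handed a matrix.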

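I see you have also asked about column z-scores. Try this for matrices (apply(x, 2, sd) returns the column standard deviations in every R version, whereas sd(x) stopped doing so in R 3.0.0):

    colMeans(x)/apply(x, 2, sd)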
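The matrix expression above gives one number per column (the column mean divided by the column standard deviation). If what you want is the z-score of every entry, column by column (subtract the column mean, then divide by the column standard deviation), base R's scale() does that. A sketch, with an equivalent sweep() version alongside for comparison (the variable names are illustrative):

```r
set.seed(1)
x <- matrix(rnorm(60), nrow = 12, ncol = 5)

# Built-in: centre each column by its mean, then divide by its sd
z_scale <- scale(x)

# The same computation spelled out with sweep()
z_manual <- sweep(x, 2, colMeans(x), "-")               # subtract column means
z_manual <- sweep(z_manual, 2, apply(x, 2, sd), "/")     # divide by column sds

# scale() also attaches the centring/scaling vectors as attributes,
# so compare only the values
print(all.equal(c(z_scale), c(z_manual)))
```

After either version, each column has mean 0 and standard deviation 1 (up to floating-point error).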

And this for arrays (although what counts as a "column" there may need clarifying):

 apply(x, 2:3, mean)/apply(x, 2:3, sd) # will generalize to higher dimensions 
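For the re.scale part of the question, one possible rewrite follows the same convention as apply(x, 2:3, ...) above, extended to any number of dimensions: each vector along the first dimension (x[, j], x[, j, k], ...) is a "column" and is divided by its own standard deviation. Since each call to sd then receives a plain vector, the result does not depend on the R version. This is a sketch; the name rescale_cols and that definition of "column" are assumptions.

```r
# Hypothetical helper: divide every vector along the first dimension
# (x[, j], x[, j, k], ...) by its own standard deviation.
# x must have a dim attribute (a matrix or an array).
rescale_cols <- function(x) {
  margins <- seq_along(dim(x))[-1]   # all dimensions except the first
  scales  <- apply(x, margins, sd)   # one sd per "column"
  sweep(x, margins, scales, "/")     # divide each column by its sd
}

x2 <- matrix((1:60)^2, nrow = 12, ncol = 5)
x3 <- array((1:60)^2, dim = c(3, 4, 5))

y2 <- rescale_cols(x2)  # matrix: same as sweep(x2, 2, apply(x2, 2, sd), "/")
y3 <- rescale_cols(x3)  # 3d array: STATS matches dims 2:3, so no warning

# Every rescaled column should now have standard deviation 1
print(range(apply(y2, 2, sd)))
print(range(apply(y3, 2:3, sd)))
```

sweep() accepts a vector MARGIN, so the matrix of standard deviations returned by apply() lines up with dimensions 2 and 3 of the array.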

The behavior of sd has changed over time:

1. R version 2.13.2 (2011-09-30) and earlier

    > set.seed(1)
    > sd(array(rnorm(60),dim=c(12,5)))
    [1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160

Description

This function computes the standard deviation of the values in x. If na.rm is TRUE, then missing values are removed before the computation proceeds. If x is a matrix or a data frame, a vector of the standard deviation of the columns is returned.
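If legacy code relies on this old behavior, a small wrapper can reproduce it on current R. This is a hypothetical helper (sd_legacy is not a base R function), written from the help text quoted above rather than from the original source of sd:

```r
# Hypothetical wrapper reproducing the documented pre-2.14.0 behavior:
# column standard deviations for matrices and data frames,
# a single standard deviation for everything else.
sd_legacy <- function(x, na.rm = FALSE) {
  if (is.matrix(x)) {
    apply(x, 2, sd, na.rm = na.rm)       # one sd per column
  } else if (is.data.frame(x)) {
    sapply(x, sd, na.rm = na.rm)         # one sd per data frame column
  } else {
    sd(c(x), na.rm = na.rm)              # flatten anything else
  }
}

set.seed(1)
m <- array(rnorm(60), dim = c(12, 5))
print(sd_legacy(m))      # five values, one per column
print(sd_legacy(c(m)))   # a single value
```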


2. R version 2.14.0 (2011-10-31) - 2.15.3 (2013-03-01)

    > set.seed(1)
    > sd(array(rnorm(60),dim=c(12,5)))
    [1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160
    Warning message:
    sd(<matrix>) is deprecated.
     Use apply(*, 2, sd) instead.

Details

Prior to R 2.14.0, sd(dfrm) worked directly for a data.frame dfrm. This is now deprecated, and you should use sapply(dfrm, sd) instead.
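A short sketch of the sapply(dfrm, sd) replacement on a made-up data frame (the column names and values are illustrative). Filter() is added here as an assumption about typical data: it drops non-numeric columns, for which a standard deviation makes no sense.

```r
dfrm <- data.frame(
  id     = c("a", "b", "c", "d"),
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 65, 80)
)

# sapply() walks over the columns of the data frame;
# Filter() keeps only the numeric ones (the id column is dropped)
col_sds <- sapply(Filter(is.numeric, dfrm), sd)
print(col_sds)
```

The result is a named numeric vector with one standard deviation per numeric column.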


3. R version 3.0.0 (2013-04-03) and later

    > sd(array(rnorm(60),dim=c(12,5)))
    [1] 0.8551688

(no warning)
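On R 3.0.0 and later, sd treats a matrix as one long vector of its values, so it agrees with sd of the flattened matrix; column-wise values need apply(). A quick check, assuming a current R installation:

```r
set.seed(1)
m <- array(rnorm(60), dim = c(12, 5))

overall <- sd(m)                       # one number for the whole matrix
print(all.equal(overall, sd(c(m))))    # same as sd of the flattened values

col_sds <- apply(m, 2, sd)             # column-wise, as the old sd(m) returned
print(col_sds)
```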

Source: https://habr.com/ru/post/895515/

