Run time - using apply functions

I have two apply functions that compute the mean and standard deviation over the first two dimensions of a large three-dimensional array (437216,8,3). On R x32 this takes 16 minutes. This is the first of many large arrays in the database that we regularly run this script on. Any thoughts on how to speed up the execution?

3 answers

That seems very slow. On my machine,

set.seed(10)

x = array(rnorm(437216*8*3), dim = c(437216,8,3))

system.time(apply(x, 1, mean))

takes

   user  system elapsed 
 23.903   0.263  24.522 

FWIW,

system.time(apply(x, 2, mean))
   user  system elapsed 
  0.546   0.274   0.841 


system.time(apply(x, 3, mean))
   user  system elapsed 
  0.516   0.267   0.790 
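The gap between these timings is consistent with how often apply has to call the function: once per slice along the chosen margin. A small sketch that counts the calls on a scaled-down array (the 1000-row array and the count_calls helper are made up for illustration):

```r
# Count how many times apply() invokes FUN for a given margin.
# count_calls is a hypothetical helper, not part of base R.
count_calls <- function(x, margin) {
  n <- 0
  apply(x, margin, function(v) {
    n <<- n + 1   # increment the counter in the enclosing function
    mean(v)
  })
  n
}

set.seed(10)
x_small <- array(rnorm(1000 * 8 * 3), dim = c(1000, 8, 3))

calls_dim1  <- count_calls(x_small, 1)     # one call per row: 1000
calls_dim2  <- count_calls(x_small, 2)     # 8
calls_dim3  <- count_calls(x_small, 3)     # 3
calls_dim12 <- count_calls(x_small, 1:2)   # 1000 * 8 = 8000
```

On the full array the same counting gives 437216 calls for margin 1 and 437216 * 8 = 3497728 calls for margin 1:2, against 8 and 3 calls for margins 2 and 3.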

What does your sessionInfo() show? Mine:

sessionInfo()
R version 2.11.1 (2010-05-31) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] cimis_0.1-3    RLastFM_0.1-4  RCurl_1.4-2    bitops_1.0-4.1 XML_3.1-0      lattice_0.18-8

loaded via a namespace (and not attached):
[1] grid_2.11.1  tools_2.11.1

EDIT: after seeing the code provided by the OP, the problem became clear. The trick is to convert the array to a data frame:

> x = array(rnorm(437216*8*3), dim = c(437216,8,3))

> system.time(apply(x,1:2,mean))
   user  system elapsed 
 107.06    0.18  107.34 
 # This was run on a new quad-core i7, so it is not a slow machine...

> Tmp <- data.frame(V1=as.vector(x[,,1]),
+             V2=as.vector(x[,,2]),
+             V3= as.vector(x[,,3]))

> system.time({
+     Means <- rowMeans(Tmp)
+     Sd <- sqrt(rowSums((Tmp-Means)^2)/(3-1))
+ })
   user  system elapsed 
   6.72    0.40    7.12 

To get the results back into matrices of the right shape:

Means <- matrix(Means,ncol=8)
Sd <- matrix(Sd,ncol=8)
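The reshape works because R stores arrays in column-major order, so as.vector(x[,,k]) and matrix(..., ncol=8) undo each other. A quick check on a toy array (the names a, slice, flat and back are illustrative only):

```r
# Toy array with the same layout as x, but only 4 rows.
a <- array(seq_len(4 * 8 * 3), dim = c(4, 8, 3))

slice <- a[, , 1]                 # 4 x 8 matrix
flat  <- as.vector(slice)         # length 32, columns stacked one after another
back  <- matrix(flat, ncol = 8)   # filled column by column again

# TRUE when the round trip reproduces the original slice exactly
round_trip_ok <- identical(slice, back)
```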

Proof of concept:

x = array(rnorm(10*8*3), dim = c(10,8,3))

m1 <- apply(x,1:2,mean)
sd1 <- apply(x,1:2,sd)

Tmp <- data.frame(V1=as.vector(x[,,1]),
            V2=as.vector(x[,,2]),
            V3= as.vector(x[,,3]))
m2 <- rowMeans(Tmp)

sd2 <- sqrt(rowSums((Tmp-m2)^2)/2)

m2 <-matrix(m2,ncol=8)
sd2 <- matrix(sd2,ncol=8)

> all.equal(m1,m2)
[1] TRUE

> all.equal(sd1,sd2)
[1] TRUE
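A possible variant of the same trick skips the data frame altogether: rowMeans() and rowSums() accept a dims argument, and with dims = 2 they treat the first two dimensions as the "rows" and reduce over the third. This is only a sketch and was not timed in this thread; mean_sd_dim12 is a made-up helper name.

```r
# Mean and sample SD over the third dimension of a 3-d array,
# returned as matrices indexed by the first two dimensions.
# mean_sd_dim12 is a hypothetical helper, not part of any package.
mean_sd_dim12 <- function(x) {
  n <- dim(x)[3]
  m <- rowMeans(x, dims = 2)    # dim(x)[1] x dim(x)[2] matrix of means
  # as.vector(m) is recycled across the third dimension (column-major order),
  # so every x[i, j, k] has m[i, j] subtracted from it.
  dev2 <- (x - as.vector(m))^2
  s <- sqrt(rowSums(dev2, dims = 2) / (n - 1))
  list(mean = m, sd = s)
}

set.seed(10)
x_small <- array(rnorm(10 * 8 * 3), dim = c(10, 8, 3))
res <- mean_sd_dim12(x_small)

# Compare against the apply() versions on the small array
means_match <- isTRUE(all.equal(res$mean, apply(x_small, 1:2, mean)))
sds_match   <- isTRUE(all.equal(res$sd,   apply(x_small, 1:2, sd)))
```

Subtracting as.vector(m) rather than m itself matters here: arithmetic between a 3-d array and a 2-d matrix fails with a non-conformable arrays error, while a plain vector is recycled.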

My sessionInfo():

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] abind_1.1-0   RSQLite_0.9-1 DBI_0.2-5

The apply call runs over both the first and second dimensions (1:2); the system time is below, and I believe that is what makes it run for so long. I ran it on a better computer (sessionInfo listed above), which reduced the run time (below), but it still seems to take longer than necessary:

> system.time(apply(x,1:2,mean))
   user  system elapsed
 311.56    0.30  311.88
> system.time(apply(x,1:2,sd))
   user  system elapsed
 505.92    0.21  506.81

I will look into converting it to a data.frame, as in the second suggestion. Thanks for the help!


Source: https://habr.com/ru/post/1764175/

