Faster code in R

FYI: I have edited this significantly since my first release. This simulation has been reduced from 14 to 14 minutes.

I am new to programming, but I did a simulation that tries to monitor asexual replication in the body and quantify the differences in the number of chromosomes between the parent and daughter organisms. Modeling is very slow. It takes about 6 hours. I wanted to know what would be the best way to speed up the simulation.

These digital organisms have x number of chromosomes. Unlike most organisms, chromosomes are all independent of each other, so they have an equal chance of being transferred to any daughter organism.

In this case, the distribution of chromosomes in the daughter cell follows the binomial distribution with a probability of 0.5.

The sim_repo function takes a matrix of digital organisms with a known number of chromosomes and puts them through 12 generations of replication. It duplicates these chromosomes and then uses the rbinom function to randomly create a number. This number is then assigned to the child cell. Since no chromosomes are lost during asexual reproduction, the other daughter cell receives the remaining chromosomes. Then this is repeated for the number of G generations. Then, from each row in the matrix, one value is selected.

  sim_repo = function( x1, G=12, k=1, t=25, h=1000 ) { # x1 is the list of copy numbers for a somatic chromosome # G is the number of generations, default is 12 # k is the transfer size, default is 1 # t is the number of transfers, default is 25 # h is the number of times to replicate, default is 1000 dup <- x1 * 2 # duplicate the initial somatic chromosome copy number for replication pop <- 1 # set generation time set.seed(11) z <- matrix(rbinom(n=rep(1,length(dup)),size = as.vector(dup),prob = 0.5),nrow = nrow(dup)) # amount of somatic chromosome is distributed to one of the daughter cells z1 <- dup - z # as no somatic chromosomes are lost, the other daughter cells receives the remainder somatic chromosomes x1 <- cbind(z, z1) # put both in a matrix for ( pop in 1:G ) { # this loop does the replication for each cell in each generation pop <- 1 + pop # number of generations. This is a count for the for loop dup <- x1 * 2 # double the somatic chromosomes for replication set.seed(11) z <- matrix(rbinom(n=rep(1,length(dup)),size = as.vector(dup),prob = 0.5),nrow = nrow(dup)) # amount of somatic c hromosomes distributed to one of the daughter cells z1 <- dup - z # as no somatic chromosomes are lost, the other daughter cells receives the remainder somatic chromosomes x1 <- cbind(z, z1) # put both in a matrix } # the following for loop randomly selects one cell in the population that was created # the output is a matrix of 1 column x1 <- matrix(apply(x1, 1, sample, size=k), ncol=1) x1 } 

In my study, I am interested in the change in variance in the chromosomes of the original ancestral organisms and the final point in time in this simulation. The following function represents the transfer of a cell to a new habitat. It deduces from sim_re p function and uses it to generate more generations. Then he finds the variance between the rows in the first and last columns of the matrix and finds the difference between them.

  # The following function is mostly the same as I talked about in the description. # The only difference is I changed some aspects to take into account I am using # matrices and not lists. # The function outputs the difference between the intial variance component between # 'cell lines' with the final variance after t number of transfers sim_exp = function( x1, G=12, k=1, t=25, h=1000 ) { xn <- matrix(NA, nrow(x1), t) x <- x1 xn[,1] <- x1 for ( l in 2:t ) { x <- sim_repo( x, G, k, t, h ) xn[, l] <- x } colvar <- matrix(apply(xn,2,var),ncol=ncol(xn)) ivar <- colvar[,1] fvar <- colvar[,ncol(xn)] deltavar <- fvar - ivar deltavar } 

I need to repeat this simulation h times. So I made the following function that will call the sim_exp h function a number of times.

 sim_1000 = function( x1, G=12, k=1, t=25, h=1000 ) { xn <- vector(length=h) for ( l in 2:h ) { x <- sim_exp( x1, G, k, t, h ) xn[l] <- x } xn } 

When I call the sim_exp function with a value of 6 values, it takes about 52 seconds to complete.

  x1 <- matrix(data=c(100,100,100,100,100,100),ncol=1) system.time(sim_1000(x1,h=1)) user system elapsed 1.280 0.105 1.369 

If I can get it faster, then I can do more of these simulations and apply the selection model to the simulation.

My input will look like this: x1 , a matrix with each generic organism in its row:

 x1 <- matrix(data=c(100,100,100,100,100,100),ncol=1) # a matrix of 6 organisms 

When I run:

 a <- sim_repo(x1, G=12, k=1) 

My expected result would be:

  a [,1] [1,] 137 [2,] 82 [3,] 89 [4,] 135 [5,] 89 [6,] 109 system.time(sim_repo(x1)) user system elapsed 1.969 0.059 2.010 

When I call the sim_exp function,

b <- sim_exp (x1, G = 12, k = 1, t = 25)

it calls the sim_repo function G times and outputs:

  b [1] 18805.47 

When I call the sim_1000 function, I usually set h to 1000, but here I set it to 2. So, here sim_1000 will call sim_exp and repeat it 2 times.

 c <- sim_1000(x1, G=12, k=1, t=25, h=2) c [1] 18805.47 18805.47 
+6
source share
1 answer

As others mentioned in the comments, if we look only at the sim_repo function and replace the line:

 dup <- apply(x1, c(1,2),"*",2) 

from

 dup <- x1 * 2 

the lines

 z <- apply(dup,c(1,2),rbinom,n=1,prob=0.5) 

from

 z <- matrix(rbinom(n=rep(1,length(dup)),size = as.vector(dup),prob = 0.5),nrow = nrow(dup)) 

and inner for loop with

 x1 <- matrix(apply(x1,1,sample,size = 1), ncol=1) 

I get a big speed increase:

 system.time(sim_exp(x1)) user system elapsed 0.655 0.017 0.686 > system.time(sim_expOld(x1)) user system elapsed 21.445 0.128 21.530 

And I confirmed that he does the same:

 set.seed(123) out1 <- sim_exp(x1) set.seed(123) out2 <- sim_expOld(x1) all.equal(out1,out2) > TRUE 

And that doesn't even go into pre-allocation, which can be difficult without redesigning things, given how you structured your code.

And it also doesn't even start to see if you really need all three functions ...

+8
source

Source: https://habr.com/ru/post/910129/


All Articles