R: Quickly replicate selected rows of a data.frame (or another data structure)

I have a data.frame like this, but much larger:

 > head(mydf)
   id1     id2 n
 1   0 1032142 3
 2   0 1072163 1
 3   0  119323 2

I need to write the columns id1 and id2 to a file, repeating each row n times, so that I get a file like this:

 0 1032142
 0 1032142
 0 1032142
 0 1072163
 0 119323
 0 119323

I tried the following solutions, but they use explicit for loops and are incredibly slow (it would take several days to process my data...):

 for (j in 1:(nrow(mydf)))
   for (i in 1:(mydf[j,"n"]))
     write.table(mydf[j,c("id1","id2")], file="trials", append=TRUE, row.names=FALSE, col.names=FALSE)

Another attempt builds a new data.frame with the replicated rows, but it runs even slower.

 towrite = data.frame()
 for (j in 1:(nrow(mydf)))
   for (i in 1:(mydf[j,"n"]))
     towrite = rbind(towrite, mydf[j,c("id1","id2")])

What is the easiest and fastest way to do this in R?

+4
3 answers

Try indexing your data with repeated row numbers and writing the result out in one batch:

 mydf[rep(1:nrow(mydf), mydf$n), ] 
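The line above only builds the expanded data.frame. A minimal sketch of the "one batch" part, assuming the data looks like the sample in the question and that only id1 and id2 should end up in the file (out_file is a hypothetical path used for illustration):

```r
# Demo data matching the sample in the question
mydf <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# Repeat each row index n times and keep only the two columns to be written
expanded <- mydf[rep(seq_len(nrow(mydf)), mydf$n), c("id1", "id2")]

# A single write.table call instead of one call per output line
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
write.table(expanded, file = out_file,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
```

The speedup over the loops in the question should come mostly from opening and writing the file once rather than once per output line.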

If your data is all numeric, working with a matrix is much faster:

 mymat <- as.matrix(mydf)
 reps <- as.integer(mydf$n)
 mymat[rep(1:nrow(mymat), reps), ]
   id1     id2 n
 1   0 1032142 3
 1   0 1032142 3
 1   0 1032142 3
 2   0 1072163 1
 3   0  119323 2
 3   0  119323 2

If you were able to work with the original data.frame in memory, you can probably handle the expanded matrix above as well.
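If the expanded matrix turns out to be too large after all, one option is to check its size first and expand it piece by piece. The following is only a sketch under that assumption: write_expanded and chunk_size are hypothetical names, not part of any package, and the tiny chunk size is there just to exercise the loop on the demo data.

```r
mydf <- data.frame(id1 = c(0, 0, 0),
                   id2 = c(1032142, 1072163, 119323),
                   n   = c(3, 1, 2))

mymat <- as.matrix(mydf[, c("id1", "id2")])
reps  <- as.integer(mydf$n)

# Number of rows the fully expanded result would have
total_rows <- sum(reps)

# Hypothetical helper: expand and append chunk_size source rows at a time
write_expanded <- function(mat, reps, file, chunk_size = 100000L) {
  file.create(file)  # start from an empty file
  if (nrow(mat) == 0L) return(invisible(file))
  for (s in seq(1L, nrow(mat), by = chunk_size)) {
    rows <- s:min(s + chunk_size - 1L, nrow(mat))
    idx  <- rep(rows, reps[rows])  # repeated row numbers for this chunk only
    write.table(mat[idx, , drop = FALSE], file = file, append = TRUE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(file)
}

out_file <- tempfile(fileext = ".txt")  # hypothetical output path
write_expanded(mymat, reps, out_file, chunk_size = 2L)
```

Only one chunk of expanded rows is held in memory at a time, at the cost of one write.table call per chunk.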

+6

If you only want to write each line n times to a file, try:

Load demo data:

 data <- structure(list(id1 = c(0L, 0L, 0L), id2 = c(1032142L, 1072163L, 119323L), n = c(3L, 1L, 2L)), .Names = c("id1", "id2", "n"), class = "data.frame", row.names = c(NA, -3L)) 

Then write each line n times to "output.txt":

 file = 'output.txt'
 write.table(data[0,], file=file, row.names=FALSE)
 apply(data, 1, function(x)
   replicate(x[3], write.table(t(x[1:2]), file=file, append=TRUE, col.names=FALSE, row.names=FALSE)))

I am sure that this could be written much nicer :)
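One possibly nicer variant, sketched here as an assumption rather than a tested replacement: build each output line as a string once, repeat the strings with rep, and write them in a single writeLines call. It skips the header line and assumes the id columns are integers, as in the demo data (out_file is a hypothetical path).

```r
data <- data.frame(id1 = c(0L, 0L, 0L),
                   id2 = c(1032142L, 1072163L, 119323L),
                   n   = c(3L, 1L, 2L))

# One text line per source row, then each line repeated n times
lines <- rep(paste(data$id1, data$id2), times = data$n)

out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(lines, out_file)
```

If the id columns were stored as doubles, paste may render some large round values in scientific notation, so something like format(x, scientific = FALSE) might be needed before pasting.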

+1

Perhaps you could try apply together with sink. I'm not sure that apply (or the other apply-style functions) is actually faster than for loops.

 mydat = data.frame(id1=0, id2=rnorm(5), n=sample(1:10,5))
 mydat
 sink("test.txt")
 apply(mydat, 1, function(x) cat(paste(rep(paste(x[1:2], collapse="\t"), x[3]), "\n")))
 sink()

I know the code looks awful
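Whether a row-by-row approach is fast enough is easy to measure. Below is a rough timing sketch, assuming made-up demo data of 2000 rows; by_row and vectorized are hypothetical helper names, and the row-by-row version uses a plain loop with sink as a stand-in for the apply call above. No timings are quoted here because they depend on the machine.

```r
set.seed(1)  # arbitrary seed so the demo data is reproducible
mydat <- data.frame(id1 = 0L,
                    id2 = sample(1000000L, 2000),
                    n   = sample(1:10, 2000, replace = TRUE))

# Row by row: one writeLines() call per source row, captured with sink()
by_row <- function(d, file) {
  sink(file)
  on.exit(sink())  # restore normal output even if something fails
  for (j in seq_len(nrow(d))) {
    writeLines(rep(paste(d$id1[j], d$id2[j], sep = "\t"), d$n[j]))
  }
}

# Vectorized: build all lines at once and write them in a single call
vectorized <- function(d, file) {
  writeLines(rep(paste(d$id1, d$id2, sep = "\t"), times = d$n), file)
}

f1 <- tempfile()
f2 <- tempfile()
t_row <- system.time(by_row(mydat, f1))
t_vec <- system.time(vectorized(mydat, f2))

# Elapsed seconds for each variant
print(c(by_row = t_row[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

Both variants should produce the same file, so the comparison is only about speed.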

0

Source: https://habr.com/ru/post/1336572/

