Split one row after every third column and move each group of 3 columns to a new row in R

I have a data frame that is the result of another command. It has only one row containing about 40,000 values. My problem is that every 3 consecutive columns belong together as one record, so I want to split the row after every third column and turn each group of 3 columns into a new row. Example:

Create a test data frame:

 df <- as.data.frame(matrix(1:12, ncol=12, nrow=1))

Now I have a data frame that looks like this.

 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
  1  2  3  4  5  6  7  8  9  10  11  12

But I need it like this:

 V1 V2 V3
  1  2  3
  4  5  6
  7  8  9
 10 11 12

How can I achieve that?

3 answers

Try

 as.data.frame(matrix(unlist(df, use.names=FALSE), ncol=3, byrow=TRUE))
 #   V1 V2 V3
 # 1  1  2  3
 # 2  4  5  6
 # 3  7  8  9
 # 4 10 11 12

Or you can call matrix directly on df (since df is a list, the resulting columns are then list columns rather than plain vectors):

  as.data.frame(matrix(df, ncol=3, byrow=TRUE)) 
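As a rough check of how the two variants differ, the sketch below builds both results from the question's test data and prints the column classes. The names via_unlist, via_list and via_list_flat are made up for this illustration, and the last step (unlisting each column) is just one possible way to get plain vectors back, not something from the answer itself.

```r
# Same test data as in the question
df <- as.data.frame(matrix(1:12, ncol = 12, nrow = 1))

# Variant 1: flatten to an atomic vector first, then reshape row by row
via_unlist <- as.data.frame(
  matrix(unlist(df, use.names = FALSE), ncol = 3, byrow = TRUE)
)

# Variant 2: pass the data frame (a list) straight to matrix()
via_list <- as.data.frame(matrix(df, ncol = 3, byrow = TRUE))

# Compare the column types of the two results
print(sapply(via_unlist, class))
print(sapply(via_list, class))

# Assumption for illustration: if plain columns are needed after variant 2,
# each list column can be unlisted again
via_list_flat <- as.data.frame(lapply(via_list, unlist))
print(via_list_flat)
```

Both variants print the same way, so the difference only shows up once you start computing on the columns.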

You can also try using dim<- (for general knowledge only)

 as.data.frame(t(`dim<-`(unlist(df), c(3, 4))))
 #   V1 V2 V3
 # 1  1  2  3
 # 2  4  5  6
 # 3  7  8  9
 # 4 10 11 12
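The reason this works is that R fills a matrix column by column when dimensions are assigned, so each group of 3 values lands in its own column and a transpose turns those columns into rows. A minimal sketch of that reasoning, using a plain vector v in place of unlist(df) (the names v, m, by_dim and by_row are only for illustration):

```r
v <- 1:12

# Assigning dims fills column-major: column 1 is 1,2,3; column 2 is 4,5,6; ...
m <- v
dim(m) <- c(3, 4)
print(m)

# Transposing turns each group of 3 into a row
by_dim <- t(m)

# Same result as filling a 3-column matrix row by row
by_row <- matrix(v, ncol = 3, byrow = TRUE)
print(identical(by_dim, by_row))
```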

This data.table approach turned out to be faster than I expected (although still not as fast as the obvious approach @akrun used), so I'm going to post it (like David) "for general knowledge only." (Also, "data.table" all the things.) :-)

Create a data.table with three columns:

  • The values from your single row.
  • A grouping variable indicating which row of the final result each value should be assigned to.
  • A grouping variable indicating which column of the final result each value should be assigned to.

Once you have that, you can use dcast.data.table to get the output you're after (plus a bonus column).

For point number 2 above, we can easily define a function similar to the following to simplify the process of creating groups:

 groupMaker <- function(vecLen, perGroup) { (0:(vecLen-1) %/% perGroup) + 1 } 
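To make the two grouping variables concrete, here is what they look like for the 12-column example. The row index comes from groupMaker as defined above; building the column index with rep_len is my own choice for this illustration (the answer's code simply lets 1:3 recycle), and row_id and col_id are made-up names.

```r
groupMaker <- function(vecLen, perGroup) {
  # Integer-divide the zero-based position by the group size, then shift to 1-based
  (0:(vecLen - 1) %/% perGroup) + 1
}

# Row index: which output row each of the 12 values belongs to
row_id <- groupMaker(12, 3)
print(row_id)

# Column index: position of each value within its group of 3
col_id <- rep_len(1:3, 12)
print(col_id)
```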

Then we can use it as follows:

 library(data.table)

 dcast.data.table(
   data.table(value = unlist(df, use.names = FALSE),
              row = groupMaker(ncol(df), 3),
              col = 1:3),
   row ~ col)
 #    row  1  2  3
 # 1:   1  1  2  3
 # 2:   2  4  5  6
 # 3:   3  7  8  9
 # 4:   4 10 11 12

You mention that you are actually dealing with a one-row data.frame with ~40K columns (I'll assume it is 39,999 columns, since that is evenly divisible by 3 and I don't want to break the other answers).

With that in mind, here are some (useless) benchmarks (useless because we're talking about milliseconds here, really).

 set.seed(1)
 S <- sample(20, 39999, TRUE)
 S <- data.frame(t(S))

 funAM <- function(indf) {
   dcast.data.table(
     data.table(value = unlist(indf, use.names = FALSE),
                row = groupMaker(ncol(indf), 3),
                col = 1:3),
     row ~ col)
 }

 funDA <- function(indf) {
   as.data.frame(t(`dim<-`(unlist(indf), c(3, ncol(indf)/3))))
 }

 funAK <- function(indf) as.data.frame(matrix(indf, ncol=3, byrow=TRUE))

 library(microbenchmark)
 microbenchmark(funAM(S), funDA(S), funAK(S))
 # Unit: milliseconds
 #      expr       min        lq      mean    median        uq      max neval
 #  funAM(S) 18.487001 18.813297 22.105766 18.999891 19.455812 50.25876   100
 #  funDA(S) 37.187177 37.450893 40.393893 37.870683 38.869726 94.20128   100
 #  funAK(S)  5.018571  5.149758  5.929944  5.271679  5.536449 26.93281   100

Where this approach could be useful is when the number of input columns is not evenly divisible by the number of output columns.

For example, try the following sample data:

 set.seed(1)
 S2 <- sample(20, 40000, TRUE)
 S2 <- data.frame(t(S2))

With this sample data:

  • funAM will give you a warning, but will correctly give you the last two columns of the last row as NA.
  • funAK will give you a warning, but will (presumably) process the values in the last row incorrectly.
  • funDA will just give you an error.
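If you want the NA-padding behavior without data.table, one possible base R sketch is to pad the flattened vector up to the next multiple of 3 before reshaping. The function name pad_and_reshape and its per_row argument are hypothetical, introduced only for this example; it relies on the fact that assigning a longer length to an atomic vector fills the new positions with NA.

```r
pad_and_reshape <- function(indf, per_row = 3) {
  # Flatten the single row into an atomic vector
  v <- unlist(indf, use.names = FALSE)

  # Number of output rows needed to hold every value
  n_rows <- ceiling(length(v) / per_row)

  # Extending the length pads the tail with NA
  length(v) <- n_rows * per_row

  as.data.frame(matrix(v, ncol = per_row, byrow = TRUE))
}

# 10 values do not divide evenly by 3: the last row is padded
small <- as.data.frame(t(1:10))
res <- pad_and_reshape(small)
print(res)
```

For the small example the last row holds 10 followed by two NA values, and input that already divides evenly passes through unchanged.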

I still think you should just fix the problem in the source though :-)


Source: https://habr.com/ru/post/1207026/

