This turned out to be faster than I expected (although still not as fast as the obvious approach @akrun used), so I'm posting it (like David) "for general knowledge only." (Also, "data.table" all the things!) :-)
The idea is to create a data.table with three columns:
- The values from your single row, unlisted into a vector.
- A grouping variable indicating which row of the final result each value should go to.
- A grouping variable indicating which column of the final result each value should go to.
Once you have that, you can use `dcast.data.table` to get the output you're after (plus a bonus "row" column).
For point 2 above, we can define a simple function like the following to make creating the groups easier:
```r
groupMaker <- function(vecLen, perGroup) { (0:(vecLen-1) %/% perGroup) + 1 }
```
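To make the grouping concrete, here is what `groupMaker` returns for a hypothetical vector of 7 values split into groups of 3. The matching column index is just `1:3` repeated; `rep_len()` is used here only to make that repetition explicit.

```r
groupMaker <- function(vecLen, perGroup) { (0:(vecLen - 1) %/% perGroup) + 1 }

# Row index: positions 1-3 -> row 1, 4-6 -> row 2, 7 -> row 3
rows <- groupMaker(7, 3)
print(rows)   # 1 1 1 2 2 2 3

# Column index: cycles 1, 2, 3 within each row
cols <- rep_len(1:3, 7)
print(cols)   # 1 2 3 1 2 3 1
```

The same row index could also be written as `ceiling(seq_len(vecLen) / perGroup)`; both give one group number per position.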
Then we can use it as follows:
```r
library(data.table)

dcast.data.table(
  data.table(value = unlist(df, use.names = FALSE),
             row = groupMaker(ncol(df), 3),
             col = 1:3),
  row ~ col)
```
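To see the intermediate long table and the reshaped result, here is a sketch on a small, made-up one-row `data.frame` with six columns. It calls the generic `dcast()` (which dispatches to the data.table method) and sets `value.var` explicitly instead of letting it be guessed; the input values are hypothetical.

```r
library(data.table)

# Same helper as above: which output row each value belongs to
groupMaker <- function(vecLen, perGroup) { (0:(vecLen - 1) %/% perGroup) + 1 }

# Hypothetical one-row input with 6 columns
df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)

# Long format: one row per value, with its target row and column
long <- data.table(value = unlist(df, use.names = FALSE),
                   row   = groupMaker(ncol(df), 3),
                   col   = 1:3)   # recycled to length 6

# Spread back out: one output row per group of 3 values
wide <- dcast(long, row ~ col, value.var = "value")
print(wide)
```

The result has the "row" column first, then columns named "1", "2" and "3" holding 1, 2, 3 and 4, 5, 6.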
You mention that you are actually dealing with a one-row data.frame of ~40K columns (I'll assume it's 39,999 columns, since that is nicely divisible by 3 and I don't want to break the other answers).
With that in mind, here are some (pretty useless) benchmarks; useless because we're really talking about milliseconds here.
```r
set.seed(1)
S <- sample(20, 39999, TRUE)
S <- data.frame(t(S))

funAM <- function(indf) {
  dcast.data.table(
    data.table(value = unlist(indf, use.names = FALSE),
               row = groupMaker(ncol(indf), 3),
               col = 1:3),
    row ~ col)
}

funDA <- function(indf) {
  as.data.frame(t(`dim<-`(unlist(indf), c(3, ncol(indf)/3))))
}

funAK <- function(indf) as.data.frame(matrix(indf, ncol=3, byrow=TRUE))

library(microbenchmark)
microbenchmark(funAM(S), funDA(S), funAK(S))
# Unit: milliseconds
#      expr       min        lq      mean    median        uq      max neval
#  funAM(S) 18.487001 18.813297 22.105766 18.999891 19.455812 50.25876   100
#  funDA(S) 37.187177 37.450893 40.393893 37.870683 38.869726 94.20128   100
#  funAK(S)  5.018571  5.149758  5.929944  5.271679  5.536449 26.93281   100
```
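Timings are only meaningful if the candidates return the same values. A quick sanity check on a small, made-up input (one row holding 1 to 9) that `funAM` and `funDA` agree, after dropping `funAM`'s extra "row" column and ignoring column names:

```r
library(data.table)

groupMaker <- function(vecLen, perGroup) { (0:(vecLen - 1) %/% perGroup) + 1 }

funAM <- function(indf) {
  dcast.data.table(
    data.table(value = unlist(indf, use.names = FALSE),
               row = groupMaker(ncol(indf), 3),
               col = 1:3),
    row ~ col, value.var = "value")
}

funDA <- function(indf) {
  as.data.frame(t(`dim<-`(unlist(indf), c(3, ncol(indf)/3))))
}

# Hypothetical small input: one row, 9 columns holding 1..9
small <- data.frame(t(1:9))

# Drop funAM's leading "row" column and strip names before comparing
am <- unname(as.matrix(as.data.frame(funAM(small))[, -1]))
da <- unname(as.matrix(funDA(small)))

same <- identical(dim(am), dim(da)) && all(am == da)
print(same)
```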
Where this approach might actually be useful is when the number of input columns is not evenly divisible by the number of output columns you need.
For example, try the following sample data:
```r
set.seed(1)
S2 <- sample(20, 40000, TRUE)
S2 <- data.frame(t(S2))
```
With this sample data:
- `funAM` will give you a warning, but will correctly fill the last two columns of the last row with `NA`.
- `funAK` will also give you a warning, but will fill the last row incorrectly, because `matrix()` recycles values from the start of the input to complete it.
- `funDA` will just give you an error.
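The three behaviours can be reproduced at a small scale. This sketch uses a hypothetical plain vector of 7 values (not a multiple of 3) instead of a one-row `data.frame`, and uses `rep_len()` for the column index so the dcast route does not rely on recycling with a remainder.

```r
library(data.table)

groupMaker <- function(vecLen, perGroup) { (0:(vecLen - 1) %/% perGroup) + 1 }

vals <- 1:7   # hypothetical: 7 values, not a multiple of 3

# dcast route: cells with no value in the last row become NA
long <- data.table(value = vals,
                   row   = groupMaker(length(vals), 3),
                   col   = rep_len(1:3, length(vals)))
wide <- dcast(long, row ~ col, value.var = "value")

# matrix() route: warns, then recycles values 1 and 2 into the last row
m <- suppressWarnings(matrix(vals, ncol = 3, byrow = TRUE))

# `dim<-` route: 3 x (7/3) dims cannot hold 7 values, so it errors
da <- tryCatch(`dim<-`(vals, c(3, length(vals) / 3)),
               error = function(e) conditionMessage(e))

print(wide)
print(m[3, ])
print(da)
```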
I still think you should just fix the problem at the source, though. :-)