Repeat rows of a data.frame

I want to repeat each row of a data.frame N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) that preserves the column data types.

Example for N = 2:

      A B   C               A B   C
    1 j i 100             1 j i 100
    2 K P 101     -->     2 j i 100
                          3 K P 101
                          4 K P 101

So each row is repeated 2 times, and characters remain characters, factors remain factors, numeric values remain numeric, and so on.

My first attempt used apply(old.df, 2, function(co) rep(co, each = N)), but this converts my values to characters, and I get:

         A   B   C
    [1,] "j" "i" "100"
    [2,] "j" "i" "100"
    [3,] "K" "P" "101"
    [4,] "K" "P" "101"
+79
r dataframe rows repeat
Jun 20 2018-12-12T00:
9 answers
    df <- data.frame(a=1:2, b=letters[1:2])
    df[rep(seq_len(nrow(df)), each=2),]
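
As a quick check that row indexing keeps the column types, here is a minimal sketch. The helper name repeat_rows and the sample data are illustrative assumptions modelled on the question, not part of the original answer.

```r
# Hypothetical helper: repeat each row of a data frame n times via row indexing.
repeat_rows <- function(df, n) {
  out <- df[rep(seq_len(nrow(df)), each = n), , drop = FALSE]
  rownames(out) <- NULL  # replace row names such as "1", "1.1" with 1, 2, ...
  out
}

# Sample data assumed from the question: character, factor and numeric columns.
old.df <- data.frame(A = c("j", "K"),
                     B = factor(c("i", "P")),
                     C = c(100, 101),
                     stringsAsFactors = FALSE)

new.df <- repeat_rows(old.df, 2)

nrow(new.df) == nrow(old.df) * 2  # row count check from the question
sapply(new.df, class)             # column classes after repeating
```

Resetting the row names is optional; without it the repeated rows are labelled "1", "1.1", "2", "2.1".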
+121
Jun 20 2018-12-12T00:

A pure dplyr solution, taken from another source:

    library(dplyr)
    df <- tibble(x = 1:2, y = c("a", "b"))  # tibble() replaces the deprecated data_frame()
    df %>% slice(rep(1:n(), each = 2))
+39
Dec 12 '17 at 19:53

If it is enough to repeat the whole data frame, or to subset it first and then repeat that, then this similar question may be useful. In short:

    library(mefa)
    rep(mtcars, 10)

or simply

    mefa:::rep.data.frame(mtcars, 10)
+6
Apr 24 '13 at 22:20

The rep.row function sometimes seems to create list columns, which leads to poor memory behaviour. I wrote the following, which seems to work well:

    library(plyr)
    rep.row <- function(r, n) {
      colwise(function(x) rep(x, n))(r)
    }
+5
May 30 '13 at 18:31

Adding to what @dardisco mentioned about mefa::rep.data.frame(): it is very flexible.

You can either repeat each row N times:

 rep(df, each=N) 

or repeat the entire data frame N times (for example, as when a vectorized argument is recycled):

 rep(df, times=N) 

Two thumbs up for mefa! I had never heard of it until now, and I had been writing manual code to do this.
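
For comparison, the each/times distinction described above can also be sketched with plain row indexing in base R, without any package. The data frame df and the value of N below are illustrative assumptions.

```r
# Illustrative data; any data frame works the same way.
df <- data.frame(a = 1:2, b = c("x", "y"))
N <- 2
idx <- seq_len(nrow(df))

# Repeat each row N times: rows 1, 1, 2, 2
each_rows <- df[rep(idx, each = N), ]

# Repeat the whole data frame N times: rows 1, 2, 1, 2
whole_df <- df[rep(idx, times = N), ]

each_rows$a  # 1 1 2 2
whole_df$a   # 1 2 1 2
```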

+4
May 20 '14 at 2:23

For reference, and adding to the answers that mention mefa: it may be worth looking at the implementation of mefa::rep.data.frame() in case you don't want to include the whole package:

    > data <- data.frame(a=letters[1:3], b=letters[4:6])
    > data
      a b
    1 a d
    2 b e
    3 c f
    > as.data.frame(lapply(data, rep, 2))
      a b
    1 a d
    2 b e
    3 c f
    4 a d
    5 b e
    6 c f
+4
Jul 21 '15 at 18:53

My solution is similar to mefa:::rep.data.frame, but a little faster, and it takes care of row names:

    rep.data.frame <- function(x, times) {
        rnames <- attr(x, "row.names")
        x <- lapply(x, rep.int, times = times)
        class(x) <- "data.frame"
        if (!is.numeric(rnames))
            attr(x, "row.names") <- make.unique(rep.int(rnames, times))
        else
            attr(x, "row.names") <- .set_row_names(length(rnames) * times)
        x
    }

Comparing the solutions:

    library(Lahman)
    library(microbenchmark)
    microbenchmark(
        mefa:::rep.data.frame(Batting, 10),
        rep.data.frame(Batting, 10),
        Batting[rep.int(seq_len(nrow(Batting)), 10), ],
        times = 10
    )
    #> Unit: milliseconds
    #>                                            expr       min       lq     mean   median        uq       max neval cld
    #>              mefa:::rep.data.frame(Batting, 10) 127.77786 135.3480 198.0240 148.1749  278.1066  356.3210    10  a
    #>                     rep.data.frame(Batting, 10)  79.70335  82.8165 134.0974  87.2587  191.1713  307.4567    10  a
    #>  Batting[rep.int(seq_len(nrow(Batting)), 10), ] 895.73750 922.7059 981.8891 956.3463 1018.2411 1127.3927    10   b
+2
Mar 01 '16 at 17:15

Try using, for example,

    N = 2
    rep(1:4, each = N)

as the row index.

+1
Jun 20 2018-12-12T00:

Another way to do this is to first record the row indices, append an additional copy of df, and then sort by index:

    df$index = 1:nrow(df)
    df = rbind(df,df)
    df = df[order(df$index),][,-ncol(df)]

Although other solutions may be shorter, this method may be more beneficial in certain situations.
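
The snippet above doubles the rows. One way the same stack-then-sort idea might extend to an arbitrary N is sketched below; the use of replicate() with do.call(rbind, ...) and the sample data are assumptions for illustration, not part of the original answer.

```r
# Illustrative data and repeat count.
df <- data.frame(a = 1:2, b = c("x", "y"))
N <- 3

# Original row index of every row in the stacked data frame: 1, 2, 1, 2, 1, 2
idx <- rep(seq_len(nrow(df)), times = N)

# Stack N copies of df on top of each other.
stacked <- do.call(rbind, replicate(N, df, simplify = FALSE))

# Order by the original index so copies of the same row sit together.
result <- stacked[order(idx), , drop = FALSE]
rownames(result) <- NULL

result$a  # 1 1 1 2 2 2
```

Keeping the index in a separate vector avoids adding and then dropping a helper column.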

0
Jun 03 '15 at 12:07


