Repeat data.frame lines

Question

I want to repeat each row of a data.frame N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) that preserves the data types of the columns.

Example for N = 2:

   A B   C        A B   C
 1 j i 100      1 j i 100
           -->  2 j i 100
 2 K P 101      3 K P 101
                4 K P 101

So each row is repeated 2 times, and characters remain characters, factors remain factors, numerics remain numerics, ...

My first attempt used apply(old.df, 2, function(co) rep(co, each = N)), but this converts all my values to character, and I get:

      A   B   C
 [1,] "j" "i" "100"
 [2,] "j" "i" "100"
 [3,] "K" "P" "101"
 [4,] "K" "P" "101"
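The coercion seen above happens because apply() first converts the data frame to a matrix, and a matrix can hold only one type, so mixed columns all become character. A minimal sketch of the effect, where old.df is a hypothetical stand-in built from the values shown in the question:

```r
# Hypothetical stand-in for the question's data frame
old.df <- data.frame(A = c("j", "K"), B = c("i", "P"), C = c(100, 101),
                     stringsAsFactors = FALSE)
N <- 2

# apply() operates on as.matrix(old.df); a matrix has a single type,
# so the numeric column C is coerced to character along with A and B
m <- apply(old.df, 2, function(co) rep(co, each = N))

is.matrix(m)     # TRUE: the result is a matrix, not a data.frame
is.character(m)  # TRUE: every cell is now a string
dim(m)           # 4 rows, 3 columns
```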

+79

r dataframe rows repeat

Stefan Jun 20

9 answers

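A clean dplyr solution, taken from here:

 library(dplyr)
 # tibble() replaces the deprecated data_frame()
 df <- tibble(x = 1:2, y = c("a", "b"))
 # repeat each row index twice, then pick those rows
 df %>% slice(rep(1:n(), each = 2))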

+39

David Rubinger Dec 12 '17 at 19:53

If you can repeat the whole data frame, or subset it first and then repeat the subset, then this similar question may be useful. Once again:

 library(mefa)
 rep(mtcars, 10)

or simply

 mefa:::rep.data.frame(mtcars, 10)

+6

dardisco Apr 24 '13 at 22:20

The rep.row function sometimes seems to create list columns, which leads to memory problems. I wrote the following, which seems to work well:

 library(plyr)
 # apply rep() to every column and reassemble the result as a data frame
 rep.row <- function(r, n) {
   colwise(function(x) rep(x, n))(r)
 }

+5

jebyrnes May 30 '13 at 18:31

Adding to what @dardisco said about mefa::rep.data.frame(): it is very flexible.

You can either repeat each row N times:

 rep(df, each=N)

or repeat the entire data frame N times (useful, for example, when recycling a vectorized argument):

 rep(df, times=N)

Two thumbs up for mefa! I had never heard of it until now, and I used to write manual code to do this.
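For comparison, the same distinction between each and times can be sketched in base R, without mefa, by repeating row indices. This is only an illustration of the two orderings, not mefa's implementation; df, N and idx are made-up names:

```r
df <- data.frame(x = 1:2, y = c("a", "b"))
N <- 2
idx <- seq_len(nrow(df))

# each: every row is repeated N times before moving on (1, 1, 2, 2)
by_each <- df[rep(idx, each = N), ]

# times: the whole data frame is repeated N times (1, 2, 1, 2)
by_times <- df[rep(idx, times = N), ]

by_each$x   # 1 1 2 2
by_times$x  # 1 2 1 2
```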

+4

smci May 20 '14 at 2:23

For reference, and adding to the answers that cite mefa: it might be worth looking at the implementation of mefa::rep.data.frame() in case you don't want to load the whole package:

 > data <- data.frame(a=letters[1:3], b=letters[4:6])
 > data
   a b
 1 a d
 2 b e
 3 c f
 > as.data.frame(lapply(data, rep, 2))
   a b
 1 a d
 2 b e
 3 c f
 4 a d
 5 b e
 6 c f

+4

Fabio Gabriel Jul 21 '15 at 18:53

My solution is similar to mefa:::rep.data.frame, but a little faster, and it takes care of row names:

 rep.data.frame <- function(x, times) {
   rnames <- attr(x, "row.names")
   x <- lapply(x, rep.int, times = times)
   class(x) <- "data.frame"
   if (!is.numeric(rnames))
     attr(x, "row.names") <- make.unique(rep.int(rnames, times))
   else
     attr(x, "row.names") <- .set_row_names(length(rnames) * times)
   x
 }

Comparing the solutions:

 library(Lahman)
 library(microbenchmark)
 microbenchmark(
   mefa:::rep.data.frame(Batting, 10),
   rep.data.frame(Batting, 10),
   Batting[rep.int(seq_len(nrow(Batting)), 10), ],
   times = 10
 )
 #> Unit: milliseconds
 #>                                            expr       min       lq     mean   median        uq       max neval cld
 #>              mefa:::rep.data.frame(Batting, 10) 127.77786 135.3480 198.0240 148.1749  278.1066  356.3210    10  a
 #>                     rep.data.frame(Batting, 10)  79.70335  82.8165 134.0974  87.2587  191.1713  307.4567    10  a
 #>  Batting[rep.int(seq_len(nrow(Batting)), 10), ] 895.73750 922.7059 981.8891 956.3463 1018.2411 1127.3927    10   b

+2

Artem Klevtsov Mar 01 '16 at 17:15

Try using, for example,

 N <- 2
 rep(1:4, each = N)

as a row index.

+1

shhhhimhuntingrabbits Jun 20

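Another way to do this is to first record the row indices, append an extra copy of df, and then sort by index:

 df$index = 1:nrow(df)                   # remember the original row order
 df = rbind(df, df)                      # stack a second copy below the first
 df = df[order(df$index),][,-ncol(df)]   # sort by index, then drop the index column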

Although other solutions may be shorter, this method may be more beneficial in certain situations.
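The snippet above doubles the rows. One way the same idea might be extended to an arbitrary N is sketched below; the example data, the variable names, and the use of replicate() with do.call(rbind, ...) are assumptions for illustration and are not part of the original answer:

```r
df <- data.frame(a = 1:2, b = letters[1:2])
N <- 3

# Record the original row order in a helper column
df$index <- seq_len(nrow(df))

# Stack N copies of the data frame on top of each other
stacked <- do.call(rbind, replicate(N, df, simplify = FALSE))

# Sort so copies of the same row sit together, then drop the helper column;
# drop = FALSE keeps a data frame even if only one column remains
result <- stacked[order(stacked$index), names(stacked) != "index", drop = FALSE]
rownames(result) <- NULL

result$a  # 1 1 1 2 2 2
```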

0

crazjo Jun 03 '15 at 12:07

Josh O'Brien · Accepted Answer · 2012-06-20 14:09

 df <- data.frame(a=1:2, b=letters[1:2])
 df[rep(seq_len(nrow(df)), each=2),]
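A short check that this indexing approach keeps the column types, which was the requirement in the question. This is a sketch under stated assumptions: column b is made a factor explicitly to show that factors survive, and N and new.df are names chosen here:

```r
df <- data.frame(a = 1:2, b = factor(letters[1:2]))
N <- 2

# Repeat each row index N times and select those rows
new.df <- df[rep(seq_len(nrow(df)), each = N), ]

# Column classes are the same before and after
sapply(df, class)
sapply(new.df, class)

# Repeated rows get row names such as "1", "1.1", "2", "2.1";
# resetting them to 1..n is optional
rownames(new.df) <- NULL
```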
