Repeating data.frame rows in dplyr

I have a problem repeating rows of my real data using dplyr. There is already another repeat-rows-of-a-data-frame question, but it has no dplyr solution.

I am wondering whether there is a dplyr solution that avoids this error:

Error: wrong result size (16), expected 4 or 1

    library(dplyr)

    df <- data.frame(column = letters[1:4])

    df_rep <- df %>%
      mutate(column = rep(column, each = 4))

Expected Result

    > df_rep
    column
    #a
    #a
    #a
    #a
    #b
    #b
    #b
    #b
    #*
    #*
    #*
+5
r dplyr
Jul 07 '16 at 4:00
4 answers

This is riddled with danger if there are other columns in the data.frame (there, I said it!), but a do block lets you generate a derived data.frame inside a dplyr pipeline (though, ceci n'est pas un pipe):

    library(dplyr)

    df <- data.frame(column = letters[1:4], stringsAsFactors = FALSE)

    df %>%
      do(
        data.frame(column = rep(.$column, each = 4), stringsAsFactors = FALSE)
      )
    #    column
    # 1       a
    # 2       a
    # 3       a
    # 4       a
    # 5       b
    # 6       b
    # 7       b
    # 8       b
    # 9       c
    # 10      c
    # 11      c
    # 12      c
    # 13      d
    # 14      d
    # 15      d
    # 16      d
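One way to sidestep the other-columns danger is to repeat row indices rather than rebuild the data.frame. A minimal sketch, assuming a hypothetical extra column `value` that is not in the question; `slice()` and `n()` are regular dplyr functions:

```r
library(dplyr)

# Hypothetical two-column data frame (the `value` column is an assumption,
# added only to show that other columns are carried along)
df <- data.frame(
  column = letters[1:4],
  value = 1:4,
  stringsAsFactors = FALSE
)

# Build the row indices 1,1,1,1,2,2,2,2,... and select those rows
df_rep <- df %>%
  slice(rep(seq_len(n()), each = 4))

nrow(df_rep)
```

Swapping `each = 4` for `times = c(2, 3, 2, 4)` should give a different number of copies per row.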
+4
Jul 07 '16 at 4:58

I was looking for a similar (but slightly different) solution. Posting it here in case it is useful to someone else.

In my case, I needed a more general solution that allows each letter to be repeated an arbitrary number of times. Here is what I came up with:

    library(tidyverse)

    df <- data.frame(letters = letters[1:4])
    df
    #   letters
    # 1       a
    # 2       b
    # 3       c
    # 4       d

Say I want 2 copies of a, 3 of b, 2 of c and 4 of d:

    df %>%
      mutate(count = c(2, 3, 2, 4)) %>%
      group_by(letters) %>%
      expand(count = seq(1:count))
    # A tibble: 11 x 2
    # Groups:   letters [4]
    #    letters count
    #     <fctr> <int>
    #  1       a     1
    #  2       a     2
    #  3       b     1
    #  4       b     2
    #  5       b     3
    #  6       c     1
    #  7       c     2
    #  8       d     1
    #  9       d     2
    # 10       d     3
    # 11       d     4

If you do not want to keep the count column:

    df %>%
      mutate(count = c(2, 3, 2, 4)) %>%
      group_by(letters) %>%
      expand(count = seq(1:count)) %>%
      select(letters)
    # A tibble: 11 x 1
    # Groups:   letters [4]
    #    letters
    #     <fctr>
    #  1       a
    #  2       a
    #  3       b
    #  4       b
    #  5       b
    #  6       c
    #  7       c
    #  8       d
    #  9       d
    # 10       d
    # 11       d

If you want the count to reflect the number of repetitions of each letter:

    df %>%
      mutate(count = c(2, 3, 2, 4)) %>%
      group_by(letters) %>%
      expand(count = seq(1:count)) %>%
      mutate(count = max(count))
    # A tibble: 11 x 2
    # Groups:   letters [4]
    #    letters count
    #     <fctr> <dbl>
    #  1       a     2
    #  2       a     2
    #  3       b     3
    #  4       b     3
    #  5       b     3
    #  6       c     2
    #  7       c     2
    #  8       d     4
    #  9       d     4
    # 10       d     4
    # 11       d     4
+7
Feb 28 '18 at 22:39

Using uncount() will also solve this problem. The count column indicates how many times each row should be repeated.

    library(tidyverse)

    df <- tibble(letters = letters[1:4])
    df
    # A tibble: 4 x 1
    #   letters
    #   <chr>
    # 1 a
    # 2 b
    # 3 c
    # 4 d

    df %>%
      mutate(count = c(2, 3, 2, 4)) %>%
      uncount(count)
    # A tibble: 11 x 1
    #    letters
    #    <chr>
    #  1 a
    #  2 a
    #  3 b
    #  4 b
    #  5 b
    #  6 c
    #  7 c
    #  8 d
    #  9 d
    # 10 d
    # 11 d
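If the count column or the copy number is still needed afterwards, uncount() has `.remove` and `.id` arguments for that. A small sketch; the column name `copy` and the object name `df_keep` are just illustrative choices:

```r
library(tidyr)

# Same letters and counts as above, built in one step
df <- tibble::tibble(
  letters = letters[1:4],
  count = c(2, 3, 2, 4)
)

# .remove = FALSE keeps the count column in the result;
# .id = "copy" adds a column numbering the copies of each original row
df_keep <- uncount(df, count, .remove = FALSE, .id = "copy")

df_keep
```

This covers the same ground as the expand() variants with and without the count column, without needing a group_by().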
+5
Apr 03 '19 at 10:00

I did a quick test to show that uncount() is much faster than expand():

    # for the pipe
    library(magrittr)

    # create some test data
    df_test <- tibble::tibble(
      letter = letters,
      row_count = sample(1:10, size = 26, replace = TRUE)
    )

    # benchmark
    bench <- microbenchmark::microbenchmark(
      expand = df_test %>%
        dplyr::group_by(letter) %>%
        tidyr::expand(row_count = seq(1:row_count)),
      uncount = df_test %>%
        tidyr::uncount(row_count)
    )

    # plot the benchmark
    ggplot2::autoplot(bench)
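Before reading too much into the timings, it may be worth confirming that both pipelines return the same rows. A sketch, assuming dplyr and tidyr are installed; the seed and the names `via_expand`, `via_uncount` and `same_rows` are illustrative additions, not part of the benchmark above:

```r
library(dplyr)
library(tidyr)

# Fixed seed so the random counts are reproducible (an added assumption)
set.seed(1)
df_test <- tibble(
  letter = letters,
  row_count = sample(1:10, size = 26, replace = TRUE)
)

# expand() approach: one row per index 1..row_count within each letter
via_expand <- df_test %>%
  group_by(letter) %>%
  expand(row_count = seq_len(row_count)) %>%
  ungroup()

# uncount() approach: each row repeated row_count times
via_uncount <- df_test %>%
  uncount(row_count)

# Both should list the same letters in the same order
same_rows <- identical(via_expand$letter, via_uncount$letter)
same_rows
```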

Benchmark plot

+1
Sep 24 '19 at 8:24
