Concatenating (Pasting) Elements by Index

I want to concatenate (paste) the elements of rows in a data.frame based on indexes. How can I do this efficiently? It would be easy with a unique identifier, but here I only have indexes, not a factor to group by. All package options and base R solutions are welcome.

    indexes <- list(2:3, 6:8, 11:12)

    dat <- data.frame(
      x = c(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 2, 2, 1),
      y = LETTERS[1:13],
      z = "PP",
      stringsAsFactors = FALSE
    )

       x y  z
    1  1 A PP
    2  2 B PP
    3  2 C PP
    4  3 D PP
    5  4 E PP
    6  5 F PP
    7  5 G PP
    8  5 H PP
    9  6 I PP
    10 7 J PP
    11 2 K PP
    12 2 L PP
    13 1 M PP

Desired output:

      x       y  z
    1 1       A PP
    2 2    B, C PP
    3 3       D PP
    4 4       E PP
    5 5 F, G, H PP
    6 6       I PP
    7 7       J PP
    8 2    K, L PP
    9 1       M PP
3 answers

Another base R method:

    indx <- !(1:nrow(dat) %in% unlist(lapply(indexes, '[', -1)))
    transform(dat, y = ave(y, cumsum(indx), FUN = toString))[indx, ]
    #    x       y  z
    # 1  1       A PP
    # 2  2    B, C PP
    # 4  3       D PP
    # 5  4       E PP
    # 6  5 F, G, H PP
    # 9  6       I PP
    # 10 7       J PP
    # 11 2    K, L PP
    # 13 1       M PP

Explanation

Some insight into how !(1:nrow(dat) %in% unlist(lapply(indexes, '[', -1))) came about:

I was looking for a grouping index, so I started from the end result and worked backwards. I knew that if I could get this grouping vector:

 1 2 2 3 4 5 5 5 6 7 8 8 9 

I could use ave with toString. I figured I needed a sequence of TRUE and FALSE values whose cumsum would produce the vector above. I wrote this:

    cumsum(c(T, T, F, T, T, T, F, F, T, T, T, F, T))
    [1] 1 2 2 3 4 5 5 5 6 7 8 8 9

I needed a way to create this logical index. If every element of each vector in indexes except the first is set to FALSE, I get the logical index I need.

    unlist(lapply(indexes, '[', -1))
    [1]  3  7  8 12

You will notice that these positions are exactly the FALSE values in the logical index.
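The reasoning above can be traced step by step in a short sketch. The variable names `drop`, `starts` and `grp` are only illustrative and do not appear in the answer. The row count is hard-coded to 13 to match the example data.

```r
indexes <- list(2:3, 6:8, 11:12)
n <- 13  # number of rows in the example data (assumed here instead of nrow(dat))

# Positions that are NOT the first element of their index vector
drop <- unlist(lapply(indexes, `[`, -1))

# TRUE where a new group starts, FALSE where a row joins the previous group
starts <- !(seq_len(n) %in% drop)

# The running count of group starts is the grouping vector
grp <- cumsum(starts)

print(drop)
print(starts)
print(grp)
```

An index vector of length one contributes nothing to `drop`, so such a row simply stays in its own group.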

dplyr

I guess it's only fair to add dplyr to the mix:

    library(dplyr)

    dat %>%
      mutate(indx = na.omit(c(T, x != lead(x)))) %>%
      group_by(ind2 = cumsum(indx)) %>%
      mutate(y = toString(y)) %>%
      filter(indx)
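Note that this pipeline derives its groups from runs of equal x values rather than from indexes. The two happen to coincide for the example data. If they might differ, a variant driven by indexes could look like the sketch below. It assumes dplyr is installed, `starts` and `grp` are illustrative names, and keeping the first x and z of each group is an assumption about the desired result.

```r
library(dplyr)

indexes <- list(2:3, 6:8, 11:12)
dat <- data.frame(
  x = c(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 2, 2, 1),
  y = LETTERS[1:13],
  z = "PP",
  stringsAsFactors = FALSE
)

# TRUE for rows that start a group (every row except the non-first
# members of each index vector)
starts <- !(seq_len(nrow(dat)) %in% unlist(lapply(indexes, `[`, -1)))

res <- dat %>%
  group_by(grp = cumsum(starts)) %>%
  # Assumption: x and z are taken from the first row of each group
  summarise(x = first(x), y = toString(y), z = first(z)) %>%
  as.data.frame()

print(res)
```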

Here is one possible solution in base R:

    dat[sapply(indexes, "[", 1), "y"] <- sapply(indexes, function(i) {
      paste(dat[["y"]][i], collapse = ", ")
    })

    dat[
      setdiff(
        1:nrow(dat),
        setdiff(
          unlist(indexes),
          sapply(indexes, "[", 1)
        )
      ),
    ]

       x       y  z
    1  1       A PP
    2  2    B, C PP
    4  3       D PP
    5  4       E PP
    6  5 F, G, H PP
    9  6       I PP
    10 7       J PP
    11 2    K, L PP
    13 1       M PP

Here's a possible data.table solution using set:

    library(data.table)
    setDT(dat)

    for (i in seq_along(indexes)) {
      set(dat, i = indexes[[i]], j = 2L, value = dat[indexes[[i]], toString(y)])
    }

    unique(dat, by = "y")
    #    x       y  z
    # 1: 1       A PP
    # 2: 2    B, C PP
    # 3: 3       D PP
    # 4: 4       E PP
    # 5: 5 F, G, H PP
    # 6: 6       I PP
    # 7: 7       J PP
    # 8: 2    K, L PP
    # 9: 1       M PP

The idea here is to operate only on the rows specified in indexes and to modify only the y column. It is not clear to me how x and z should be handled if they vary within those indices, so this approach leaves that decision to you: you choose which variables go into the by argument of the unique method for data.table.
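One way to make that decision explicit is sketched below in base R. This is not the answerer's method: `collapse_rows` is a hypothetical helper, and taking x and z from the first row of each group is an assumption about the desired behaviour.

```r
indexes <- list(2:3, 6:8, 11:12)
dat <- data.frame(
  x = c(1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 2, 2, 1),
  y = LETTERS[1:13],
  z = "PP",
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the column `col` within each index group.
# All other columns are taken from the first row of the group (an assumption).
collapse_rows <- function(df, indexes, col = "y") {
  # Rows that are non-first members of an index vector get merged away
  drop <- unlist(lapply(indexes, `[`, -1))
  starts <- !(seq_len(nrow(df)) %in% drop)
  grp <- cumsum(starts)

  # Keep the first row of every group, then overwrite `col` with the pasted values
  out <- df[starts, , drop = FALSE]
  out[[col]] <- unname(vapply(split(df[[col]], grp), toString, character(1)))
  rownames(out) <- NULL
  out
}

res <- collapse_rows(dat, indexes)
print(res)
```

If x or z can differ within a group, the line that builds `out` is the place to substitute a different rule, for example pasting those columns as well.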


Source: https://habr.com/ru/post/1233267/

