Building an identifier string for each row in the data

I have the following data:

library(data.table)
d = data.table(a = c(1:3), b = c(2:4))

and would like to get this result (so that it works with an arbitrary number of columns):

 d[, c := paste0('a_', a, '_b_', b)]
 d
 #   a b       c
 #1: 1 2 a_1_b_2
 #2: 2 3 a_2_b_3
 #3: 3 4 a_3_b_4

The following works, but I hope to find something shorter and clearer.

 d = data.table(a = c(1:3), b = c(2:4))
 d[, c := apply(mapply(paste, names(.SD), .SD, MoreArgs = list(sep = "_")), 1, paste, collapse = "_")]
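
To make the two steps in that one-liner easier to follow, here is a minimal sketch that splits it apart on the same small table. The intermediate names `m` and `id` are only illustrative and do not appear in the original code.

```r
library(data.table)

d <- data.table(a = 1:3, b = 2:4)

# Step 1: paste each column name onto its column's values.
# mapply simplifies the per-column results into a character matrix
# with one column per input column.
m <- mapply(paste, names(d), d, MoreArgs = list(sep = "_"))

# Step 2: collapse each row of that matrix into a single string.
id <- apply(m, 1, paste, collapse = "_")
```

Here `m` holds the pieces such as "a_1" and "b_2", and `id` holds one joined string per row; the nested `apply(mapply(...))` call in the question does the same thing in a single expression.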
3 answers

One way, only a little cleaner:

 d[, c := apply(d, 1, function(x) paste(names(d), x, sep="_", collapse="_")) ]
    a b       c
 1: 1 2 a_1_b_2
 2: 2 3 a_2_b_3
 3: 3 4 a_3_b_4

Here is an approach using do.call('paste') that requires only one call to paste.

I will focus on a situation where the columns are integers (as this seems like a more reasonable test case).

 N <- 1e4
 d <- setnames(as.data.table(replicate(5, sample(N), simplify = FALSE)), letters[seq_len(5)])
 f5 <- function(d){
   l <- length(d)
   # interleaved positions 1, l + 1, 2, l + 2, ...: each name, then its column
   o <- c(1L, l + 1L) + rep(seq_len(l) - 1L, each = 2L)
   do.call('paste', c((c(as.list(names(d)), d))[o], sep = '_'))
 }
 # f1 and f2 are the functions defined in the benchmarking answer
 microbenchmark(f1(d), f2(d), f5(d))
 Unit: milliseconds
   expr       min        lq    median        uq       max neval
  f1(d)  41.51040  43.88348  44.60718  45.29426  52.83682   100
  f2(d) 193.94656 207.20362 210.88062 216.31977 252.11668   100
  f5(d)  30.73359  31.80593  32.09787  32.64103  45.68245   100
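
The single-paste idea relies on interleaving the column names with the columns before handing everything to paste. As a rough check on the small table from the question, the sketch below builds the interleaved index with `c(rbind(...))`, which is a swapped-in way to get the order 1, l + 1, 2, l + 2, ... and is not taken from the answer itself. The names `parts` and `id` are illustrative.

```r
library(data.table)

d <- data.table(a = 1:3, b = 2:4)

l <- length(d)
# rbind stacks the two index vectors as rows; c() then reads the matrix
# column by column, which interleaves them: 1, l + 1, 2, l + 2, ...
o <- c(rbind(seq_len(l), l + seq_len(l)))

# Names first, then the columns, reordered so each name precedes its column.
parts <- c(as.list(names(d)), d)[o]

# One vectorised paste call over all the pieces.
id <- do.call(paste, c(parts, sep = "_"))
```

With two columns, `o` is 1, 3, 2, 4, so paste receives "a", column a, "b", column b in that order.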

To avoid looping through rows, you can use this:

do.call(paste, c(lapply(names(d), function(n)paste0(n,"_",d[[n]])), sep="_"))
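
As a small usage sketch, the same expression can be wrapped in a function and assigned with `:=` on the table from the question. The helper name `make_id` is hypothetical and is used here only for illustration.

```r
library(data.table)

d <- data.table(a = 1:3, b = 2:4)

# Hypothetical helper: build one "name_value" vector per column,
# then paste those vectors together element-wise.
make_id <- function(dt) {
  parts <- lapply(names(dt), function(n) paste0(n, "_", dt[[n]]))
  do.call(paste, c(parts, sep = "_"))
}

# .SD holds the columns of d at the time of the call (a and b).
d[, c := make_id(.SD)]
```

Because every paste call works on whole columns, no per-row function call is needed, which is what the benchmark compares against the apply version.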

Benchmarking:

 N <- 1e4
 d <- data.table(a=runif(N),b=runif(N),c=runif(N),d=runif(N),e=runif(N))
 f1 <- function(d) {
   do.call(paste, c(lapply(names(d), function(n)paste0(n,"_",d[[n]])), sep="_"))
 }
 f2 <- function(d) {
   apply(d, 1, function(x) paste(names(d), x, sep="_", collapse="_"))
 }
 require(microbenchmark)
 microbenchmark(f1(d), f2(d))

Note: f2 is inspired by @Ricardo's answer.

Results:

 Unit: milliseconds
   expr      min       lq   median       uq      max neval
  f1(d) 195.8832 213.5017 216.3817 225.4292 254.3549   100
  f2(d) 418.3302 442.0676 451.0714 467.5824 567.7051   100

Edit note: the previous benchmark with N <- 1e3 did not show much of a time difference. Thanks again @eddi.


Source: https://habr.com/ru/post/948582/

