Changing multiple columns in data.table r

Question

Changing multiple columns in data.table r

I am looking for a way to manipulate multiple columns in data.table in R. Since I need to dynamically process the columns as well as the second input, I could not find the answer.

The idea is to index two or more series on a specific date, dividing all the values by the date value, for example:

set.seed(132) # simulate some data dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10), X1 = cumsum(rnorm(10)), X2 = cumsum(rnorm(10))) # set a date for the index indexDate <- as.Date("2000-01-05") # get the column names to be able to select the columns dynamically cols <- colnames(dt) cols <- cols[substr(cols, 1, 1) == "X"]

Part 1: Easy data.frame / apply approach

 df <- as.data.frame(dt) # get the right rownumber for the indexDate rownum <- max((1:nrow(df))*(df$date==indexDate)) # use apply to iterate over all columns df[, cols] <- apply(df[, cols], 2, function(x, i){x / x[i]}, i = rownum)

Part 2: approach (fast) data.table So far my data.table method is as follows:

 for(nam in cols) { div <- as.numeric(dt[rownum, nam, with = FALSE]) dt[ , nam := dt[,nam, with = FALSE] / div, with=FALSE] }

especially all with = FALSE do not look very similar to data.table-like.

Do you know a faster / more elegant way to perform this operation?

Any idea is much appreciated!

+6

r data.table

David May 16, '15 at 15:18

source share

1 answer

akrun · Accepted Answer · 2015-05-16T15:28:17+0000

One option is to use set , as this includes multiple columns. The advantage of using set is that it will avoid the overhead of [.data.table and make it faster.

 library(data.table) for(j in cols){ set(dt, i=NULL, j=j, value= dt[[j]]/dt[[j]][rownum]) }

Or it will be a little slower

 dt[, (cols) :=lapply(.SD, function(x) x/x[rownum]), .SDcols=cols]

Changing multiple columns in data.table r

More articles: