Use by = each row for data table

Question

I have a data.table, and I'm trying to create a new variable that is a function of all the other columns. A simplified example would be summing or averaging all the columns within each row. For instance:

library(data.table)
# note: seq(11:19) evaluates to 1:9, so column c holds 1 to 9
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = seq(11:19), d = seq(100, 900, 100))

I want to create a vector / column that holds, for each row, the average value across all columns. I think the syntax would look something like this:

 dt[, average := mean(.SD)]

However, this sums everything up. I also know that I can do:

 dt[, average := lapply(.SD, mean)]

But this gives a single-row result. I am basically looking for the equivalent of:

 dt[, average := lapply(.SD, mean), by = all]

so that it calculates this for every row, without my having to create an id column and group all my calculations by that column. Is this possible?

+5

r data.table

Brandon Apr 22 '16 at 20:01

2 answers

I think a much better solution is simply to use apply for this, which was designed for row-wise matrix operations before data.table existed.

 > dt$average = apply(dt, 1, mean)
 > dt
    a  b c   d average
 1: 1 10 1 100      28
 2: 2 20 2 200      56
 3: 3 30 3 300      84
 4: 4 40 4 400     112
 5: 5 50 5 500     140
 6: 6 60 6 600     168
 7: 7 70 7 700     196
 8: 8 80 8 800     224
 9: 9 90 9 900     252

+3

Señor o Apr 22 '16 at 20:13

lmo · Accepted Answer · 2016-04-22T20:17:57+0000

The following data.table code worked for me.

  dt[, average := rowMeans(.SD)]
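One caveat, offered as a sketch and not as part of the original answer: once `average` has been added, a later `rowMeans(.SD)` call would include that column too. Restricting `.SD` with `.SDcols` is one way around this. The vector name `cols` is introduced here purely for illustration, and column `c` is written as `1:9` because that is what `seq(11:19)` evaluates to.

```r
library(data.table)

# Same data as in the question; seq(11:19) evaluates to 1:9
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 1:9, d = seq(100, 900, 100))

# 'cols' is a hypothetical helper name: the columns to average over
cols <- c("a", "b", "c", "d")

# rowMeans() already works row by row, so no 'by' is needed;
# .SDcols keeps any previously added columns out of .SD
dt[, average := rowMeans(.SD), .SDcols = cols]

print(dt)
```

With this data each row's average is 28 times the value in column `a`, matching the output shown in the other answer.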

As @jangorecki pointed out, you can build your own function to run on each row, as long as you remember that each row is a list object:

 # my function, must unlist the argument
 myMean <- function(i, ...) mean(unlist(i), ...)

Using by = seq_len:

 dt[, averageNew := myMean(.SD), by = seq_len(nrow(dt))]

Using row.names:

 dt[, averageOther := myMean(.SD), by = row.names(dt)]
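As a rough check, the per-row `by` approach can be compared against `rowMeans` on the question's data. This is a minimal sketch under a few assumptions: the column names `viaRowMeans` and `viaBy` and the vector `cols` are made up for the illustration, and `.SDcols` is added so that neither calculation picks up the other's result column.

```r
library(data.table)

# Same data as in the question; seq(11:19) evaluates to 1:9
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 1:9, d = seq(100, 900, 100))
cols <- c("a", "b", "c", "d")  # hypothetical helper: columns to average

# Row-wise function: each group's .SD is a one-row list, so unlist it first
myMean <- function(i, ...) mean(unlist(i), ...)

# Vectorised version
dt[, viaRowMeans := rowMeans(.SD), .SDcols = cols]

# One group per row, so myMean() sees a single row at a time
dt[, viaBy := myMean(.SD), by = seq_len(nrow(dt)), .SDcols = cols]

# TRUE when both approaches give the same averages
same <- isTRUE(all.equal(dt$viaRowMeans, dt$viaBy))
print(same)
```

Grouping by row calls the function once per row, so for a plain mean `rowMeans` is likely the faster choice; the `by` form is mainly useful when the per-row function has no vectorised equivalent.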
