Using by = each row in a data.table

I have a data.table, and I'm trying to create a new variable that is a function of all the other columns. A simplified example would be taking the sum or the average across the columns of each row. For instance:

 library(data.table)
 dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = seq(11:19), d = seq(100, 900, 100))  # note: seq(11:19) evaluates to 1:9

I want to create a vector/column that is just the average of all columns for each row. I imagine the syntax would look something like this:

 dt[, average := mean(.SD)] 

However, this treats the whole table at once instead of working row by row. I also know that I can do:

 dt[, average := lapply(.SD, mean)] 

But this gives a single-row result (one mean per column). I am basically looking for the equivalent of:

 dt[, average := lapply(.SD, mean), by = all] 

so that it simply computes this for every row, without my having to create an id column and group all my calculations by that column. Is this possible?

2 answers

The following data.table code worked for me:

  dt[, average := rowMeans(.SD)] 
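One thing to watch with the line above: once `average` exists, it becomes part of `.SD` on any later call. A minimal sketch of one way around that, assuming you restrict `.SD` with `.SDcols`; the names `cols` and `total` are only illustrative and not from the original answer:

```r
library(data.table)

# Same toy table as in the question (c = seq(11:19) evaluates to 1:9)
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = seq(11:19), d = seq(100, 900, 100))

# Hypothetical helper vector: the original input columns only
cols <- c("a", "b", "c", "d")

# .SDcols keeps derived columns out of .SD, so the calls can be repeated safely
dt[, average := rowMeans(.SD), .SDcols = cols]
dt[, total := rowSums(.SD), .SDcols = cols]
```

`rowMeans` and `rowSums` are vectorised over rows, so no `by` is needed here.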

As @jangorecki pointed out, you can build your own function to run on each row, as long as you remember that each row is a list object:

 # my function, must unlist the argument
 myMean <- function(i, ...) mean(unlist(i), ...)

Using by = seq_len:

 dt[, averageNew := myMean(.SD), by = seq_len(nrow(dt))] 

Using row.names:

 dt[, averageOther := myMean(.SD), by = row.names(dt)] 
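The same row-by-row grouping should work for functions that have no vectorised `row*` counterpart. A rough sketch, assuming a hypothetical helper `rowSpread` (max minus min of a row) and an illustrative `cols` vector, neither of which appears in the original answer:

```r
library(data.table)

dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = seq(11:19), d = seq(100, 900, 100))

# Hypothetical helper: .SD arrives as a one-row table, so flatten it first
rowSpread <- function(row) {
  v <- unlist(row)
  max(v) - min(v)
}

# Limit .SD to the input columns, then form one group per row
cols <- c("a", "b", "c", "d")
dt[, spread := rowSpread(.SD), by = seq_len(nrow(dt)), .SDcols = cols]
```

Grouping by row calls the function once per row, so on large tables this is likely to be much slower than a vectorised alternative such as `rowMeans`.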

I think a much better solution is simply to use apply for this, which was made for row-wise matrix operations back when there was no data.table.

 > dt$average = apply(dt, 1, mean)
 > dt
    a  b c   d average
 1: 1 10 1 100      28
 2: 2 20 2 200      56
 3: 3 30 3 300      84
 4: 4 40 4 400     112
 5: 5 50 5 500     140
 6: 6 60 6 600     168
 7: 7 70 7 700     196
 8: 8 80 8 800     224
 9: 9 90 9 900     252
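A caveat worth keeping in mind with `apply`: it converts its input to a matrix first, so a single non-numeric column turns every value into a character string. A small base-R sketch of selecting the numeric columns beforehand; the data frame `df` and the names `num` and `nums` are made up for illustration:

```r
# Hypothetical table with one non-numeric column
df <- data.frame(a = 1:3, b = c(10, 20, 30), label = c("x", "y", "z"))

# Keep only the numeric columns before handing the data to apply()
num <- vapply(df, is.numeric, logical(1))
nums <- df[, num, drop = FALSE]

# Row-wise mean over the numeric columns only
df$average <- apply(nums, 1, mean)
```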

Source: https://habr.com/ru/post/1247752/

