R data.table: adding a new column for a subset of rows conditionally for all rows

Question

Task: for all rows where condition == FALSE, set groupmean to the mean of all numbers in that row's group. For all rows where condition == TRUE, set groupmean to the mean of only those numbers in the group where condition == TRUE. I would like a solution that does not copy the entire data.table but just adds the desired column. I am sure there is a simple solution, but I got a little lost ...

My attempts:

set.seed(42)
require(data.table)

DT <- data.table(condition=sample(c(TRUE,FALSE), 50, replace=TRUE),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)

# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]

# right values, but not only rows where condition=FALSE
DT[, groupmean_2 := mean(numbers), by=group]

head(DT)
     condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50

r data.table

Christian borck May 07 '14 at 9:27

1 answer

ilir · Accepted Answer · 2014-05-07T09:45:01+0000

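You need to change the order in which groupmean is defined. First compute it as the group mean over all rows, then overwrite it for the rows where condition == TRUE. The subset in i is applied before grouping, so the second call averages only the TRUE rows within each group.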

DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']
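
As a sanity check, here is a minimal sketch that compares the two-step approach with a single grouped pass using ifelse(). The column name check and the variable addr_before are only illustrative, and I sample 100 values for condition (instead of 50) so that nothing is recycled; those are my assumptions, not part of the question. Both versions use := and so should add the column by reference rather than copying the table.

```r
library(data.table)

set.seed(42)
# Assumption: 100 sampled values so every column has the same length
DT <- data.table(condition = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 group     = rep(LETTERS[1:4], times = 25),
                 numbers   = 1:100)

# Remember where the table lives in memory before adding columns
addr_before <- address(DT)

# Single pass: within each group, TRUE rows get the mean of the TRUE rows,
# FALSE rows get the mean of the whole group
DT[, groupmean := ifelse(condition, mean(numbers[condition]), mean(numbers)),
   by = group]

# Two-step version from above, stored in a hypothetical column "check"
DT[, check := mean(numbers), by = group]
DT[condition == TRUE, check := mean(numbers), by = group]

# Both approaches should agree row by row
stopifnot(isTRUE(all.equal(DT$groupmean, DT$check)))

# TRUE if the table was modified in place rather than copied
same_object <- identical(addr_before, address(DT))
```

The two-step version is probably easier to read; the ifelse() version only saves the second grouped call.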

I hope this helps
