R data.table: adding a group-mean column whose value depends on a condition, for all rows

Task: for all rows where condition==FALSE, set groupmean to the mean of numbers over all rows of the group. For all rows where condition==TRUE, set groupmean to the mean of numbers over only those rows of the group where condition==TRUE. I would like a solution that does not require copying the entire data table, but just adds the desired column. I am sure there is a simple solution, but I got a little lost ...

My attempts:

set.seed(42)
library(data.table)

# condition holds only 50 draws; repeat them explicitly to cover all 100 rows
DT <- data.table(condition=rep(sample(c(TRUE,FALSE), 50, replace=TRUE), times=2),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)

# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]

# right values, but not only rows where condition=FALSE
DT[, groupmean_2 := mean(numbers), by=group]

head(DT)
     condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50
1 answer

You need to change the order in which groupmean is defined. First compute it as the group mean over all rows, then overwrite it only in the rows where condition == TRUE, using the mean of just those rows.

DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']
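To check the two steps by hand, here is a minimal sketch on a small made-up table. The name toy and its values are hypothetical and not taken from the question. Since the second step already filters on condition == TRUE, grouping by group alone should give the same result there as grouping by both columns.

```r
library(data.table)

# Hypothetical toy data, small enough to verify by hand
toy <- data.table(condition = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
                  group     = c("A", "A", "A", "B", "B", "B"),
                  numbers   = c(1, 2, 6, 10, 20, 30))

# Step 1: group mean over all rows (A: 3, B: 20), added by reference
toy[, groupmean := mean(numbers), by = group]

# Step 2: overwrite only the condition == TRUE rows with the mean of
# those rows within each group (A: mean(1, 6) = 3.5, B: mean(10) = 10)
toy[condition == TRUE, groupmean := mean(numbers), by = group]

print(toy$groupmean)
# 3.5  3.0  3.5 10.0 20.0 20.0
```

Both steps use :=, so the column is added and updated in place without copying the table.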

I hope this helps


Source: https://habr.com/ru/post/1683642/

