Multiple conditions for computing a value in R data.table

I have a data table. For instance:

   Sim j active cost
1:   1 1      1  100
2:   1 2      1  125
3:   1 3      0  200
4:   1 4      1  250
5:   2 1      1  100
6:   2 2      0   50
7:   2 3      0  125
8:   2 4      1  200

library(data.table)
dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

I want to add the column "incr_cost", which takes the cost in each row i and subtracts from it the cost in another row, which I will call row k, where row k meets the following conditions:

  • sim_k = sim_i
  • active_k = 1
  • j_k < j_i
  • row k contains the largest j of all rows satisfying the three conditions above

For rows where j = 1, incr_cost can simply be NA.

In my example, the solution would look like this:

   Sim j active cost incr_cost
1:   1 1      1  100        NA
2:   1 2      1  125        25
3:   1 3      0  200        75
4:   1 4      1  250       125
5:   2 1      1  100        NA
6:   2 2      0   50       -50
7:   2 3      0  125        25
8:   2 4      1  200       100
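
For reference, a literal row-by-row reading of the four conditions could look like the sketch below. It only pins down the definition and is not meant to be fast. The helper name incr_cost_naive is made up for illustration.

```r
library(data.table)

dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

# Hypothetical helper: brute-force version of the four conditions.
incr_cost_naive <- function(d) {
  vapply(seq_len(nrow(d)), function(r) {
    # Candidate rows k: same Sim, active, and a strictly smaller j.
    k <- which(d$Sim == d$Sim[r] & d$active == 1 & d$j < d$j[r])
    if (length(k) == 0) return(NA_real_)
    # Among the candidates, keep the one with the largest j.
    k <- k[which.max(d$j[k])]
    d$cost[r] - d$cost[k]
  }, numeric(1))
}

dt[, incr_cost := incr_cost_naive(dt)]
print(dt)
```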

If I did not care whether row k is active, I could simply use:

dt[, incr_cost := cost - shift(cost, fill=NA), by=Sim]
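
To make the gap concrete, here is a small check of where the shift() version departs from the desired output. The column name prev_row_diff and the vector expected (copied from the table above) are just illustrative names.

```r
library(data.table)

dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

# Difference to the immediately preceding row within each Sim,
# regardless of whether that row is active.
dt[, prev_row_diff := cost - shift(cost), by = Sim]

# Desired incr_cost, copied from the example output.
expected <- c(NA, 25, 75, 125, NA, -50, 25, 100)

# Rows where the two disagree (NA comparisons are dropped by which()).
mismatch <- which(dt$prev_row_diff != expected)
print(mismatch)
```

The mismatches are exactly the rows whose immediate predecessor is inactive.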

I am working in R with data.table, but non-data.table solutions are welcome too.

Here is one way, using a rolling join:

dt[, v := 
  cost - .SD[.(active = 1, Sim = Sim, j = j - 1), on=.(active, Sim, j), roll=TRUE, x.cost]]

   Sim j active cost   v
1:   1 1      1  100  NA
2:   1 2      1  125  25
3:   1 3      0  200  75
4:   1 4      1  250 125
5:   2 1      1  100  NA
6:   2 2      0   50 -50
7:   2 3      0  125  25
8:   2 4      1  200 100

The table .(active = 1, Sim = Sim, j = j - 1) is used to look up rows, "rolling" j back to the nearest earlier value when there is no exact match.
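
To see what the roll contributes, it may help to run the same lookup with and without roll=TRUE. This is only a sketch. The names lookup, exact_cost and rolled_cost are made up here.

```r
library(data.table)

dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

# For each row, ask for an active row in the same Sim at j - 1.
lookup <- dt[, .(active = 1, Sim = Sim, j = j - 1)]

# Exact join: a match exists only if the row at j - 1 is itself active.
exact_cost <- dt[lookup, on = .(active, Sim, j), x.cost]

# Rolling join: if j - 1 has no active row, use the nearest earlier active j.
rolled_cost <- dt[lookup, on = .(active, Sim, j), roll = TRUE, x.cost]

print(data.table(dt, exact_cost, rolled_cost))
```

Only the rolled version fills in the rows whose immediate predecessor is inactive. The first row of each Sim stays NA because nothing precedes it.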

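Inside the j of x[i, j], .SD refers to the table itself (the "Subset of Data", which is the whole table here because there is no by).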

Inside the j of x[i, on=, roll=, j]...

  • x.* refers to columns of x (here, .SD);
  • i.* refers to columns of i (here, the lookup table).
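
A tiny illustration of the two prefixes, on made-up tables that both carry a cost column (x, lookup and res are hypothetical names, unrelated to the OP's data):

```r
library(data.table)

# Two toy tables sharing the column names id and cost.
x      <- data.table(id = c(1, 2, 3), cost = c(10, 20, 30))
lookup <- data.table(id = c(2, 3, 4), cost = c(5, 5, 5))

# In x[lookup, on = ..., j], the prefix says which table a column comes from.
res <- x[lookup, on = .(id),
         .(id     = i.id,              # id as given in the lookup table
           x_cost = x.cost,            # cost from x (NA where id has no match)
           i_cost = i.cost,            # cost from the lookup table
           diff   = x.cost - i.cost)]
print(res)
```

The result has one row per row of lookup. id 4 has no partner in x, so x_cost and diff are NA there.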

(The OP has a column named j. Above, I mean the argument j, as in DT[i, j, ...].)

Source: https://habr.com/ru/post/1696375/

