Calculate the total compensation amount starting at 0 for each group.

My sample data is as follows:

>         gros id nr_oriz
>      1:   23  1       1
>      2:   16  1       2
>      3:   14  1       3
>      4:   15  1       4
>      5:   22  1       5
>      6:   30  1       6
>      7:   25  2       1
>      8:   10  2       2
>      9:   13  2       3
>     10:   17  2       4
>     11:   45  2       5
>     12:   25  4       1
>     13:   15  4       2
>     14:   20  4       3
>     15:   20  4       4
>     16:   20  4       5

where grosis the depth for each soil horizon, idis the profile number, and nr_horizis the number of the soil horizon. I need to create two columns: upper and lower, where the upper upper limit of the horizon and the lower limit. We managed to get only the lower values ​​using:

topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)

but for the upper values, we need to somehow compensate for the data for each idand calculate the total amount, starting from 0 and without the last value, as in this example:

    gros id nr_oriz top bottom
 1:   23  1       1   0     23
 2:   16  1       2  23     39
 3:   14  1       3  39     53
 4:   15  1       4  53     68
 5:   22  1       5  68     90
 6:   30  1       6  90    120
 7:   25  2       1   0     25
 8:   10  2       2  25     35
 9:   13  2       3  35     48
10:   17  2       4  48     65
11:   45  2       5  65    110
12:   25  4       1   0     25
13:   15  4       2  25     40
14:   20  4       3  40     60
15:   20  4       4  60     80
16:   20  4       5  80    100

Is there a simple solution for this, given that the database is very large and we cannot do it manually (as it was in the column topin this example).

+4
3

ave , "bottom" :

topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))

, data.table, , data.table . bottom, :

topsoil[, bottom := cumsum(gros), by = id]

top:

topsoil[, top := c(0L, bottom[-.N]), by = id]

, @akrun answer.

+5

shift devel data.table. devel: here

library(data.table)#v1.9.5+
setDT(topsoil)[, c('top', 'bottom'):= {tmp <- cumsum(gros)
          list(top= shift(tmp, fill=0), bottom=tmp)}, by = id]
topsoil
#    gros id nr_oriz top bottom
# 1:   23  1       1   0     23
# 2:   16  1       2  23     39
# 3:   14  1       3  39     53
# 4:   15  1       4  53     68
# 5:   22  1       5  68     90
# 6:   30  1       6  90    120
# 7:   25  2       1   0     25
# 8:   10  2       2  25     35
# 9:   13  2       3  35     48
#10:   17  2       4  48     65
#11:   45  2       5  65    110
#12:   25  4       1   0     25
#13:   15  4       2  25     40
#14:   20  4       3  40     60
#15:   20  4       4  60     80
#16:   20  4       5  80    100
+4
library(dplyr)
df %>% group_by(id) %>%
       mutate(bottom = cumsum(gros), top = lag(bottom)) %>%
       replace(is.na(.), 0)
0
source

Source: https://habr.com/ru/post/1598890/


All Articles