Conditional sum by interval

I would like to calculate a conditional sum of one data frame column for a set of intervals [n, +∞) (i.e. ≥ n) applied to another column. In the example below, the intervals are applied to column a, and the values in column b are conditionally summed. For [0, +∞), all values of column a are ≥ 0, so b_sum is the sum of all values of b. For [3, +∞), only one entry is ≥ 3, so b_sum equals 500.

Input data

  a    b          
1.1  100          
2.3  150          
0.1   20          
0.5   80          
3.3  500          
1.6  200
1.1  180
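
For reproducibility, the table above can be built as a data frame. This is a minimal sketch: the answers below refer to the same data under the names df1, df, and inp, so all three are defined here as copies. That the names denote identical data is an assumption based on the matching outputs.

```r
# Sample data from the "Input data" table
df1 <- data.frame(
  a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
  b = c(100, 150, 20, 80, 500, 200, 180)
)

# Aliases used by the individual answers (assumed to hold the same data)
df  <- df1
inp <- df1
```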

Desired Result

n  b_sum
0   1230
1   1130
2    650
3    500
4      0

I am sure this would be easy enough with a for loop, but I would like to avoid that and use a vectorized approach in base R or dplyr.


Using vapply with cut:

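 # integer lower bounds covering the range of a (0:4 for the sample data)
 n <- trunc(min(df1$a)):ceiling(max(df1$a))

 # right = FALSE makes each interval [i, Inf), so values equal to i are included
 b_sum <- vapply(n, function(i) sum(df1$b[!is.na(cut(df1$a,
                      breaks = c(i, Inf), right = FALSE))]), 0)
 b_sum
#[1] 1230 1130  650  500    0
data.frame(n, b_sum)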

Or without cut:

# >= rather than > so that values equal to the bound fall inside [i, Inf)
vapply(n, function(i) sum(df1$b[df1$a >= i]), 0)
#[1] 1230 1130  650  500    0

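df <- df[order(df$a), ] # sort by column a
# left.open = TRUE counts the values of a strictly below each n, so a == n stays in the sum
ind <- findInterval(0:4, df$a, left.open = TRUE) + 1
sum(df$b) - cumsum(c(0, df$b))[ind]
#[1] 1230 1130  650  500    0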

Logical arithmetic: multiply the vector by a logical condition, which is coerced to 0/1.

 # compare a numerically; (inp$a >= n) is coerced to 0/1 before multiplying by b
 sapply(0:4, function(n) sum((inp$a >= n) * inp$b))
#[1] 1230 1130  650  500    0

data.frame(n = 0:4,
           b_sum = sapply(0:4, function(n) sum((inp$a >= n) * inp$b)))
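
The same 0/1 multiplication can be written without sapply by building the whole comparison matrix at once with outer. This is only a sketch of that variant, not part of the original answer. The names hit and b_sum are illustrative, and the matrix needs length(a) × length(n) cells, so it suits small sets of bounds.

```r
# Sample data from the question; the name inp follows the answer above
inp <- data.frame(
  a = c(1.1, 2.3, 0.1, 0.5, 3.3, 1.6, 1.1),
  b = c(100, 150, 20, 80, 500, 200, 180)
)
n <- 0:4

# 7 x 5 logical matrix: entry [i, j] is TRUE when a[i] >= n[j]
hit <- outer(inp$a, n, ">=")

# Multiplying recycles b down each column; colSums adds the kept values per n
b_sum <- colSums(hit * inp$b)

data.frame(n, b_sum)
```

For the sample data the b_sum column is 1230, 1130, 650, 500, 0, matching the desired result.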

Another possibility:

data.frame(n = 0:4, b_sum = with(df, sum(b) - c(0, cumsum(tapply(b, floor(a), sum)))))

Source: https://habr.com/ru/post/1623374/

