Excel SUMIFS equivalent in R

I am very new to R, and I am considering ways to recreate Excel VBA macros and Excel worksheet functions such as SUMIFS. SUMIFS summarizes a column if a row has records matching several conditions on other columns.

I have the following data frame and I want to compute a new column. The new column represents the Sample sum for all rows that overlap with the Start Date and EndDate ranges. For example, in line 1 it will be 697 (the sum of the first 3 lines ). Criteria for a specific amount: include Sample if EndDate >= StartDate[i] & StartDate <=EndDate[i]

  StartDate EndDate Sample *SUMIFS example* 10/01/14 24/01/14 139 *697* 12/01/14 26/01/14 136 19/01/14 02/02/14 422 25/01/14 08/02/14 762 29/01/14 12/02/14 899 05/02/14 19/02/14 850 07/02/14 21/02/14 602 09/02/14 23/02/14 180 18/02/14 04/03/14 866 

Any comments or pointers would be greatly appreciated.

+5
source share
4 answers

You can do this with a loop or with Cartesian merging. I do not know any built-in functions to do just that.

 library(dplyr) x = structure(list(StartDate = structure(c(1389312000, 1389484800, 1390089600, 1390608000, 1390953600, 1391558400, 1391731200, 1391904000, 1392681600), tzone = "UTC", class = c("POSIXct", "POSIXt")), EndDate = structure(c(1390521600, 1390694400, 1391299200, 1391817600, 1392163200, 1392768000, 1392940800, 1393113600, 1393891200), tzone = "UTC", class = c("POSIXct", "POSIXt" )), Sample = c(139L, 136L, 422L, 762L, 899L, 850L, 602L, 180L, 866L)), .Names = c("StartDate", "EndDate", "Sample" ), row.names = c(NA, -9L), class = "data.frame") x2 = x names(x2)=c('StartDate2','EndDate2','Sample2') x3 = merge(x,x2,allow.cartesian =T) x4 = summarise(group_by(x3,StartDate,EndDate), sumifs=sum(Sample2[EndDate2 >= StartDate & StartDate2 <= EndDate])) x_sumifs = merge(x,x4,by=c('StartDate','EndDate')) 

This is what the output looks like.

 > x_sumifs StartDate EndDate Sample sumifs 1 2014-01-10 2014-01-24 139 697 2 2014-01-12 2014-01-26 136 1459 3 2014-01-19 2014-02-02 422 2358 4 2014-01-25 2014-02-08 762 3671 5 2014-01-29 2014-02-12 899 3715 6 2014-02-05 2014-02-19 850 4159 7 2014-02-07 2014-02-21 602 4159 8 2014-02-09 2014-02-23 180 3397 9 2014-02-18 2014-03-04 866 2498 
+3
source

Assuming you got the above data in df data frame:

 sum(df$Sample[EndDate >= df$StartDate & StartDate <= df$EndDate]) 

I.e:

  • df$Sample[...] selects the Sample column with the conditions specified in [...]
  • EndDate >= df$StartDate and StartDate <= df$EndDate are taken from your example converted to R conditions, with & between, so that both conditions are true at the same time. Note that there are no i indices in the expression. The way this works in R, the expression is evaluated for each row in the data frame, and the result df$Sample[...] is a vector of values, only the values ​​in which the expression in [...] was true
  • sum is, of course, a built-in function for calculating the sum, of course
+2
source

You can use lapply/sapply from base R for this. x from @ cameron.bracken's post.

 x$sumifs <- sapply(seq_len(nrow(x)), function(i) with(x, sum(Sample[EndDate >= StartDate[i] & StartDate <= EndDate[i]]))) x # StartDate EndDate Sample sumifs #1 2014-01-10 2014-01-24 139 697 #2 2014-01-12 2014-01-26 136 1459 #3 2014-01-19 2014-02-02 422 2358 #4 2014-01-25 2014-02-08 762 3671 #5 2014-01-29 2014-02-12 899 3715 #6 2014-02-05 2014-02-19 850 4159 #7 2014-02-07 2014-02-21 602 4159 #8 2014-02-09 2014-02-23 180 3397 #9 2014-02-18 2014-03-04 866 2498 
+2
source

You can use the 'by' function to get the value. In the data frame, the β€œby” is broken down line by line into data frames multiplied by the values ​​of one or more factors, and the function is applied to each subset in turn.

 x$sumifs <- by(Sample[EndDate >= StartDate[i] & StartDate <= EndDate[i]],sum) 

More details about the function can be found here.

0
source

Source: https://habr.com/ru/post/1206109/


All Articles