R + search algorithm to match values within a range for elements with the same name

I'm looking for a way to run a search over a data frame that, for a given element, checks the corresponding variable within a range and returns the maximum such variable. In general, I want the function to (1) consider a given element, (2) find all other elements with the same name, (3) among those elements, check whether the corresponding variable falls within +-X of any of the others, and (4) if so, return the maximum of them; if not, just return the variable as is.

A concrete example uses timestamp data. Say I have orders for 2 companies, recorded by date, hour and minute. I want to look at daily orders, but orders placed within 2 minutes of each other are counted twice, so in those cases I only want the maximum value.

* EDIT: I should add that if orders are recorded sequentially within a couple of minutes of each other, we assume they are duplicates and only want the maximum value. So if there were 4 orders, each a minute apart, and no other orders within +2 minutes of the last or -2 minutes of the first, we can assume that the group of 4 should be counted only once, and the value counted should be the maximum.

Here is some sample data:

data <- structure(list(date = structure(c(16090, 16090, 16090, 16090, 
16090, 16090, 16090, 16090, 16090, 16090, 16090, 16090, 16091, 
16091, 16091, 16091, 16091, 16091, 16091), class = "Date"), company = structure(c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("ABCo", "Zyco"), class = "factor"), hour = c(5L, 
5L, 5L, 7L, 7L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 6L, 6L, 6L, 7L, 7L, 
7L, 8L), minute = c(21L, 22L, 50L, 13L, 20L, 34L, 47L, 34L, 35L, 
20L, 44L, 19L, 14L, 16L, 37L, 24L, 26L, 49L, 50L), orders = c(59L, 
46L, 31L, 15L, 86L, 23L, 8L, 71L, 86L, 44L, 23L, 47L, 6L, 53L, 
21L, 54L, 73L, 63L, 4L)), .Names = c("date", "company", "hour", 
"minute", "orders"), row.names = c(NA, -19L), class = "data.frame")

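What I want to do is, for each company, look at each date and hour and check whether any orders fall within +-2 minutes of each other; if so, take the maximum, and if not, keep the value as is. (For example, for ABCo on 2014-01-20 at hour = 5, minutes 21 and 22 are within +-2 of each other, so we would return the maximum, 59. For ABCo on 1-20 at hour = 5 and minute = 50, there are no other orders within +-2, so we keep the value, 31.)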

Here is what I have tried so far:

library(plyr)
# Key combining company, date and hour
data$biztime <- do.call(paste, c(data[c("company","date","hour")], sep = "_"))

# Total orders per key and minute
data2 <- ddply(data, .(biztime, minute), summarise, orders = sum(orders))

But I am not sure where to go from here. Would ifelse be the way to do this, or is there a better approach?


First, create a datetime column from the date, hour and minute:

data <- transform(data,
   datetime = strptime(sprintf("%s %s:%s", date, hour, minute),
                       format = "%Y-%m-%d %H:%M"))

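Then, within each company, start a new time group whenever the gap to the previous order is more than 2 minutes: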

data <- ddply(data, .(company), transform, timegroup =
              cumsum(c(TRUE, diff(datetime, units = "mins") > 2)))
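
The grouping idiom above can be easier to follow on a plain vector. The sketch below uses made-up minute offsets (the names `mins`, `starts` and `timegroup` are only for illustration) and assumes the values are already sorted in time:

```r
# Hypothetical minute offsets for one company, already sorted in time.
mins <- c(21, 22, 50, 133, 140)

# TRUE marks the start of a new group: the first row, or a gap of more than 2.
starts <- c(TRUE, diff(mins) > 2)

# The running count of group starts serves as the group id.
timegroup <- cumsum(starts)
print(timegroup)
```

Because the comparison is strict (`> 2`), two orders exactly 2 minutes apart stay in the same group.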

Finally, take the maximum number of orders within each company and time group:

ddply(data, .(company, timegroup), summarise,
      orders = max(orders),
      datetime = datetime[1])

#    company timegroup orders            datetime
# 1     ABCo         1     59 2014-01-20 05:21:00
# 2     ABCo         2     31 2014-01-20 05:50:00
# 3     ABCo         3     15 2014-01-20 07:13:00
# 4     ABCo         4     86 2014-01-20 07:20:00
# 5     ABCo         5     53 2014-01-21 06:14:00
# 6     ABCo         6     21 2014-01-21 06:37:00
# 7     ABCo         7     73 2014-01-21 07:24:00
# 8     ABCo         8     63 2014-01-21 07:49:00
# 9     ABCo         9      4 2014-01-21 08:50:00
# 10    Zyco         1     23 2014-01-20 05:34:00
# 11    Zyco         2      8 2014-01-20 05:47:00
# 12    Zyco         3     86 2014-01-20 06:34:00
# 13    Zyco         4     44 2014-01-20 07:20:00
# 14    Zyco         5     23 2014-01-20 07:44:00
# 15    Zyco         6     47 2014-01-20 08:19:00
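
If plyr is not available, the same steps can probably be done in base R. This is only a sketch on a small made-up sample (`orders_df` and `result` are hypothetical names, and the timestamps are assumed to be in UTC), not a drop-in replacement for the code above:

```r
# Small hypothetical sample in the same shape as the question's data.
orders_df <- data.frame(
  company  = c("ABCo", "ABCo", "ABCo", "Zyco", "Zyco"),
  datetime = as.POSIXct(c("2014-01-20 05:21", "2014-01-20 05:22",
                          "2014-01-20 05:50", "2014-01-20 06:34",
                          "2014-01-20 06:35"), tz = "UTC"),
  orders   = c(59, 46, 31, 71, 86)
)

# Sort so that consecutive rows within a company are adjacent in time.
orders_df <- orders_df[order(orders_df$company, orders_df$datetime), ]

# Group id per company: a new group starts after a gap of more than 2 minutes
# (datetimes are converted to seconds, so the threshold is 2 * 60).
orders_df$timegroup <- ave(
  as.numeric(orders_df$datetime), orders_df$company,
  FUN = function(secs) cumsum(c(TRUE, diff(secs) > 2 * 60))
)

# Maximum orders per company and time group.
result <- aggregate(orders ~ company + timegroup, data = orders_df, FUN = max)
result <- result[order(result$company, result$timegroup), ]
print(result)
```

Unlike the plyr version, this keeps only the group id and the maximum; the first datetime of each group would need a second `aggregate` call.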

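Another option, in base R: group the rows by company, date and hour, and then, for each row, take the maximum of the orders among the rows of that group whose minute is within 2 of it.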

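# Group id for each company/date/hour combination
data$gr = as.numeric(interaction(data$company, data$date, data$hour))

# For each minute, the maximum of the orders placed within 2 minutes of it
ff = function(mins, ords) {
  unlist(lapply(mins, function(x) max(ords[abs(x - mins) <= 2])))
}

# Apply ff within each group and stack the results
do.call(rbind,
        lapply(split(data, data$gr),
               function(x) transform(x, new_val = ff(x$minute, x$orders))))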

#            date company hour minute orders gr new_val
#1.1   2014-01-20    ABCo    5     21     59  1      59
#1.2   2014-01-20    ABCo    5     22     46  1      59
#1.3   2014-01-20    ABCo    5     50     31  1      31
#2.6   2014-01-20    Zyco    5     34     23  2      23
#2.7   2014-01-20    Zyco    5     47      8  2       8
#6.8   2014-01-20    Zyco    6     34     71  6      86
#6.9   2014-01-20    Zyco    6     35     86  6      86
#7.13  2014-01-21    ABCo    6     14      6  7      53
#7.14  2014-01-21    ABCo    6     16     53  7      53
#7.15  2014-01-21    ABCo    6     37     21  7      21
#9.4   2014-01-20    ABCo    7     13     15  9      15
#9.5   2014-01-20    ABCo    7     20     86  9      86
#10.10 2014-01-20    Zyco    7     20     44 10      44
#10.11 2014-01-20    Zyco    7     44     23 10      23
#11.16 2014-01-21    ABCo    7     24     54 11      73
#11.17 2014-01-21    ABCo    7     26     73 11      73
#11.18 2014-01-21    ABCo    7     49     63 11      63
#14    2014-01-20    Zyco    8     19     47 14      47
#15    2014-01-21    ABCo    8     50      4 15       4
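
Note that a +-2 window around each row is not quite the same thing as the "chain" of sequential orders described in the question's EDIT. The sketch below uses made-up values (`mins` and `ords` are hypothetical) to show where the two can differ; it also assumes all rows belong to the same company, date and hour:

```r
# Windowed maximum, as in ff above.
ff <- function(mins, ords) {
  unlist(lapply(mins, function(x) max(ords[abs(x - mins) <= 2])))
}

# Three hypothetical orders, each 2 minutes after the previous one.
mins <- c(10, 12, 14)
ords <- c(5, 1, 9)

# Each row only sees neighbours within 2 minutes of itself.
window_max <- ff(mins, ords)

# Chained grouping: consecutive gaps of at most 2 minutes form one group.
chain_group <- cumsum(c(TRUE, diff(mins) > 2))
chain_max <- ave(ords, chain_group, FUN = max)

print(window_max)
print(chain_max)
```

Here the first row gets 5 from the window but 9 from the chain, because minute 14 is more than 2 away from minute 10 yet connected to it through minute 12. Grouping by hour also means that orders on either side of an hour boundary are never compared.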

Source: https://habr.com/ru/post/1524886/

