I'm looking for a way to search a data frame so that, for a given element, it checks the corresponding values within a range and returns the maximum of them. In general, I want a function that (1) takes a given element, (2) finds all other elements with the same name, (3) among those, checks whether the corresponding value lies within +/- X of any of the others, and (4) if so, returns the maximum of them; if not, just returns the value as it is.
A specific example is timestamp data. Say I have orders for 2 companies, recorded by date, hour, and minute. I want to see daily orders, but the problem is that orders placed within 2 minutes of each other are counted twice, so in such cases I only want the maximum value.
* EDIT: I should add that if orders are registered consecutively within a couple of minutes of each other, we assume they are duplicates and only want the maximum value. So, if there were 4 orders one minute apart, and no other orders within +2 minutes of the last or -2 minutes of the first, we can assume that the group of 4 should be counted only once, using the maximum value.
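To make the chaining rule in the EDIT concrete, here is a minimal sketch on made-up vectors (the `minute` and `orders` values below are hypothetical, not taken from my data). It assumes the minutes are already sorted and that "within 2 minutes" means a gap of at most 2 to the previous order.

```r
# Hypothetical toy data: four orders one minute apart, then an isolated one.
minute <- c(10, 11, 12, 13, 30)
orders <- c(5, 9, 7, 3, 4)

# Start a new run whenever the gap to the previous order exceeds 2 minutes.
# Assumes `minute` is sorted in increasing order.
run_id <- cumsum(c(TRUE, diff(minute) > 2))

# Keep only the maximum order count of each run.
run_max <- tapply(orders, run_id, max)
print(run_max)
```

The first four orders end up in one run and contribute a single value, while the isolated order forms its own run and is kept unchanged.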
Here is some sample data:
data <- structure(list(date = structure(c(16090, 16090, 16090, 16090,
16090, 16090, 16090, 16090, 16090, 16090, 16090, 16090, 16091,
16091, 16091, 16091, 16091, 16091, 16091), class = "Date"), company = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("ABCo", "Zyco"), class = "factor"), hour = c(5L,
5L, 5L, 7L, 7L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 6L, 6L, 6L, 7L, 7L,
7L, 8L), minute = c(21L, 22L, 50L, 13L, 20L, 34L, 47L, 34L, 35L,
20L, 44L, 19L, 14L, 16L, 37L, 24L, 26L, 49L, 50L), orders = c(59L,
46L, 31L, 15L, 86L, 23L, 8L, 71L, 86L, 44L, 23L, 47L, 6L, 53L,
21L, 54L, 73L, 63L, 4L)), .Names = c("date", "company", "hour",
"minute", "orders"), row.names = c(NA, -19L), class = "data.frame")
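What I want is to check, for each company, date, and hour, whether any minutes are within +/- 2 of each other; if so, return the maximum orders among them, and if not, return the orders as they are. (For example, for ABCo on 2014-01-20 at hour = 5, minutes 21 and 22 are within +/- 2 of each other, so return the maximum, 59. For ABCo on 1-20 at hour = 5 and minute = 50, nothing else is within +/- 2, so return 31.)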
What I have tried so far, using company + date + hour as a key:
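library(plyr)  # provides ddply()
# Build a company_date_hour key for each row
data$biztime <- do.call(paste, c(data[c("company","date","hour")], sep = "_"))
# Sum orders for each key and minute
data2 <- ddply(data, .(biztime, minute), summarise, orders = sum(orders))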
I am not sure how to go on from here. Is ifelse the right tool for this, or is there a better approach?
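For reference, one possible direction is sketched below in base R. It is only a sketch: it uses a small subset of the sample data (the ABCo rows and two Zyco rows for 2014-01-20), and it assumes that runs are formed per company and date on minutes since midnight, so a run may cross an hour boundary. The names `df`, `tod`, `key`, `run`, `deduped`, and `daily` are introduced here for illustration.

```r
# Subset of the sample data (ABCo and two Zyco rows on 2014-01-20).
df <- data.frame(
  date    = as.Date("2014-01-20"),
  company = c("ABCo", "ABCo", "ABCo", "ABCo", "ABCo", "Zyco", "Zyco"),
  hour    = c(5, 5, 5, 7, 7, 6, 6),
  minute  = c(21, 22, 50, 13, 20, 34, 35),
  orders  = c(59, 46, 31, 15, 86, 71, 86)
)

# Assumption: compare minutes since midnight, so runs may span two hours.
df$tod <- df$hour * 60 + df$minute
df <- df[order(df$company, df$date, df$tod), ]

# One key per company/date combination that actually occurs in the data.
df$key <- paste(df$company, df$date)

# Within each key, start a new run when the gap exceeds 2 minutes.
df$run <- ave(df$tod, df$key,
              FUN = function(x) cumsum(c(TRUE, diff(x) > 2)))

# One row per run, holding the maximum order count in that run.
deduped <- aggregate(orders ~ company + date + run, data = df, FUN = max)

# Daily totals after collapsing each run to its maximum.
daily <- aggregate(orders ~ company + date, data = deduped, FUN = sum)
print(daily)
```

On this subset the 5:21 and 5:22 rows for ABCo collapse to 59 and the 5:50 row stays at 31, which matches the example above. If runs must never cross an hour boundary, `hour` could be added to the key and `minute` used in place of `tod`.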