The aggregation function in R zoo returns an error

I have been using this line of code daily for a very long time, and something has broken after returning to standard time. I am trying to use aggregate.zoo to rank hourly data in the afternoon. The data period I'm looking at does not include DST time, so I don’t understand what the problem is. I am using the zoo package. Here is the data structure:

 require(zoo) structure(c(15.52, 14.56, 14.31, 14.17, 13.75, 15.3, 25.57, 25.39, 23.43, 22.92, 23.31, 23.44, 22.09, 21.28, 21, 20.94, 27.16, 32.73, 33.74, 29.12, 24.78, 21.44, 18.95, 17.08, 17.9, 17.54, 16.45, 16.59, 16.09, 17.23, 25.31, 25.43, 24.93, 24.47, 23.69, 21.53, 19.61, 19.53, 19.38, 19.38, 24.59, 29.03, 30.02, 23.78, 20.44, 21.39, 18.79, 21.7, 20.5, 20.63, 18.57, 19.41, 19.2, 15.23, 23.48, 24.89, 24.79, 24.01, 23.18, 22.5, 20.88, 21.12, 20, 20.55, 27.83, 31.21, 28.29, 26.1, 23.31, 21.64, 18.19, 17.28, 17.87, 17.63, 16.48, 17.27, 17.04, 17.79, 19.78, 20.92, 23.53, 25.45, 24.37, 21.89, 21.22, 21.03, 19.92, 20.14, 24.9, 27.88, 28.54, 29.14, 24.93, 19.24, 21.75, 18.66, 19.21, 18.49, 17.08, 15.92, 15.51, 15.5, 16.1, 16.61, 18.54, 20.15, 20.36, 20.32, 19.94, 19.13, 18.58, 18.71, 21.1, 26.77, 28.98, 27.12, 24.71, 20.83, 18.2, 16.68, 17.56, 16.86, 15.79, 15.47, 15.49, 17.66, 24.05, 24.97, 25.1, 25.74, 25.32, 24.98, 24.13, 24.05, 23.45, 23.84, 26.16, 31.2, 32.57, 30.19, 26.1, 22, 18.9, 17.85, 17.7, 16.53, 15.32, 14.77, 15.06, 17.54, 24.94, 26.06, 25.52, 25.85, 25.53, 24.97, 24.17, 24.19, 23.55, 23.78, 26.35, 31.49, 33.06, 29.9, 25.52, 21.62, 18.77, 17.77, 17.87, 16.75, 15.38, 14.69, 15, 17.84, 25.05, 25.5, 24.92, 25.46, 25.03, 24.81, 24.22, 23.54, 22.95, 22.9, 25.82, 30.87, 32.49, 29.17, 25.13, 21.56, 18.58, 17.67, 17.06, 15.96, 14.45, 13.93, 14.27, 16.64, 23.29, 24.17, 24.07, 24.22, 24.06, 24.06, 23.24, 23.05, 22.39, 22.54, 25.07, 29.89, 31.53, 28.24, 24.28, 20.78, 18.09, 17.02), index = structure(c(1478671200, 1478674800, 1478678400, 1478682000, 1478685600, 1478689200, 1478692800, 1478696400, 1478700000, 1478703600, 1478707200, 1478710800, 1478714400, 1478718000, 1478721600, 1478725200, 1478728800, 1478732400, 1478736000, 1478739600, 1478743200, 1478746800, 1478750400, 1478754000, 1478757600, 1478761200, 1478764800, 1478768400, 1478772000, 1478775600, 1478779200, 1478782800, 1478786400, 1478790000, 1478793600, 1478797200, 1478800800, 1478804400, 1478808000, 1478811600, 1478815200, 1478818800, 1478822400, 1478826000, 1478829600, 1478833200, 1478836800, 1478840400, 1478844000, 1478847600, 1478851200, 1478854800, 1478858400, 1478862000, 1478865600, 1478869200, 1478872800, 1478876400, 1478880000, 1478883600, 1478887200, 1478890800, 1478894400, 1478898000, 1478901600, 1478905200, 1478908800, 1478912400, 1478916000, 1478919600, 1478923200, 1478926800, 1478930400, 1478934000, 1478937600, 1478941200, 1478944800, 1478948400, 1478952000, 1478955600, 1478959200, 1478962800, 1478966400, 1478970000, 1478973600, 1478977200, 1478980800, 1478984400, 1478988000, 1478991600, 1478995200, 1478998800, 1479002400, 1479006000, 1479009600, 1479013200, 1479016800, 1479020400, 1479024000, 1479027600, 1479031200, 1479034800, 1479038400, 1479042000, 1479045600, 1479049200, 1479052800, 1479056400, 1479060000, 1479063600, 1479067200, 1479070800, 1479074400, 1479078000, 1479081600, 1479085200, 1479088800, 1479092400, 1479096000, 1479099600, 1479103200, 1479106800, 1479110400, 1479114000, 1479117600, 1479121200, 1479124800, 1479128400, 1479132000, 1479135600, 1479139200, 1479142800, 1479146400, 1479150000, 1479153600, 1479157200, 1479160800, 1479164400, 1479168000, 1479171600, 1479175200, 1479178800, 1479182400, 1479186000, 1479189600, 1479193200, 1479196800, 1479200400, 1479204000, 1479207600, 1479211200, 1479214800, 1479218400, 1479222000, 1479225600, 1479229200, 1479232800, 1479236400, 1479240000, 1479243600, 1479247200, 1479250800, 1479254400, 1479258000, 1479261600, 1479265200, 1479268800, 1479272400, 1479276000, 1479279600, 1479283200, 1479286800, 1479290400, 1479294000, 1479297600, 1479301200, 1479304800, 1479308400, 1479312000, 1479315600, 1479319200, 1479322800, 1479326400, 1479330000, 1479333600, 1479337200, 1479340800, 1479344400, 1479348000, 1479351600, 1479355200, 1479358800, 1479362400, 1479366000, 1479369600, 1479373200, 1479376800, 1479380400, 1479384000, 1479387600, 1479391200, 1479394800, 1479398400, 1479402000, 1479405600, 1479409200, 1479412800, 1479416400, 1479420000, 1479423600, 1479427200, 1479430800, 1479434400, 1479438000, 1479441600, 1479445200), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), class = "zoo") 

All I am trying to do is to rank the data by day, so the line I have been using successfully for a long time is:

 ordered.price = aggregate(z, as.Date(index(z), tz='EST'), FUN=rank, ties.method = "first", na.last = FALSE) 

When doing this, I get this error:

 Error in zoo(df, ix[!is.na(ix)]) : "x" : attempt to define invalid zoo object 

I got lost here, especially since the code functioned properly for so long. I am using R 3.2.3. Any help is really appreciated.

+5
source share
1 answer

Below we present some approaches that allow us to avoid mistakes. (1) It is unlikely that you want, but it will give us some idea before we move on to (2) and (3).

1) padding . Different numbers of elements returned by FUN for different dates, therefore, cannot create a rectangular result.

 lens <- aggregate(z,as.Date(index(z),tz='EST'), FUN = length) 

giving:

 > lens 2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 23 24 24 24 24 24 24 2016-11-16 2016-11-17 2016-11-18 24 24 1 

Indentation to length 24 will allow it not to give an error:

 aggregate(z,as.Date(index(z),tz='EST'), FUN = function(x) replace(rep(NA, max(lens)), seq_along(x), rank(x, ties="first", na.last=F))) 

giving:

 2016-11-09 6 4 3 2 1 5 19 18 15 13 14 16 12 10 9 8 20 22 23 21 17 11 7 NA 2016-11-10 4 7 6 2 3 1 5 21 22 20 18 16 15 12 11 9 10 19 23 24 17 13 14 8 2016-11-11 13 7 9 3 5 4 1 17 20 19 18 15 14 10 11 6 8 22 24 23 21 16 12 2 2016-11-12 4 7 5 1 3 2 6 9 12 17 21 18 16 14 13 10 11 19 22 23 24 20 8 15 2016-11-13 11 14 8 6 3 2 1 4 5 9 16 18 17 15 13 10 12 20 22 24 23 21 19 7 2016-11-14 4 6 5 3 1 2 7 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20 9 8 2016-11-15 7 6 4 3 1 2 5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17 9 8 2016-11-16 5 7 4 3 1 2 6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18 9 8 2016-11-17 7 6 4 3 1 2 5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20 9 8 2016-11-18 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 

Although filling out avoids the mistake, it is unlikely that you really want to.

2) time zone It is likely that alignment with dates is disabled, so it produces 23 elements for the first date and 1 element for the last. If you fix it so that there are 24 elements on the date, then it will work without filling.

Using this value for tz seems to give 24 elements per day. (Running OlsonNames() contains a list of time zones.)

 table(as.Date(time(z), tz = "Etc/GMT+6")) 

giving:

 2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 24 24 24 24 24 24 24 2016-11-16 2016-11-17 24 24 

Thus, we can replace tz="EST" in the question with tz="Etc/GMT=6" , indicating this code:

 aggregate(z, as.Date(index(z), tz='Etc/GMT+6'), FUN=rank, ties.method = "first", na.last = FALSE) 

that leads to:

 2016-11-09 6 4 3 2 1 5 20 19 16 14 15 17 13 11 10 9 21 23 24 22 18 12 8 7 2016-11-10 6 5 2 3 1 4 21 22 20 18 16 14 11 10 8 9 19 23 24 17 12 13 7 15 2016-11-11 8 10 4 6 5 1 17 20 19 18 15 14 11 12 7 9 22 24 23 21 16 13 3 2 2016-11-12 6 4 1 3 2 5 9 12 17 21 18 16 14 13 10 11 19 22 23 24 20 8 15 7 2016-11-13 14 9 7 3 2 1 4 5 10 16 18 17 15 13 11 12 20 22 24 23 21 19 8 6 2016-11-14 5 4 3 1 2 6 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20 9 8 7 2016-11-15 6 4 3 1 2 5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17 9 8 7 2016-11-16 7 4 3 1 2 6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18 9 8 5 2016-11-17 7 4 3 1 2 5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20 9 8 6 

Although this gives the correct answer depending on the time zone, such as this may not be the best result.

3) Noon Another way to approach this is if we know that (i) we are only interested in date / time in the New York and GMT time zones and (ii) the data starts at 1 a.m. and has consecutive hours to use the dates at positions 12, 12 + 24, 12 + 48, etc., since noon will be on the same day in either of the two above-mentioned time zones.

 aggregate(z, as.Date(index(z)[rep(seq(12, length(z), 24), each = 24)]), FUN=rank, ties.method = "first", na.last = FALSE) 

4) Subtract 1 second . The poster noted that the problem is that his watch represents the end of a one-hour event, so the first 24 dates in z are

 > head(time(z), 24) c("2016-11-09 01:00:00 EST", "2016-11-09 02:00:00 EST", ..., "2016-11-09 23:00:00 EST", "2016-11-10 00:00:00 EST") 

which have hours 01, 02, ..., 23, 00, and therefore the last hour the next day. If we subtract one second from each, we get 00, 01, 02, ..., 22, 23, so the clock is now all in one day:

 > head(time(z)-1, 24) c("2016-11-09 00:59:59 EST", "2016-11-09 01:59:59 EST", ..., "2016-11-09 22:59:59 EST", "2016-11-09 23:59:59 EST") 

So we can use:

 aggregate(z, as.Date(index(z)-1, tz = "EST"), FUN=rank, ties.method = "first", na.last = FALSE) 

Subtracting one second, times still reasonably relate to the end of each time period, and we get the correct dates. We could subtract one hour instead of one second, as suggested by the poster in his comment below, in which case the time will represent the beginning of the hour, not the end, as he notes.

Note: Also consider whether the "EST5EDT" time zone "EST5EDT" for your problem.

+7
source

Source: https://habr.com/ru/post/1259604/


All Articles