I am analyzing hourly rainfall data from a disorganized file. I managed to clean it up and save it in a data frame called CA1, whose first rows look like this:
  Station_ID Guage_Type   Lat   Long       Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5        H6        H7        H8        H9       H10       H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
1    4457700         HI 41.52 124.03 1948-07-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
2    4457700         HI 41.52 124.03 1948-07-05         8        LST  0  1  1  1  1  1 2.0000000 2.0000000 2.0000000 4.0000000 5.0000000 5.0000000   4   7   1   1   0   0  10  13   5   1   1   3
3    4457700         HI 41.52 124.03 1948-07-06         8        LST  1  1  1  0  1  1 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
4    4457700         HI 41.52 124.03 1948-07-27         8        LST  3  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
5    4457700         HI 41.52 124.03 1948-08-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0   0   0   0   0   0   0   0
6    4457700         HI 41.52 124.03 1948-08-17         8        LST  0  0  0  0  0  0 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889   6   1   0   0   0   0   0   0   0   0   0   0
where H0 to H23 are the 24 hours of each day (one day per row).
Using only CA1, I take each day (row) of 24 values, transpose it into a column, and stack all the days into a single variable, which I call dat1:
> dat1[1:48,]
 H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
  0   1   1   1   1   1   2   2   2   4   5   5   4   7   1   1   0   0  10  13   5   1   1   3
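A minimal sketch of building dat1 as a plain numeric vector rather than a matrix, assuming the hour columns are named H0 to H23 as printed above. The two-day `toy` data frame is made up (its values copy the first two rows shown). `t()` turns each day's row into a column, and `as.vector()` reads that matrix column by column, so the hours come out in order within each day and the days in row order.

```r
# Hypothetical stand-in for CA1: two days, hour columns H0..H23
hour_cols <- paste0("H", 0:23)
vals <- rbind(rep(0, 24),
              c(0, 1, 1, 1, 1, 1, 2, 2, 2, 4, 5, 5,
                4, 7, 1, 1, 0, 0, 10, 13, 5, 1, 1, 3))
colnames(vals) <- hour_cols
toy <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05")), vals)

# t() makes each day a column; as.vector() then reads column by column:
# H0..H23 of day 1, then H0..H23 of day 2, ...
dat1 <- as.vector(t(as.matrix(toy[, hour_cols])))
length(dat1)     # 48 = 2 days * 24 hours
is.matrix(dat1)  # FALSE: a plain vector, so ts() builds a univariate series
```

On the real data this would presumably be `dat1 <- as.vector(t(as.matrix(CA1[, paste0("H", 0:23)])))`, which is untested here.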
I then pass dat1 to ts() to create the time series:
> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), frequency = 24)
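For reference, with `frequency = 24` the second element of `start` is read as a position from 1 to 24 within one time unit, not as a month. In the call above the time unit therefore becomes a year holding 24 observations, and `c(1948, 7)` means the 7th observation of 1948. A quick check with made-up data; the second series shows one alternative, assuming you would rather have one time unit equal one day:

```r
# Made-up data: 48 hourly values
x <- ts(1:48, start = c(1948, 7), frequency = 24)
# tsp() returns c(start, end, frequency) in time units:
# start = 1948 + (7 - 1)/24 = 1948.25, end = start + 47/24
tsp(x)

# Alternative indexing: time unit = one day, starting at day 1, hour slot 1 (H0)
y <- ts(1:48, start = c(1, 1), frequency = 24)
tsp(y)  # start 1, end 1 + 47/24, frequency 24
```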
A few notes:
> dim(CA1)
[1] 5636   31
> length(dat1)
[1] 135264
So 5636 rows * 24 values per row = 135264 points, and length(rainCA1) matches this. However, if I also give ts() an end argument, for example
>rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), end = c(1900+as.POSIXlt(CA1[5636,5])$year, 1+as.POSIXlt(CA1[5636,5])$mon), frequency = 24)
the resulting series has only 1134 points, so a lot of data is lost. I assume this is because the dates are not consecutive, and because I only give the year and month when specifying the start.
Continuing with what I believe is the correct approach, I use the first ts() result (without the end argument) as input to stl:
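Another likely factor, besides the gaps between dates: ts() works out the number of observations from `start`, `end` and `frequency` as `(end - start) * frequency + 1`, after converting each `c(major, minor)` pair to `major + (minor - 1)/frequency`, and truncates longer data to that length. With year/month pairs and `frequency = 24`, that allows only about 24 points per calendar year. The post does not show the last date in CA1; the end dates below are hypothetical, and `c(1995, 12)` is only an example that happens to reproduce 1134.

```r
f <- 24
# Convert a c(major, minor) pair the way ts() does
to_time <- function(p) p[1] + (p[2] - 1) / f
# Number of observations ts() derives from start, end and frequency
n_obs <- function(start, end) (to_time(end) - to_time(start)) * f + 1

n_obs(c(1948, 7), c(1949, 7))    # 25: a whole "year" is only 24 steps
n_obs(c(1948, 7), c(1995, 12))   # 1134 (hypothetical end date)

# ts() keeps only the first n_obs values of longer data
x <- ts(1:1000, start = c(1948, 7), end = c(1949, 7), frequency = f)
length(x)                        # 25
```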
>rainCA1_2 <-stl(rainCA1, "periodic")
Unfortunately, I get an error:
Error in stl(rainCA1, "periodic") : only univariate series are allowed
I don't understand why this happens or how to fix it. Oddly, if I go back to ts() and supply the end argument, stl runs without any errors.
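A likely explanation, based on my reading of the base R sources for ts() and stl() (worth double-checking against your R version): stl() stops with exactly this message whenever its input is a matrix, and ts() keeps a one-column matrix as a matrix. The fact that `dat1[1:48,]` works with a comma suggests dat1 is such a matrix, even though `length(dat1)` reports 135264. Supplying `end` only appears to help: when it implies fewer observations than the data holds, ts() shortens the data with `data[1:nobs]`, which drops the matrix dimensions but also throws data away. The sketch below uses made-up data to show all three cases:

```r
# Made-up hourly data: 10 days with a daily cycle plus a small trend
hours <- 1:240
vals <- sin(2 * pi * hours / 24) + hours / 100
m <- matrix(vals, ncol = 1)          # one-column matrix, as dat1 may be

# 1. Without end, ts() keeps the matrix shape and stl() refuses it
x_mat <- ts(m, frequency = 24)
is.matrix(x_mat)                     # TRUE
msg <- tryCatch(stl(x_mat, "periodic"), error = conditionMessage)
msg                                  # "only univariate series are allowed"

# 2. An end that implies fewer points makes ts() truncate the data,
#    which drops the matrix dimensions as a side effect (and loses data)
x_end <- ts(m, start = c(1, 1), end = c(5, 24), frequency = 24)
is.matrix(x_end)                     # FALSE
length(x_end)                        # 120 of the 240 values kept

# 3. Keeping all the data: convert to a plain vector before calling ts()
x_vec <- ts(as.vector(m), frequency = 24)
fit <- stl(x_vec, "periodic")
colnames(fit$time.series)            # "seasonal" "trend" "remainder"
```

On the real data, the equivalent would be something like `rainCA1 <- ts(as.vector(dat1), frequency = 24)` followed by `stl(rainCA1, "periodic")` (untested here, since CA1 is not available).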
I have searched many forums, but none of them (as far as I can tell) gives a good solution for decomposing hourly data. If anyone can help, I would really appreciate it. Thanks!
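A minimal end-to-end sketch under several assumptions that the post does not confirm: one station with at most one row per date, a Date column that `as.Date()` can parse, and one time unit = one day, so `frequency = 24` describes the daily cycle. Because ts() assumes evenly spaced observations, days missing from the file are inserted first and, as a loud assumption, filled with 0 (no rain); NA would be more neutral, but stl() stops on NA values by default (`na.action = na.fail`). `make_hourly_ts` is a hypothetical helper name and the toy data is made up.

```r
# Hypothetical helper: gap-free hourly ts from a one-row-per-day data frame
make_hourly_ts <- function(df, fill = 0) {
  hour_cols <- paste0("H", 0:23)
  df$Date <- as.Date(df$Date)
  # Every calendar day from the first to the last date in the data
  all_days <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "day"))
  # Left join: days absent from df get NA in every hour column
  full <- merge(all_days, df[, c("Date", hour_cols)], by = "Date", all.x = TRUE)
  hours <- as.matrix(full[, hour_cols])
  # ASSUMPTION: any NA hour (including whole missing days) is treated as no rain
  hours[is.na(hours)] <- fill
  # t() + as.vector(): H0..H23 of day 1, then day 2, ... as a plain vector
  ts(as.vector(t(hours)), start = c(1, 1), frequency = 24)
}

# Made-up example: 1948-07-03 is missing from the file
hour_cols <- paste0("H", 0:23)
day_vals <- rbind(1:24, 24:1, rep(2, 24))
colnames(day_vals) <- hour_cols
toy <- data.frame(Date = c("1948-07-01", "1948-07-02", "1948-07-04"), day_vals)

rain <- make_hourly_ts(toy)
length(rain)                 # 96: four days including the filled-in gap
fit <- stl(rain, "periodic")
```

With CA1 this would be something like `stl(make_hourly_ts(CA1), "periodic")`, untested here; whether 0 is an acceptable fill value depends on what a missing day means in the source file.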