I want to predict day-ahead energy consumption using recurrent neural networks (RNNs). However, I find the data format that RNNs require (samples, timesteps, features) confusing. Let me explain with an example:
I have power_dataset.csv on Dropbox, which contains energy consumption from June 5 to June 18 at 10-minute intervals (144 observations per day). Now, to test the performance of an RNN with the rnn R package, I follow these steps:
- Train model M for usage on June 17 using data from June 5-16.
- Predict usage on June 18 with M and updated usage from June 6-17.
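The two steps above can be sketched as row windows over the series. This is only an illustration in base R: the names obs_per_day, n_days and day_rows are hypothetical, and the power vector is a stand-in for the power column of the CSV, which is not loaded here.

```r
obs_per_day <- 144   # 10-minute readings per day
n_days      <- 14    # June 5 to June 18

# Stand-in series (assumption): each value equals its row number
power <- seq_len(obs_per_day * n_days)

# Hypothetical helper: row indices covering first_day..last_day (1 = June 5)
day_rows <- function(first_day, last_day) {
  ((first_day - 1) * obs_per_day + 1):(last_day * obs_per_day)
}

train_in  <- power[day_rows(1, 12)]    # June 5-16
train_out <- power[day_rows(13, 13)]   # June 17
test_in   <- power[day_rows(2, 13)]    # June 6-17
test_out  <- power[day_rows(14, 14)]   # June 18

length(train_in)    # 1728
length(train_out)   # 144
```

Both input windows span 12 days, so a model trained on the first window can be fed the second one without changing its shape.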
My understanding of the RNN data format:
samples: the number of samples or observations.
timesteps: the number of steps after which the pattern repeats. In my case, 144 observations occur each day, so every consecutive 144 observations make up the timesteps. In other words, it defines the seasonality period.
features: the number of features, which is one in my case, i.e., the time series of consumption on historical days.
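One possible reading of this layout, shown as a minimal base R sketch: treat each day as one sample, the 144 readings within a day as the timesteps, and consumption as the single feature. This is an assumption about how the data could be arranged, not a confirmed requirement of the rnn package; the names by_day, x and y and the stand-in power vector are hypothetical.

```r
obs_per_day <- 144   # timesteps per sample
n_days      <- 14    # days available in the series

# Stand-in series (assumption): each value equals its row number
power <- seq_len(obs_per_day * n_days)

# One row per day, one column per 10-minute slot
by_day <- matrix(power, ncol = obs_per_day, byrow = TRUE)

# Inputs are days 1..13; each target is the day that follows its input day
x <- array(by_day[1:(n_days - 1), ], dim = c(n_days - 1, obs_per_day, 1))
y <- array(by_day[2:n_days, ],       dim = c(n_days - 1, obs_per_day, 1))

dim(x)   # 13 144 1
dim(y)   # 13 144 1
```

With this arrangement the first dimension (samples) and the second dimension (timesteps) are the same for x and y, and the third dimension holds the single feature.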
Accordingly, my script looks like this:
library(rnn)
df <- read.csv("power_dataset.csv")
# Split into training rows and test rows
train <- df[1:2016,]
test <- df[145:dim(df)[1],]
# Inputs: first 1872 training rows; targets: the remaining 144 rows (one day)
trainX <- train[1:1872,]$power
trainY <- train[1873:dim(train)[1],]$power
# Reshape to (samples, timesteps, features)
tx <- array(trainX,dim=c(NROW(trainX),144,1))
ty <- array(trainY,dim=c(NROW(trainY),144,1))
model <- trainr(X=tx,Y=ty,learningrate = 0.04, hidden_dim = 10, numepochs = 100)
Error output:
The sample dimension of X is different from the sample dimension of Y.
The error occurs because of incorrect data formatting. How should I format the data correctly?
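The mismatch can be inspected without the rnn package, since it depends only on the array dimensions. A minimal sketch, assuming zero-filled stand-in vectors with the same lengths as trainX and trainY in the script:

```r
# Stand-ins (assumption) with the same lengths as in the script
trainX <- numeric(1872)
trainY <- numeric(144)

# Same reshaping calls as in the script; array() recycles the data to fill the shape
tx <- array(trainX, dim = c(NROW(trainX), 144, 1))
ty <- array(trainY, dim = c(NROW(trainY), 144, 1))

dim(tx)                    # 1872 144 1
dim(ty)                    # 144 144 1
dim(tx)[1] == dim(ty)[1]   # FALSE
```

The first dimension is 1872 for tx and 144 for ty, which appears to be what the error message refers to.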