My task is to predict the sequence of future values (t_0, t_1, ..., t_{n_post-1}), given the previous timesteps (t_{-n_pre}, t_{-n_pre+1}, ..., t_{-1}), with a Keras LSTM layer.
Keras supports the following two cases well:
- n_post == 1 (many-to-one forecast)
- n_post == n_pre (many-to-many forecast with the same sequence length)
But not the case where n_post < n_pre.
To illustrate what I need, I built a simple toy example using a sine wave.
Many-to-one forecasting
With the following model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
the forecasts look like this:
Many-to-many forecasting using n_pre == n_post
The network learns to fit the sine wave correctly when n_pre == n_post, using this model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed

model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

Many-to-many forecasting using n_post < n_pre
But now suppose my data looks like this:
dataX or input: (nb_samples, nb_timesteps, nb_features) -> (1000, 50, 1)
dataY or output: (nb_samples, nb_timesteps, nb_features) -> (1000, 10, 1)
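For concreteness, here is a minimal sketch of how such dataX/dataY windows could be built from a sine wave with numpy. The sampling step (0.05) and the sliding-window construction are my own assumptions for illustration; only the target shapes (1000, 50, 1) and (1000, 10, 1) come from the description above.

```python
import numpy as np

n_pre, n_post = 50, 10                # input and output window lengths
n_samples = 1000

# toy sine signal, long enough to cut out all windows
t = np.arange(n_samples + n_pre + n_post)
wave = np.sin(0.05 * t)

# slide a window over the signal: each sample pairs n_pre past
# values (input) with the n_post values that follow (target)
dataX = np.array([wave[i:i + n_pre] for i in range(n_samples)])
dataY = np.array([wave[i + n_pre:i + n_pre + n_post] for i in range(n_samples)])

# add the trailing feature axis Keras expects
dataX = dataX[..., np.newaxis]        # -> (1000, 50, 1)
dataY = dataY[..., np.newaxis]        # -> (1000, 10, 1)
```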
After some research, I found a way to handle these input and output sizes in Keras, using this model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed, RepeatVector

model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
But the forecasts are really bad:
Now my questions are:
- How can I build a model with n_post < n_pre that does not lose information due to return_sequences=False?
- Using n_post == n_pre and then trimming the output (after training) does not work for me, because the network will still try to fit all the timesteps, while only the first few can be predicted well (the rest are weakly correlated and distort the result).