Changing the sequence length in Keras without padding

I have a question regarding different sequence lengths for LSTMs in Keras. I feed batches of size 200 containing sequences of variable length (= x), with 100 features for each element of the sequence (=> [200, x, 100]), into an LSTM:

LSTM(100, return_sequences=True, stateful=True, input_shape=(None, 100), batch_input_shape=(200, None, 100)) 

I fit the model on the following randomly generated matrices:

 x_train = np.random.random((1000, 50, 100))
 x_train_2 = np.random.random((1000, 10, 100))

If I understand LSTMs (and the Keras implementation) correctly, x should correspond to the number of LSTM cells, and for each LSTM cell a state and three matrices (for the input, state, and output of the cell) need to be learned. How can I pass different lengths to the LSTM without padding up to a maximum specified length? The code works, but in my understanding it should not. It is even possible to pass another x_train_3 with a sequence length of 60 afterwards, although there should be no states and matrices for the additional 10 cells.

By the way, I am using Keras version 1.0.8 and TensorFlow GPU 0.9.

Here is my sample code:

 from keras.models import Sequential
 from keras.layers import LSTM, Dense
 import numpy as np
 from keras import backend as K

 with K.get_session():
     # create model
     model = Sequential()
     model.add(LSTM(100, return_sequences=True, stateful=True,
                    input_shape=(None, 100), batch_input_shape=(200, None, 100)))
     model.add(LSTM(100))
     model.add(Dense(2, activation='softmax'))
     model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                   metrics=['accuracy'])

     # Generate dummy training data
     x_train = np.random.random((1000, 50, 100))
     x_train_2 = np.random.random((1000, 10, 100))
     y_train = np.random.random((1000, 2))
     y_train_2 = np.random.random((1000, 2))

     # Generate dummy validation data
     x_val = np.random.random((200, 50, 100))
     y_val = np.random.random((200, 2))

     # fit and eval models
     model.fit(x_train, y_train, batch_size=200, nb_epoch=1, shuffle=False,
               validation_data=(x_val, y_val), verbose=1)
     model.fit(x_train_2, y_train_2, batch_size=200, nb_epoch=1, shuffle=False,
               validation_data=(x_val, y_val), verbose=1)
     score = model.evaluate(x_val, y_val, batch_size=200, verbose=1)
1 answer

First: you do not need stateful=True and batch_input_shape. They are meant for dividing very long sequences into parts and training each part separately, without the model thinking that the sequence has come to an end.

When you use stateful layers, you must reset/erase the states/memory manually whenever you decide that a given batch was the last part of a long sequence.

You, however, seem to be working with entire sequences, so no statefulness is required.
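
For illustration, here is a minimal sketch (all sizes and data are arbitrary, not taken from the question) of what stateful=True is actually meant for: one long sequence is split into chunks, the chunks are fed in order, and the state is reset manually once the whole sequence has been seen.

 import numpy as np
 from keras.models import Sequential
 from keras.layers import LSTM, Dense

 model = Sequential()
 # stateful layers need a fixed batch size, hence batch_input_shape
 model.add(LSTM(32, stateful=True, batch_input_shape=(1, 10, 100)))
 model.add(Dense(2, activation='softmax'))
 model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

 # one long sequence of 50 steps, split into 5 chunks of 10 steps each
 long_sequence = np.random.random((1, 50, 100))
 target = np.random.random((1, 2))

 for start in range(0, 50, 10):
     chunk = long_sequence[:, start:start + 10, :]
     model.train_on_batch(chunk, target)  # the state is carried over between chunks

 model.reset_states()  # erase the memory before starting the next long sequence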

Padding is not strictly necessary, but it allows you to use padding + masking to ignore the extra steps. If you do not want to use padding, you can split your data into smaller batches, each batch with a single, fixed sequence length. See this: stackoverflow.com/questions/46144191
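
Below is a hedged sketch of the padding + masking option (the layer sizes mirror the question, the rest is illustrative): the shorter sequences are zero-padded to the longest length, and a Masking layer tells the LSTMs to skip the padded steps.

 import numpy as np
 from keras.models import Sequential
 from keras.layers import Masking, LSTM, Dense

 # two groups of sequences with different lengths, as in the question
 x_long = np.random.random((1000, 50, 100))
 x_short = np.random.random((1000, 10, 100))

 # zero-pad the short group up to 50 steps and stack everything together
 x_short_padded = np.zeros((1000, 50, 100))
 x_short_padded[:, :10, :] = x_short
 x_all = np.concatenate([x_long, x_short_padded], axis=0)
 y_all = np.random.random((2000, 2))

 model = Sequential()
 model.add(Masking(mask_value=0.0, input_shape=(50, 100)))  # padded steps get masked out
 model.add(LSTM(100, return_sequences=True))
 model.add(LSTM(100))
 model.add(Dense(2, activation='softmax'))
 model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

 model.fit(x_all, y_all, batch_size=200)  # epoch count left at the version's default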

The length of the sequence (the number of time steps) does not change the number of cells/units or the weights. You can train using different lengths. The one dimension that cannot change is the number of features.


Input Dimensions:

The input dimensions are (NumberOfSequences, Length, Features).
There is absolutely no connection between the input shape and the number of cells. The input only carries the number of steps or recursions, which is the Length dimension.

Cells:

Cells in LSTM layers behave in the same way as “units” in dense layers.

A cell is not a step. The number of cells is just the number of "parallel" operations; the whole group of cells performs the recurrent operations and steps together.

There is some communication between the cells, as @Yu-Yang rightly noted in the comments. But the idea that they are one and the same entity carried through the steps remains valid.

Those little blocks that you see in images like the ones below are not cells; they are steps.

Variable lengths:

The length of your sequences does not affect the total number of parameters (matrices) in the LSTM layer at all. It only affects the number of steps.

The fixed set of matrices inside the layer will simply be recomputed more times for long sequences and fewer times for short ones. But in all cases it is the same set of matrices that gets updated and passed on to the next step.

Varying the sequence length only varies the number of these updates.
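
A quick way to convince yourself of this (a sketch, assuming a Keras version where count_params() is available on the model): the three models below differ only in the declared number of time steps, yet they report the same number of parameters.

 from keras.models import Sequential
 from keras.layers import LSTM

 def build(timesteps):
     model = Sequential()
     model.add(LSTM(100, input_shape=(timesteps, 100)))  # 100 units, 100 features
     return model

 print(build(10).count_params())    # expected: 80400
 print(build(50).count_params())    # expected: 80400, identical
 print(build(None).count_params())  # expected: 80400, the length can even stay undefined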

Layer Definition:

The number of cells can be any number at all; it simply determines how many parallel mini-brains will work together (which means a more or less powerful network and more or fewer output features).

 LSTM(units=78)  # will work perfectly well, and will output 78 "features",
                 # although it will be less intelligent than one with 100 units, outputting 100 features.

There is a single set of weight matrices and a single state/memory matrix that keeps being passed on to the next steps. These matrices are simply "updated" at each step, but there is not a separate matrix for each step.
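
To make the shapes concrete, here is a small sketch (the unit and feature counts are illustrative). In recent Keras versions the layer stores an input kernel of shape (features, 4*units), a recurrent kernel of shape (units, 4*units) and a bias of shape (4*units,); older Keras 1.x versions split the same weights across several separate arrays, but their total size is the same, and none of them depends on the sequence length.

 from keras.models import Sequential
 from keras.layers import LSTM

 model = Sequential()
 model.add(LSTM(78, input_shape=(None, 100)))  # 78 units, 100 features, any length

 for w in model.layers[0].get_weights():
     print(w.shape)
 # expected in recent Keras versions: (100, 312), (78, 312), (312,)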

Example images:

(image: an unrolled recurrent network, with repeated blocks "A" receiving inputs X1, X2, ...)

Each block "A" is a step in which the same group of matrices (states, weights, ...) is used and updated.

There are not 4 cells there; the same cell performs 4 updates, one update for each input.

Each X1, X2, ... is one slice of your sequence along the length dimension.


(image: a longer unrolled sequence of the same kind, the same matrices reused over more steps)

Longer sequences will reuse and update the matrices more times than shorter sequences, but it is still the same cell.


(image: several cells/units working in parallel within each step)

The number of cells does affect the size of the matrices, but the matrices still do not depend on the length of the sequence. All cells work in parallel, with some communication between them.


Your model:

In your model, you can create LSTM layers as follows:

 model.add(LSTM(anyNumber, return_sequences=True, input_shape=(None, 100)))
 model.add(LSTM(anyOtherNumber))

By using None in input_shape, you are already telling your model that it accepts sequences of any length.

All you have to do then is train, and your training code is fine. The only thing that is not allowed is mixing different lengths inside a single batch. So, just as you did, create a batch for each length and train on each of them, as in the sketch below.
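
Putting it together, here is a minimal end-to-end sketch of that approach (the numbers mirror the question, the data is random): no stateful, no batch_input_shape, the length declared as None, and one fit call per sequence length, since a single numpy array cannot mix lengths.

 import numpy as np
 from keras.models import Sequential
 from keras.layers import LSTM, Dense

 model = Sequential()
 model.add(LSTM(100, return_sequences=True, input_shape=(None, 100)))
 model.add(LSTM(100))
 model.add(Dense(2, activation='softmax'))
 model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
               metrics=['accuracy'])

 # group the data by length: each array has one fixed length inside
 x_len50, y_len50 = np.random.random((1000, 50, 100)), np.random.random((1000, 2))
 x_len10, y_len10 = np.random.random((1000, 10, 100)), np.random.random((1000, 2))

 for x, y in [(x_len50, y_len50), (x_len10, y_len10)]:
     model.fit(x, y, batch_size=200)  # only the number of steps per sample differs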

