This question also exists as a GitHub issue. I would like to build a neural network in Keras that contains both 2D convolutions and an LSTM layer.
The network must classify MNIST. The MNIST training data consists of 60,000 grayscale images of handwritten digits from 0 to 9. Each image is 28x28 pixels.
I divided each image into four parts (left/right, top/bottom) and rearranged them in four orders to obtain sequences for the LSTM:
```
|     |      |1 | 2|
|image|  ->  -------  ->  4 sequences: |1|2|3|4|, |4|3|2|1|, |1|3|2|4|, |4|2|3|1|
|     |      |3 | 4|
```
Each of the small sub-images is 14x14 pixels. The four sequences are stacked together along the width (whether width or height does not matter).
This produces an array of shape [60000, 4, 1, 56, 14], where:
- 60,000 - number of samples
- 4 - the number of elements in the sequence (the number of time steps)
- 1 - color depth (shades of gray)
- 56 and 14 - width and height
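The splitting and stacking described above can be sketched in NumPy. The exact quadrant orderings and the stacking axis are my interpretation of the description (the question says the axis does not matter):

```python
import numpy as np

def make_sequences(images):
    """Turn (N, 28, 28) grayscale images into (N, 4, 1, 56, 14) sequences.

    Each image is cut into four 14x14 quadrants, four orderings of the
    quadrants are built, and the four orderings are stacked along one
    spatial axis (height here; the question notes the axis is arbitrary).
    """
    # Quadrants: 1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right
    q1 = images[:, :14, :14]
    q2 = images[:, :14, 14:]
    q3 = images[:, 14:, :14]
    q4 = images[:, 14:, 14:]
    orders = [(q1, q2, q3, q4), (q4, q3, q2, q1),
              (q1, q3, q2, q4), (q4, q2, q3, q1)]
    steps = []
    for t in range(4):  # time-step index within each sequence
        # One 14x14 quadrant from each of the 4 orderings -> a 56x14 strip
        strip = np.concatenate([order[t] for order in orders], axis=1)
        steps.append(strip[:, np.newaxis, :, :])  # add color channel: (N, 1, 56, 14)
    return np.stack(steps, axis=1)  # (N, 4, 1, 56, 14)

x = np.random.rand(8, 28, 28).astype("float32")
seq = make_sequences(x)
print(seq.shape)  # (8, 4, 1, 56, 14)
```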
Now this should be fed to the Keras model. The problem is reshaping the input between the CNN and the LSTM. I searched the web and found this question: Python keras, how to resize input after convolution layer to lstm layer
The solution there resembles a Reshape layer, which flattens the image but preserves the time steps (unlike the Flatten layer, which collapses everything except batch_size).
Here is my code:
```python
from keras.models import Sequential
from keras.layers import (Convolution2D, Activation, MaxPooling2D,
                          Reshape, Dropout, LSTM, Dense)

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode="valid", input_shape=[1, 56, 14]))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Reshape((56 * 14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))
```
This code generates an error message:
```
ValueError: total size of new array must be unchanged
```
Apparently, the input to the Reshape layer is incorrect. As an alternative, I also tried passing the time steps to the Reshape layer:

```python
model.add(Reshape((4, 56 * 14)))
```

This does not feel right, and in any case the error stays the same.
Am I doing it right? Is Reshape a suitable tool for connecting CNN and LSTM?
There are some fairly complex approaches to this problem. For example, this: https://github.com/fchollet/keras/pull/1456 (a TimeDistributed layer, which seems to hide the time-step dimension from the layers that follow it).
Or this: https://github.com/anayebi/keras-extra (a set of special layers for combining CNNs and LSTMs).
Why do such complicated (at least, they seem complicated to me) solutions exist if a simple Reshape does the trick?
UPDATE
Embarrassingly, I forgot that the dimensions are changed by the pooling and (because there is no padding) by the convolutions. kgrm advised me to use model.summary() to check the shapes.
The output of the layer before the Reshape layer is (None, 32, 26, 5), so I changed the reshape accordingly:

```python
model.add(Reshape((32 * 26 * 5,)))
```
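The shape reported by model.summary() can be double-checked by hand: a "valid" (unpadded) convolution shrinks each spatial dimension by kernel_size - 1, and 2x2 pooling halves it. A minimal sketch of that arithmetic for the 56x14 input:

```python
def conv_valid(h, w, kh, kw):
    """Output spatial size of a 'valid' (no padding) convolution."""
    return h - kh + 1, w - kw + 1

def max_pool(h, w, ph, pw):
    """Output spatial size of non-overlapping max pooling (floor division)."""
    return h // ph, w // pw

h, w = 56, 14
h, w = conv_valid(h, w, 3, 3)  # first 3x3 conv:  (54, 12)
h, w = conv_valid(h, w, 3, 3)  # second 3x3 conv: (52, 10)
h, w = max_pool(h, w, 2, 2)    # 2x2 max pooling: (26, 5)
print(h, w)  # 26 5, with 32 filters -> (None, 32, 26, 5)
```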
Now the ValueError is gone; instead, the LSTM complains:

```
Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2
```
It seems that I need to pass the time dimension through the whole network. How can I do that? If I add it to the input_shape of the convolution, it complains as well:

```python
Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
              border_mode="valid", input_shape=[4, 1, 56, 14])
```

```
Exception: Input 0 is incompatible with layer convolution2d_44: expected ndim=4, found ndim=5
```
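For reference, here is a plain-Python walkthrough (not Keras code) of how the shapes would flow if each CNN layer were applied to every time step separately, which is the idea behind the TimeDistributed wrapper mentioned above. The layer stack mirrors the model in the question:

```python
def per_timestep_shapes(input_shape):
    """Track shapes through the question's CNN stack applied to each
    time step independently, as a TimeDistributed-style wrapper would.
    input_shape = (time, channels, height, width), batch axis omitted."""
    t, c, h, w = input_shape
    c, h, w = 32, h - 2, w - 2  # 3x3 'valid' conv, 32 filters
    c, h, w = 32, h - 2, w - 2  # second 3x3 'valid' conv
    h, w = h // 2, w // 2       # 2x2 max pooling
    flat = c * h * w            # flatten each time step to a feature vector
    return (t, flat)            # (time, features): ndim=3 once batch is added

shape = per_timestep_shapes((4, 1, 56, 14))
print(shape)  # (4, 4160)
```

With the time axis preserved, the LSTM receives the (batch, time, features) input it expects, which is exactly the ndim=3 it asked for in the error above.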