Train a CNN-LSTM end-to-end?

A number of works (in particular, on image captioning) combine CNN and LSTM architectures for prediction and generation tasks, but they all seem to train the CNN independently of the LSTM. I looked at Torch and TensorFlow (with Keras) and could not find a reason why end-to-end training should not be possible, at least from an architecture-design point of view, but there does not seem to be any documentation for such a model.

So can this be done? Do Torch or TensorFlow (or even Theano or Caffe) support joint, end-to-end training of a CNN-LSTM network? If so, is it as simple as connecting the CNN output to the LSTM input and running SGD, or is it harder than that?

2 answers

EDIT: I'm afraid this answer is not up to date at all! Sorry. Perhaps after some serious reading, for example this review, I can come back here. If anyone has more specific links or any first-hand experience, I would love to hear about it. Cheers!


EDIT 2: Here is one of those works, from 2016, reporting a CRNN implementation for music classification (with consistent improvements, though in a rather empirical and task-specific way).


EDIT 3: On March 7, 2018, Yann LeCun posted the following text on Facebook about this paper:

ConvNets work well for sequence modeling. This is a good empirical paper from folks at CMU and Vladlen Koltun from Intel, comparing the performance of ConvNets and recurrent networks on various tasks. TL;DR: temporal ConvNets with dilated (à trous) convolutions work as well as or better than LSTMs on many sequential tasks.

Many of us have known this for a while. One of the earliest examples is Léon Bottou, who used multi-layer temporal ConvNets with temporal subsampling for speech recognition back in the late 1980s. More recently, the FAIR machine translation team has used gated ConvNets in its FairSeq system. My former student Xiang Zhang, Alexis Conneau at FAIR-Paris, and others have successfully used character-level ConvNets. A few years ago, my graduate student Piotr Mirowski used convolutional-recurrent networks (trained with a form of target prop) to predict time series.

But a good systematic study, like this one, is long overdue.
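
(To make the quote concrete: below is a minimal Keras sketch of a temporal ConvNet with dilated convolutions; the layer sizes and the dilation schedule are my own illustrative assumptions, not those of the paper.)

    from tensorflow.keras import Input, layers, models

    # stacking causal convolutions with doubling dilation rates grows the
    # receptive field exponentially with depth
    model = models.Sequential([
        Input(shape=(1000, 1)),   # (timesteps, channels); length is illustrative
        layers.Conv1D(32, 3, padding='causal', dilation_rate=1, activation='relu'),
        layers.Conv1D(32, 3, padding='causal', dilation_rate=2, activation='relu'),
        layers.Conv1D(32, 3, padding='causal', dilation_rate=4, activation='relu'),
        layers.Conv1D(32, 3, padding='causal', dilation_rate=8, activation='relu'),
        layers.GlobalMaxPooling1D(),
        layers.Dense(1, activation='sigmoid'),
    ])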


First, I will try to answer from a technical point of view: is it really as simple as connecting the output of a CNN to the input of an LSTM and training with SGD?

Take, for example, this image classification code from TensorFlow. It is worth paying attention to what they call the Inception module: here is a 1-minute video, part of the Google Deep Learning course, that explains what it is and outlines their way of dealing with it. In short, they extract several "deep" representations with different convolutional settings and then simply concatenate the vectors before passing them to a fully connected layer.

Following this approach, as long as your model is parametric (meaning it has a fixed set of features at each layer), you can simply build your graph with whatever branches you consider meaningful and concatenate their outputs before applying a fully connected network (provided the branches output tensors of the same dimension). That would indeed be an end-to-end architecture, and the fully connected layers on top ensure that no information is lost.
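
As a minimal sketch of that parallel pattern with the Keras functional API (the input shape, filter counts and kernel sizes are illustrative assumptions, not the actual Inception settings):

    from tensorflow.keras import Input, Model, layers

    inputs = Input(shape=(64, 64, 3))
    # two parallel "deep" representations with different convolutional settings
    branch_a = layers.Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
    branch_b = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)
    # concatenate the branch outputs before the fully connected part,
    # so that no branch's information is lost
    merged = layers.Concatenate()([branch_a, branch_b])
    outputs = layers.Dense(10, activation='softmax')(layers.Flatten()(merged))
    model = Model(inputs, outputs)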


But that would be parallel composition, and in your question you seem to be asking whether the two types of layers can be connected in series. So at this point I would like to discuss the motivation behind the question: thanks to the universal approximation theorem, we know that any shallow NN, if it is large enough, can learn any nonlinear behavior. The motivation for using deeper models, quoting Vincent Vanhoucke in this other video, is then the following:

There are many good reasons for that: one is parameter efficiency: you can get much more performance with fewer parameters by going deeper rather than wider. Another is that many of the natural phenomena you may be interested in have a hierarchical structure, which deep models naturally capture.

And here is another quote from the Inception video:

It looks complicated, but the interesting thing is that you can choose these parameters in such a way that the total number of parameters in your model is very small, and yet the model performs better.


The point, in my opinion, is that all successful (documented) deep models are given a structure specifically adapted to the data in order to increase performance. And in that context, each structure contributes certain invariances: CNNs, for example, capture rotation/scaling/translation invariance nicely, and RNNs likewise capture phase/frequency invariance.

Thus the former are strongly associated with static, spatial data such as images, and the latter with "temporal" data such as sound waves. At first sight, if you want to connect a CNN to an RNN in series, I would ask what data would motivate that split, and what output you expect from it.


Fortunately, the Fourier transform makes it possible to move from the time domain to the spatial domain, and I believe CNNs are widely applied to the transformed spectra to recognize patterns and the like (here is fairly recent ongoing work with such methods). I suspect that, since RNNs are notoriously finicky to train, this may be the preferred way of handling time-related data...
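
For instance, here is a minimal sketch of that pipeline, using scipy's STFT to build a spectrogram and a small Keras CNN on top of it (the signal, shapes and layer sizes are all illustrative assumptions):

    import numpy as np
    from scipy.signal import stft
    from tensorflow.keras import Input, layers, models

    # waveform -> time/frequency representation via the short-time Fourier transform
    waveform = np.random.randn(16000)                  # stand-in for 1 s of 16 kHz audio
    _, _, Z = stft(waveform, fs=16000, nperseg=256)
    spectrogram = np.abs(Z).astype('float32')[np.newaxis, :, :, np.newaxis]

    # an ordinary 2-D CNN then treats the spectrogram as an image
    model = models.Sequential([
        Input(shape=spectrogram.shape[1:]),            # (freq bins, time frames, 1)
        layers.Conv2D(8, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(10, activation='softmax'),
    ])
    predictions = model(spectrogram)                   # shape (1, 10)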

In any case, I would definitely look at past solutions on Kaggle; maybe someone has already tried this and documented the results.


A CNN-LSTM can be trained end-to-end using TensorFlow.

Suppose you have a CNN model M with input X, and an LSTM model LSTM. The combination can be trained end-to-end like this:

    # here the CNN is used to extract meaningful features from the input data
    features = M(X)
    # the CNN features are used as input to the LSTM
    y = LSTM(features)
    cost = cost_function(ground_truths, y)

A detailed example showing end-to-end training of a CNN-LSTM model for sentence classification on the IMDB dataset can be found at CNN_LSTM-end-end.
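
For reference, here is a minimal, self-contained Keras sketch of that pattern (the vocabulary size, sequence length and layer sizes are illustrative assumptions, not the settings of the linked example):

    from tensorflow.keras import Input, layers, models

    # Conv1D plays the role of the feature extractor M; its output sequence
    # feeds the LSTM, and a single loss trains both parts jointly.
    model = models.Sequential([
        Input(shape=(100,)),                      # padded sequences of word ids
        layers.Embedding(input_dim=20000, output_dim=128),
        layers.Conv1D(64, 5, activation='relu'),  # the M(X) step
        layers.MaxPooling1D(4),
        layers.LSTM(70),                          # the LSTM(features) step
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train) would now update the CNN and LSTM weights
    # in the same backward pass, i.e. end-to-end.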

