I used the 16-layer VGG Caffe model to caption images, and I now have several captions (individual words) for each image. I want to generate a sentence from these words.
I read in a paper on LSTMs that I should remove the softmax layer from the training network and feed the 4096-dimensional feature vector from the fc7 layer directly into the LSTM.
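To make this concrete, here is a minimal sketch of how I understand that setup. It is not taken from the paper. It uses pure Python with untrained random weights, a stand-in random vector in place of a real fc7 output, and a hypothetical toy vocabulary. Using the image feature to initialise the LSTM hidden state is my assumption about one common way to do it, and all names (`W_img`, `lstm_step`, `generate_caption`) are made up for illustration.

```python
import math
import random

FEATURE_DIM = 4096  # size of the fc7 output in VGG-16
HIDDEN_DIM = 8      # hypothetical, kept tiny so the sketch runs quickly
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]  # hypothetical toy vocabulary

rng = random.Random(0)

def rand_matrix(rows, cols, scale=0.1):
    """Random weights; a real model would learn these during training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Untrained placeholder parameters (assumptions, not a trained captioner).
W_img = rand_matrix(HIDDEN_DIM, FEATURE_DIM, scale=0.01)  # projects fc7 to the LSTM state
W_emb = rand_matrix(len(VOCAB), HIDDEN_DIM)               # one embedding row per word
W_x = rand_matrix(4 * HIDDEN_DIM, HIDDEN_DIM)             # input weights for the 4 gates
W_h = rand_matrix(4 * HIDDEN_DIM, HIDDEN_DIM)             # recurrent weights for the 4 gates
b = [0.0] * (4 * HIDDEN_DIM)
W_out = rand_matrix(len(VOCAB), HIDDEN_DIM)               # hidden state to word scores

def lstm_step(x, h, c):
    """One standard LSTM step: input, forget, output gates and candidate."""
    z = [zx + zh + zb for zx, zh, zb in zip(matvec(W_x, x), matvec(W_h, h), b)]
    n = HIDDEN_DIM
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    o = [sigmoid(v) for v in z[2 * n:3 * n]]
    g = [math.tanh(v) for v in z[3 * n:4 * n]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def generate_caption(fc7, max_len=10):
    """Greedy decoding: start from the image feature, emit one word per step."""
    assert len(fc7) == FEATURE_DIM
    h = [math.tanh(v) for v in matvec(W_img, fc7)]  # image conditions the initial state
    c = [0.0] * HIDDEN_DIM
    word = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h, c = lstm_step(W_emb[word], h, c)
        scores = matvec(W_out, h)  # softmax is unnecessary for a greedy argmax
        word = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[word] == "<end>":
            break
        words.append(VOCAB[word])
    return words

# Stand-in for a real fc7 activation extracted from the VGG network.
fake_fc7 = [rng.random() for _ in range(FEATURE_DIM)]
caption = generate_caption(fake_fc7)
print(" ".join(caption))
```

Because the weights are random, the printed words are meaningless. The sketch only shows the data flow from the fc7 vector through the LSTM to a word sequence.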
I am new to LSTMs and RNNs.
Where should I begin? Is there a tutorial that shows how to generate a sentence using sequence labeling?