TensorFlow LSTM-Cell Output

(using Python)

I have a question about TensorFlow's LSTM implementation. There are currently several LSTM implementations in TF, but I use:

cell = tf.contrib.rnn.BasicLSTMCell(n_units) 
  • where n_units is the number of "parallel" LSTM cells.

Then, to get my output, I call (a minimal runnable sketch follows after this list):

  rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x, initial_state=initial_state, time_major=False)
  • where (since time_major=False ) x has the shape (batch_size, time_steps, input_length)
  • where batch_size is my batch size
  • where time_steps is the number of time steps my RNN will step through
  • where input_length is the length of one of my input vectors (one vector fed into the network at one specific time step in one particular batch)
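A minimal self-contained sketch of this setup, assuming TensorFlow 1.x (where tf.contrib.rnn is available); the sizes are arbitrary example values, and I pass dtype instead of an explicit initial_state so the cell starts from its default zero state:

    import numpy as np
    import tensorflow as tf

    batch_size, time_steps, input_length = 4, 10, 8   # arbitrary example sizes
    n_units = 16

    x = tf.placeholder(tf.float32, [None, time_steps, input_length])
    cell = tf.contrib.rnn.BasicLSTMCell(n_units)
    # dtype=tf.float32 makes dynamic_rnn create a zero initial state itself
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, time_major=False)

    print(rnn_outputs.shape)  # (?, 10, 16) == (batch_size, time_steps, n_units)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(rnn_outputs,
                       {x: np.zeros((batch_size, time_steps, input_length), np.float32)})
        print(out.shape)  # (4, 10, 16)

Note that the static shape already shows the point of the question: the last dimension of rnn_outputs is n_units, not input_length.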

I expect rnn_outputs to have the shape (batch_size, time_steps, n_units, input_length) , since I did not specify a different output size. The tf.nn.dynamic_rnn documentation tells me that the output has the shape (batch_size, time_steps, cell.output_size) . The tf.contrib.rnn.BasicLSTMCell documentation says the cell has an output_size property, whose default value is n_units (the number of LSTM cells I use).

So each LSTM cell outputs only a single scalar for each time step? I would expect it to output a vector of the same length as the input vector. That does not seem to be the case as I currently understand it, so I'm confused. Can you tell me whether this is really so, or how I could change things so that one LSTM cell outputs a vector of the input vector's size?

1 answer

I think the main confusion is about the terminology of the LSTM cell argument num_units . Unfortunately, it does not mean, as the name might suggest, "the number of LSTM cells" (which would then have to equal your number of time steps). It is actually the number of dimensions in the hidden state (the cell state plus the hidden state vector). The dynamic_rnn() call returns a tensor of shape [batch_size, time_steps, output_size] where,

  output_size = num_units   if num_proj is None in the LSTM cell (the default)
  output_size = num_proj    if num_proj is defined
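A sketch illustrating both cases; note that num_proj is an argument of tf.contrib.rnn.LSTMCell (not BasicLSTMCell), the sizes are arbitrary example values, and the scope names only keep the two RNNs' variables separate:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 10, 8])  # (batch, time_steps, input_length)

    # Without num_proj: output_size == num_units
    cell_a = tf.contrib.rnn.LSTMCell(num_units=16)
    out_a, _ = tf.nn.dynamic_rnn(cell_a, x, dtype=tf.float32, scope="rnn_a")
    print(out_a.shape)  # (?, 10, 16)

    # With num_proj: the hidden state is projected, so output_size == num_proj
    cell_b = tf.contrib.rnn.LSTMCell(num_units=16, num_proj=8)
    out_b, _ = tf.nn.dynamic_rnn(cell_b, x, dtype=tf.float32, scope="rnn_b")
    print(out_b.shape)  # (?, 10, 8)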

Now, typically, you would extract the output of the last time step and project it to your desired output size with a manual matmul + bias operation, or use the num_proj argument of the LSTM cell.
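A sketch of the manual projection route, assuming TF 1.x; out_dim is a hypothetical target size (for example, the input_length from the question):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 10, 8])
    cell = tf.contrib.rnn.BasicLSTMCell(16)
    rnn_outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

    # take only the output of the last time step: (batch_size, n_units)
    last_output = rnn_outputs[:, -1, :]

    out_dim = 8  # hypothetical target size, e.g. input_length
    W = tf.get_variable("W", [16, out_dim])
    b = tf.get_variable("b", [out_dim], initializer=tf.zeros_initializer())
    prediction = tf.matmul(last_output, W) + b  # shape: (batch_size, out_dim)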
I went through the same confusion myself and had to dig quite deep to clear it up. I hope this answer clears some of it up for you.


Source: https://habr.com/ru/post/1015299/