Suppose I have a trained RNN (for example, a language model) and I want to see what it generates on its own: how should I feed its output back into its input?
I read the following related questions:
It is theoretically clear to me that TensorFlow uses truncated backpropagation through time, so we need to define the maximum number of steps we want to "trace back". In addition, we keep a batch dimension, so if I wanted to train on a sine wave, I would have to feed inputs of shape
[None, num_step, 1].
The following code works:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
n_samples = 100
state_size = 5

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)

# Input: a sine wave (or all zeros by default), shaped [batch, time, features]
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])

output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

# Target: the input shifted by one step; minimize the mean squared error
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
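For completeness, the training loop itself is not shown above; it is just the usual pattern, roughly like this (the iteration count and feed are simply what I happened to use):

# Roughly the loop I run for training (feeding the sine wave; leaving the
# feed_dict out falls back to the zero default of the placeholder):
for i in range(500):
    _, mse = sess.run([opt, loss], feed_dict={X: def_x})
    if i % 100 == 0:
        print(i, mse)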
The LSTM state size can vary; I also experimented with feeding the network the sine wave versus feeding it zeros, and in both cases it converged in ~500 iterations. So far my understanding is that in this case the graph consists of n_samples unrolled LSTM cells sharing their parameters, and it is up to me to feed them a time series. However, when generating samples the network clearly depends on its previous output, which means I cannot simply feed the unrolled model all at once. I tried to compute the state and output at each step:
with tf.variable_scope('sine', reuse=True):
    X_test = tf.placeholder(tf.float64)
    X_reshaped = tf.reshape(X_test, [1, -1, 1])
    output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
    pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)

# Feed everything generated so far back in and keep only the newest prediction
test_vals = [0.]
for i in range(1000):
    val = pred.eval({X_test: np.array(test_vals)[None, :, None]})
    test_vals.append(val[0, -1, 0])
However, in this model, there seems to be no continuity between LSTM cells. What's going on here?
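My current guess is that, for generation, I would instead have to build a one-step version of the graph and carry the LSTM state between session runs myself, something along these lines (an untested sketch; whether reuse=True under the 'sine' scope actually picks up the trained weights is exactly the part I am unsure about):

# Hypothetical one-step graph: one input sample plus the previous LSTM state in,
# one prediction plus the new state out.
X_step = tf.placeholder(tf.float64, [1, 1, 1])
c_in = tf.placeholder(tf.float64, [1, state_size])
h_in = tf.placeholder(tf.float64, [1, state_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

with tf.variable_scope('sine', reuse=True):
    out_step, new_state = tf.nn.dynamic_rnn(lstm_cell, X_step,
                                            initial_state=init_state,
                                            dtype=tf.float64)
    pred_step = tf.contrib.layers.fully_connected(out_step, 1, activation_fn=tf.tanh)

# Generation: previous output becomes the next input, state is carried forward.
cur_x = np.zeros((1, 1, 1))
cur_c, cur_h = np.zeros((1, state_size)), np.zeros((1, state_size))
samples = []
for _ in range(1000):
    p, s = sess.run([pred_step, new_state],
                    {X_step: cur_x, c_in: cur_c, h_in: cur_h})
    samples.append(p[0, 0, 0])
    cur_x = p                  # feed the prediction back in
    cur_c, cur_h = s.c, s.h    # carry the LSTM state to the next step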
Or do I need to initialize a zero array of, say, 100 time steps and assign each run's result back into that array? Something like feeding the network this way (see the sketch after the list):
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 2: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
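In code, my reading of that scheme would be roughly the following, re-running the full fixed-length graph from above at every step (which already feels wasteful, hence the question); this assumes pred and X still refer to the training graph from the first snippet:

# Hypothetical refeeding scheme: keep a zero-padded buffer of n_samples steps,
# run the whole unrolled graph, and copy the newest prediction into the buffer.
input_feed = np.zeros(n_samples)[None, :, None]
for t in range(n_samples - 1):
    res = sess.run(pred, {X: input_feed})      # res has shape [1, n_samples, 1]
    input_feed[0, t + 1, 0] = res[0, t, 0]     # output at step t feeds step t+1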
In short: how do I use this trained network so that its own output at one time step becomes its input at the next time step?