How to initialize weight matrices for RNN?
I believe people use random normal initialization for the weight matrices of an RNN. Check out the example at the TensorFlow GitHub Repo . Since the notebook is a little long, to summarize: they have a simple LSTM model where they use tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (although I tried using tf.ones to initialize the biases before, and it seems to work too). I believe the standard deviation is a hyperparameter you can tune yourself. Intuitively, weight initialization matters for gradient flow. However, as far as I know, the LSTM itself is designed to handle the vanishing gradient problem (and gradient clipping is used for the exploding gradient problem), so maybe you don't need to be super careful about the stddev setting in an LSTM? I have read papers recommending Xavier initialization ( TF API doc for Xavier initializer ) in the context of convolutional neural networks. I don't know whether people use it in RNNs, but you can certainly try it in an RNN if you want, to see whether it helps.
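For concreteness, here is a minimal sketch (TF1-style graph code, with made-up sizes that are not from the notebook) of both options: truncated-normal weights plus zero biases as in the notebook, and a Xavier-initialized variable as an alternative.

import tensorflow as tf

vocabulary_size, num_nodes = 27, 64  # example sizes, assumptions for illustration only

# truncated-normal weights + zero biases, as in the notebook; stddev is a hyperparameter
ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.1))
ib = tf.Variable(tf.zeros([1, num_nodes]))

# Xavier initialization as an alternative (tf.contrib.layers in TF1)
ix_xavier = tf.get_variable(
    "ix_xavier", [vocabulary_size, num_nodes],
    initializer=tf.contrib.layers.xavier_initializer())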
Now, to follow up on @Allen's answer and your follow-up question left in the comments:
How to control initialization using variable scope?
Let's use the simple LSTM model in the TensorFlow GitHub python notebook that I cited above as the example.
In particular, if I want to refactor part of the LSTM code from the notebook above using variable scope management, I could write something like the following ...
import tensorflow as tf

def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''initialize LSTMcell weights and biases, set variables to reuse mode'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                gate_scope.reuse_variables()  # this line can probably be omitted, because setting the 'LSTMcell' scope to 'reuse' on the next line turns on reuse mode for all its child scope variables
        scope.reuse_variables()

def get_scope_variables(scope_name, variable_names):
    '''a helper function to fetch variables based on scope_name and variable_name'''
    vars = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            var = tf.get_variable(var_name)
            vars[var_name] = var
    return vars

def LSTMcell(i, o, state):
    '''a function for performing the LSTM cell computation'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            vars = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state
Using the refactored code would look something like the following ...
initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
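As a usage sketch (my own example, not from the notebook): after the initialize_LSTMcell call above, the LSTMcell function can be unrolled over time steps. The names num_unrollings, batch_size and train_inputs below are assumptions for illustration.

# hypothetical unrolling sketch; sizes and placeholders are assumptions
num_unrollings, batch_size = 10, 32

train_inputs = [tf.placeholder(tf.float32, [batch_size, vocabulary_size])
                for _ in range(num_unrollings)]

output = tf.zeros([batch_size, num_nodes])
state = tf.zeros([batch_size, num_nodes])
outputs = []
for i in train_inputs:
    output, state = LSTMcell(i, output, state)  # reuses the variables created by initialize_LSTMcell
    outputs.append(output)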
Even though the refactored code may not look much simpler, using variable scope control provides encapsulation and allows flexible variable management (at least in my opinion).
As for your other question, about pre-training some LSTMs and using their weights in a subsequent model, and how to do this without saving all states and restoring the whole model:
Assuming you have a pre-trained model that is frozen and loaded, if you want to use its frozen "wx", "wt" and "bi", you can simply find their parent scope names and variable names, then fetch the variables using the same structure as in the get_scope_variables function.
with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)
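A small sketch of what that could look like, assuming the pre-trained variables live under the same 'LSTMcell/<gate>' scopes as above; get_scope_variables is the helper defined earlier, and stopping gradients so the loaded weights stay fixed is my own addition, not something the notebook does.

# assumed scope layout: LSTMcell/input_gate/wx, LSTMcell/input_gate/wt, ...
with tf.variable_scope('LSTMcell', reuse=True):
    pretrained = get_scope_variables('input_gate', ['wx', 'wt', 'bi'])

# keep the pre-trained weights out of the new model's gradient updates (my addition)
frozen_wx = tf.stop_gradient(pretrained['wx'])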
Here is a link to understanding variable scope and sharing variables . Hope this is helpful.