Initializing TensorFlow RNN Weight Matrices

I use bidirectional_rnn with GRUCell, but this is a general question about RNNs in TensorFlow.

I could not find how the weight matrices (input-to-hidden, hidden-to-hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?

EDIT: Another motivation for this question is to pre-train some LSTMs and use their weights in a subsequent model. Currently, I do not know how to do this without saving all the states and restoring the entire model.

Thanks.

+5
3 answers

How to initialize weight matrices for RNN?

I believe people typically use random normal initialization for RNN weight matrices. Check out the example in the TensorFlow GitHub repo. The notebook is a little long, but it has a simple LSTM model that uses tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (I have also tried tf.ones for the biases before, and that seems to work too). I believe the standard deviation is a hyperparameter you can tune yourself. In general, weight initialization matters for gradient flow, although as far as I know the LSTM itself is designed to handle the vanishing-gradient problem (with gradient clipping handling the exploding-gradient problem), so perhaps you do not need to be very careful when setting std_dev for an LSTM. I have read papers recommending Xavier initialization (see the TF API doc for the Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it for RNNs, but you could certainly try it there as well if you think it might help.
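To make the two schemes mentioned above concrete, here is a minimal NumPy sketch (not TensorFlow code) of truncated-normal and Xavier-uniform initialization; the function names and the example matrix shapes are my own illustration, not from the notebook:

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, rng=None):
    """Normal draws with values beyond 2 stddev resampled,
    mimicking the behavior of tf.truncated_normal."""
    rng = rng or np.random.default_rng(0)
    out = rng.normal(0.0, stddev, size=shape)
    mask = np.abs(out) > 2 * stddev
    while mask.any():
        out[mask] = rng.normal(0.0, stddev, size=int(mask.sum()))
        mask = np.abs(out) > 2 * stddev
    return out

def xavier_uniform(fan_in, fan_out, rng=None):
    """Glorot/Xavier: uniform in [-limit, limit],
    limit = sqrt(6 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Hypothetical sizes: input-to-hidden, hidden-to-hidden, and zero biases
wx = truncated_normal((5000, 64))   # vocabulary_size x num_nodes
wh = truncated_normal((64, 64))     # num_nodes x num_nodes
b = np.zeros((1, 64))
```

The truncation keeps every weight within two standard deviations of zero, which is why tf.truncated_normal is a common default for recurrent weights.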

Now, to follow up on @Allen's answer and your follow-up question left in the comments:

How to control initialization using the scope of a variable?

Using the simple LSTM model in the TensorFlow GitHub python notebook that I cited as an example: if I want to refactor part of that LSTM code using variable scope management, I can write something like the following ...

 import tensorflow as tf

 def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
     '''Initialize LSTMcell weights and biases; set the variables to reuse mode.'''
     gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
     with tf.variable_scope('LSTMcell') as scope:
         for gate in gates:
             with tf.variable_scope(gate):
                 wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                 wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                 bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
         # setting the parent 'LSTMcell' scope to reuse turns on reuse mode
         # for all of its child scope variables as well
         scope.reuse_variables()

 def get_scope_variables(scope_name, variable_names):
     '''A helper function to fetch variables based on scope_name and variable_name.'''
     vars = {}
     with tf.variable_scope(scope_name, reuse=True):
         for var_name in variable_names:
             vars[var_name] = tf.get_variable(var_name)
     return vars

 def LSTMcell(i, o, state):
     '''Perform one LSTM cell computation.'''
     gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
     var_names = ['wx', 'wt', 'bi']
     gate_comp = {}
     with tf.variable_scope('LSTMcell', reuse=True):
         for gate in gates:
             vars = get_scope_variables(gate, var_names)
             gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
     state = tf.sigmoid(gate_comp['forget_gate']) * state + \
             tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
     output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
     return output, state

Using the refactored code looks something like the following ...

 initialize_LSTMcell(vocabulary_size, num_nodes,
                     tf.truncated_normal_initializer(mean=-0.1, stddev=0.01))
 # ...doing some computation...
 output, state = LSTMcell(input_tensor, output_tensor, state)

Even though the refactored code may not look simpler, using variable scopes provides scope encapsulation and allows flexible control over the variables (at least in my opinion).

Pre-training some LSTMs and using their weights in a subsequent model: how to do this without saving all the states and restoring the entire model?

Assuming you have a pre-trained model frozen and loaded: if you want to use its frozen "wx", "wt" and "bi", you can simply find their parent scope names and variable names, then fetch the variables using a structure like the one in the get_scope_variables function.

 with tf.variable_scope(scope_name, reuse=True):
     var = tf.get_variable(var_name)

Here is a link on understanding variable scopes and sharing variables. I hope this is helpful.

+6

RNN models create their variables with get_variable, and you can control the initialization by wrapping the code that creates those variables in a variable scope and passing a default initializer to it. If the RNN does not specify one explicitly (looking at the code, it does not), variables default to uniform_unit_scaling_initializer.
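As a rough NumPy sketch of what that default does (my own illustration of the idea behind uniform_unit_scaling_initializer: draw from a uniform range scaled by the input dimension so that a matmul output keeps roughly unit variance; the exact TF formula may differ in detail):

```python
import numpy as np

def uniform_unit_scaling(shape, factor=1.0, rng=None):
    """Uniform in [-max_val, max_val] with max_val = sqrt(3/input_size)*factor,
    so each output column has variance ~ factor**2 / input_size."""
    rng = rng or np.random.default_rng(0)
    input_size = shape[0]
    max_val = np.sqrt(3.0 / input_size) * factor
    return rng.uniform(-max_val, max_val, size=shape)

W = uniform_unit_scaling((1000, 64))
```

Scaling the range by the input size is what keeps activations from growing or shrinking as they pass through the layer, which matters even more when the same recurrent matrix is applied at every timestep.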

You should also be able to share model weights by declaring a second model and passing reuse=True to its variable scope. As long as the namespaces match, the new model will get the same variables as the first model.
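The sharing mechanics can be sketched in plain Python, with a hypothetical VariableStore class standing in for TensorFlow's internal scoped variable registry (this is an illustration of the reuse semantics, not TensorFlow code):

```python
import numpy as np

class VariableStore:
    """Toy stand-in for get_variable's scoped registry."""
    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, shape=None, reuse=False, rng=None):
        full_name = f"{scope}/{name}"
        if reuse:
            # second model: return the existing array, not a copy
            return self._vars[full_name]
        if full_name in self._vars:
            raise ValueError(f"{full_name} already exists; pass reuse=True")
        rng = rng or np.random.default_rng(0)
        self._vars[full_name] = rng.normal(0.0, 0.1, size=shape)
        return self._vars[full_name]

store = VariableStore()
# first model creates the weights...
w1 = store.get_variable("rnn/gru_cell", "kernel", shape=(96, 64))
# ...second model, same scope and name with reuse=True, gets the same object
w2 = store.get_variable("rnn/gru_cell", "kernel", reuse=True)
```

Because w1 and w2 are the same underlying array, any update made while training one model is immediately visible to the other, which is exactly why matching namespaces gives you weight sharing.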

+5

A simple way to initialize all kernel weights with a given initializer is to pass the initializer to tf.variable_scope(). For instance:

 with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
     basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
     outputs, state = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
+1

Source: https://habr.com/ru/post/1258974/
