I recently discovered that LayerNormBasicLSTMCell is a version of the LSTM cell with layer normalization and recurrent dropout. So I replaced my use of LSTMCell with LayerNormBasicLSTMCell. Not only did this change reduce the test accuracy from ~96% to ~92%, it also increased training time to ~33 hours (it was originally ~6 hours). All hyperparameters are the same: number of epochs (10), number of stacked layers (3), hidden state size (250), dropout keep probability (0.5), ... The hardware is also the same.
My question is: what did I do wrong here?
My original model (using LSTMCell):
import tensorflow as tf

# Batch-normalize the inputs before the recurrent layers.
tf_b_VCCs_AMs_BN1 = tf.layers.batch_normalization(
    tf_b_VCCs_AMs,
    axis=-1,
    training=Flg_training,
    trainable=True,
    name="Inputs_BN"
)

# Stack three LSTM layers, each with dropout applied to its outputs.
dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.nn.rnn_cell.LSTMCell(num_units=250, state_is_tuple=True)
    dropcells.append(
        tf.nn.rnn_cell.DropoutWrapper(cell=cell_iiLyr, output_keep_prob=0.5))
MultiLyr_cell = tf.nn.rnn_cell.MultiRNNCell(cells=dropcells, state_is_tuple=True)

# Note: the same stacked cell instance is passed for both directions.
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=MultiLyr_cell,
    cell_bw=MultiLyr_cell,
    dtype=tf.float32,
    sequence_length=tf_b_lens,
    inputs=tf_b_VCCs_AMs_BN1,
    scope="BiLSTM"
)
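To separate the per-cell cost from everything else in the pipeline, the two cells can also be timed head-to-head on random data. A minimal sketch (the batch size, sequence length, and repeat count are illustrative, not my real setup):

import time
import tensorflow as tf

def avg_forward_time(make_cell, batch=32, n_steps=100, dim=250, n_runs=10):
    # Build a fresh graph with one dynamic_rnn over random inputs.
    tf.reset_default_graph()
    inputs = tf.random_normal([batch, n_steps, dim])
    outputs, _ = tf.nn.dynamic_rnn(make_cell(), inputs, dtype=tf.float32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(outputs)  # warm-up run, excluded from timing
        start = time.time()
        for _ in range(n_runs):
            sess.run(outputs)
        return (time.time() - start) / n_runs

print("LSTMCell: %.3f s" % avg_forward_time(
    lambda: tf.nn.rnn_cell.LSTMCell(num_units=250)))
print("LayerNormBasicLSTMCell: %.3f s" % avg_forward_time(
    lambda: tf.contrib.rnn.LayerNormBasicLSTMCell(num_units=250)))

Layer normalization adds extra per-gate normalization ops at every time step, so some per-step overhead is expected; the sketch just measures how much of the slowdown comes from the cell itself.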
My new model (using LayerNormBasicLSTMCell):
...
# Same stack depth and size, but using LayerNormBasicLSTMCell.
dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.contrib.rnn.LayerNormBasicLSTMCell(
        num_units=250,
        forget_bias=1.0,
        activation=tf.tanh,
        layer_norm=True,
        norm_gain=1.0,
        norm_shift=0.0,
        dropout_keep_prob=0.5  # recurrent dropout, applied inside the cell
    )
    dropcells.append(cell_iiLyr)
...
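One detail from the docs that may matter here: dropout_keep_prob is recurrent dropout applied inside the cell, and it accepts a scalar Tensor as well as a float. Passed as a plain float it has no train/test switch, so the dropout stays active during evaluation. A minimal sketch of feeding it through a placeholder instead, so it can be set to 1.0 at test time (the name tf_keep_prob is illustrative, not from my code above):

# Feed the recurrent dropout keep probability at run time.
tf_keep_prob = tf.placeholder(tf.float32, shape=[], name="keep_prob")

dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.contrib.rnn.LayerNormBasicLSTMCell(
        num_units=250,
        layer_norm=True,
        dropout_keep_prob=tf_keep_prob  # scalar Tensor instead of a constant
    )
    dropcells.append(cell_iiLyr)

# Training step: feed_dict={tf_keep_prob: 0.5, ...}
# Evaluation:    feed_dict={tf_keep_prob: 1.0, ...}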