I recently discovered that LayerNormBasicLSTMCell is a version of the LSTM cell with layer normalization and recurrent dropout. So I replaced my use of LSTMCell with LayerNormBasicLSTMCell. Not only did this change reduce the test accuracy from ~96% to ~92%, it also increased training time to ~33 hours (it was originally ~6 hours). All hyperparameters are the same: number of epochs (10), number of stacked layers (3), hidden state size (250), dropout keep probability (0.5), ... The hardware is also the same.
My question is: what did I do wrong here?
My original model (using LSTMCell):
import tensorflow as tf

# Batch-normalize the inputs before the recurrent layers.
tf_b_VCCs_AMs_BN1 = tf.layers.batch_normalization(
    tf_b_VCCs_AMs,
    axis=-1,
    training=Flg_training,
    trainable=True,
    name="Inputs_BN"
)

# Stack three LSTM layers, each with dropout applied to its outputs.
dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.nn.rnn_cell.LSTMCell(num_units=250, state_is_tuple=True)
    dropcells.append(
        tf.nn.rnn_cell.DropoutWrapper(cell=cell_iiLyr, output_keep_prob=0.5))
MultiLyr_cell = tf.nn.rnn_cell.MultiRNNCell(cells=dropcells, state_is_tuple=True)

# Note: the same stacked cell instance is passed for both directions.
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=MultiLyr_cell,
    cell_bw=MultiLyr_cell,
    dtype=tf.float32,
    sequence_length=tf_b_lens,
    inputs=tf_b_VCCs_AMs_BN1,
    scope="BiLSTM"
)
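To separate the per-cell cost from everything else in the pipeline, the two cells can also be timed head-to-head on random data. A minimal sketch (the batch size, sequence length, and repeat count are illustrative, not my real setup):

import time
import tensorflow as tf

def avg_forward_time(make_cell, batch=32, n_steps=100, dim=250, n_runs=10):
    # Build a fresh graph with one dynamic_rnn over random inputs.
    tf.reset_default_graph()
    inputs = tf.random_normal([batch, n_steps, dim])
    outputs, _ = tf.nn.dynamic_rnn(make_cell(), inputs, dtype=tf.float32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(outputs)  # warm-up run, excluded from timing
        start = time.time()
        for _ in range(n_runs):
            sess.run(outputs)
        return (time.time() - start) / n_runs

print("LSTMCell: %.3f s" % avg_forward_time(
    lambda: tf.nn.rnn_cell.LSTMCell(num_units=250)))
print("LayerNormBasicLSTMCell: %.3f s" % avg_forward_time(
    lambda: tf.contrib.rnn.LayerNormBasicLSTMCell(num_units=250)))

Layer normalization adds extra per-gate normalization ops at every time step, so some per-step overhead is expected; the sketch just measures how much of the slowdown comes from the cell itself.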
My new model (using LayerNormBasicLSTMCell):
...
# Same stack depth and size, but using LayerNormBasicLSTMCell.
dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.contrib.rnn.LayerNormBasicLSTMCell(
        num_units=250,
        forget_bias=1.0,
        activation=tf.tanh,
        layer_norm=True,
        norm_gain=1.0,
        norm_shift=0.0,
        dropout_keep_prob=0.5  # recurrent dropout, applied inside the cell
    )
    dropcells.append(cell_iiLyr)
...
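One detail from the docs that may matter here: dropout_keep_prob is recurrent dropout applied inside the cell, and it accepts a scalar Tensor as well as a float. Passed as a plain float it has no train/test switch, so the dropout stays active during evaluation. A minimal sketch of feeding it through a placeholder instead, so it can be set to 1.0 at test time (the name tf_keep_prob is illustrative, not from my code above):

# Feed the recurrent dropout keep probability at run time.
tf_keep_prob = tf.placeholder(tf.float32, shape=[], name="keep_prob")

dropcells = []
for iiLyr in range(3):
    cell_iiLyr = tf.contrib.rnn.LayerNormBasicLSTMCell(
        num_units=250,
        layer_norm=True,
        dropout_keep_prob=tf_keep_prob  # scalar Tensor instead of a constant
    )
    dropcells.append(cell_iiLyr)

# Training step: feed_dict={tf_keep_prob: 0.5, ...}
# Evaluation:    feed_dict={tf_keep_prob: 1.0, ...}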