How to interpret weights in the LSTM layer in Keras

I am currently training a recurrent neural network for weather forecasting using the LSTM layer. The network itself is quite simple and looks something like this:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))

The weights of the LSTM layer have the following shapes:

for weight in model.get_weights(): # weights from Dense layer omitted
    print(weight.shape)

> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)

In short, it looks like there are four “elements” in this LSTM layer. I'm wondering how to interpret them:

  • Where does the parameter time_steps appear in this view? How does it affect the weights?

  • I have read that an LSTM consists of several blocks, such as an input gate and a forget gate. If these are represented in the weight matrices above, which matrix belongs to which gate?

  • Is there any way to see what the network has learned? For example, how much does it take from the last time step (t-1) and how much from t-2, and so on? E.g., could we read from the weights that the input at t-5 is completely irrelevant?


If you are using Keras 2.2.0, then when you run

print(model.layers[0].trainable_weights)

you should see three tensors: lstm_1/kernel, lstm_1/recurrent_kernel, lstm_1/bias:0. One of the dimensions of each tensor is a product of

4 * number_of_units

where number_of_units is your number of neurons. You can recover it with:

units = int(model.layers[0].trainable_weights[0].shape[1]) // 4
print("No units: ", units)

That is because each tensor contains the weights for four LSTM gates, concatenated in this order:

i (input), f (forget), c (cell state) and o (output)

So to extract the per-gate weights you can simply slice:

W = model.layers[0].get_weights()[0]
U = model.layers[0].get_weights()[1]
b = model.layers[0].get_weights()[2]

W_i = W[:, :units]
W_f = W[:, units: units * 2]
W_c = W[:, units * 2: units * 3]
W_o = W[:, units * 3:]

U_i = U[:, :units]
U_f = U[:, units: units * 2]
U_c = U[:, units * 2: units * 3]
U_o = U[:, units * 3:]

b_i = b[:units]
b_f = b[units: units * 2]
b_c = b[units * 2: units * 3]
b_o = b[units * 3:]

Source: the Keras source code
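
As a sanity check (my own addition, not part of the original answer), you can rebuild one forward pass in numpy from these slices and compare it against Keras. The sketch below pins recurrent_activation to plain sigmoid, since older Keras versions default the gates to hard_sigmoid:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

time_steps, feature_count, hidden_neurons = 3, 2, 4

model = Sequential()
# recurrent_activation pinned to plain sigmoid so the numpy version below matches
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count),
               recurrent_activation="sigmoid"))

W, U, b = model.layers[0].get_weights()
units = W.shape[1] // 4
W_i, W_f, W_c, W_o = (W[:, k * units:(k + 1) * units] for k in range(4))
U_i, U_f, U_c, U_o = (U[:, k * units:(k + 1) * units] for k in range(4))
b_i, b_f, b_c, b_o = (b[k * units:(k + 1) * units] for k in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.rand(1, time_steps, feature_count).astype("float32")

h = np.zeros((1, units))
C = np.zeros((1, units))
for t in range(time_steps):
    x_t = x[:, t, :]
    i = sigmoid(x_t @ W_i + h @ U_i + b_i)        # input gate
    f = sigmoid(x_t @ W_f + h @ U_f + b_f)        # forget gate
    c_hat = np.tanh(x_t @ W_c + h @ U_c + b_c)    # candidate cell state
    C = f * C + i * c_hat                         # new cell state
    o = sigmoid(x_t @ W_o + h @ U_o + b_o)        # output gate
    h = o * np.tanh(C)                            # new hidden state

print(np.allclose(h, model.predict(x), atol=1e-4))  # should print True

If this prints True, the slicing above (and the i, f, c, o ordering) is correct for your Keras version.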


I may not be able to answer your questions exactly, but I can describe in some detail what happens inside an LSTM cell, which should make it clearer what each weight matrix does.

A simple example taken from a github issue shows the parameter names (this uses the older Keras 1.x API):

from keras.models import Sequential
from keras.layers import LSTM

N = 10  # sequence length; any value works for this demonstration
model = Sequential()
model.add(LSTM(4, input_dim=5, input_length=N, return_sequences=True))
for e in zip(model.layers[0].trainable_weights, model.layers[0].get_weights()):
    print('Param %s:\n%s' % (e[0], e[1]))

Output:

Param lstm_3_W_i:
[[ 0.00069305, ...]]
Param lstm_3_U_i:
[[ 1.10000002, ...]]
Param lstm_3_b_i:
[ 0., ...]
Param lstm_3_W_c:
[[-1.38370085, ...]]
...

Now you can see in more detail what the different parameters are. There are W, U, V and b parameters:

  • W matrices transform the input x. Shape: [input_dim, output_dim].
  • U matrices transform the previous hidden state h_{t-1}. Shape: [output_dim, output_dim].
  • b vectors are the biases of each gate. Shape: [output_dim].
  • V is only used by the output gate to select which values of the new internal state to output. Shape: [output_dim, output_dim].
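
A small sketch (my addition, assuming the name format lstm_3_W_i shown in the output above; the TensorFlow backend may append a :0 suffix) to collect these tensors into a dict keyed by W_i, U_i, and so on:

params = {}
for w, v in zip(model.layers[0].trainable_weights, model.layers[0].get_weights()):
    # 'lstm_3_W_i' -> 'W_i'; strip a possible ':0' suffix first
    key = "_".join(str(w.name).split(":")[0].split("_")[-2:])
    params[key] = v

print(sorted(params))        # e.g. ['U_c', 'U_f', 'U_i', 'U_o', 'W_c', ...]
print(params["W_i"].shape)   # (input_dim, output_dim)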

, 4 "" ( ).

  • Forget gate: based on the previous hidden state (h_{t-1}) and the current input (x), it decides which values of the previous cell state (C_{t-1}) to forget:

    f_t = sigmoid(W_f * x + U_f * h_{t-1} + b_f)

    f_t is a vector of values between 0 and 1 that encodes which parts of the cell state to keep (= 1) and which to forget (= 0).

  • Input gate: based on the previous hidden state (h_{t-1}) and the current input (x), it decides which values of the input to use:

    i_t = sigmoid(W_i * x + U_i * h_{t-1} + b_i)

    i_t is a vector of values between 0 and 1 that encodes which input values will be used to update the cell state.

  • Candidate value: based on the current input (x) and the previous hidden state (h_{t-1}), it computes a candidate for the new cell state:

    Ct_t = tanh(W_c * x + U_c * h_{t-1} + b_c)

    Ct_t is a vector of candidate values from which the new cell state (replacing C_{t-1}) is built.

From these three, the new cell state (C_t) can be computed:

C_t = f_t * C_{t-1} + i_t * Ct_t

In words: forget the selected parts of the previous state, and add in the candidate values selected by the input gate.

  • Output gate: based on the current input (x), the previous hidden state (h_{t-1}) and the new cell state (C_t), it computes the new hidden state (h_t), which is also what the cell outputs:

    o_t = sigmoid(W_o * x + U_o * h_{t-1} + V_o * C_t + b_o)
    h_t = o_t * tanh(C_t)
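
To make the data flow concrete, here is a direct numpy transliteration of the equations above (a sketch of the variant described in this answer; Keras's own LSTM has no V_o term, so if you fill p from the params dict built earlier, that term would be dropped):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, p):
    """One time step of the LSTM variant described above.
    p maps names like 'W_f', 'U_f', 'b_f' (and 'V_o') to numpy arrays."""
    f = sigmoid(x @ p["W_f"] + h_prev @ p["U_f"] + p["b_f"])      # forget gate
    i = sigmoid(x @ p["W_i"] + h_prev @ p["U_i"] + p["b_i"])      # input gate
    c_hat = np.tanh(x @ p["W_c"] + h_prev @ p["U_c"] + p["b_c"])  # candidate values
    C = f * C_prev + i * c_hat                                    # new cell state
    o = sigmoid(x @ p["W_o"] + h_prev @ p["U_o"]
                + C @ p["V_o"] + p["b_o"])                        # output gate
    h = o * np.tanh(C)                                            # new hidden state
    return h, C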

That, roughly, is how an LSTM cell works. Keep in mind that there are many LSTM variants, and a given implementation may differ in details from the one described here, so it is worth checking the code of the implementation you actually use.

As for which weights belong to which gate: the subscripts give the mapping. The W matrices are the ones applied to the input; W_c shapes the candidate values for the cell state, W_o feeds the output gate, and so on, as described above.

Note, however, that a matrix such as W_c is hard to interpret in isolation: its effect is modulated by the input gate (i_t), so what the network actually takes from the input at a given step depends on several sets of weights at once.

As for the time steps: the same weights are reused at every time step, and the notion of time lives only in the recurrence over the hidden and cell states. So the influence of, say, t-5 versus t-1 cannot be read directly from the weights.
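
One empirical workaround (my own suggestion, not part of this answer): perturb the input at one time step at a time and measure how much the prediction moves. This assumes the model, time_steps and feature_count from the question:

import numpy as np

x = np.random.rand(1, time_steps, feature_count).astype("float32")  # any sample sequence
baseline = model.predict(x)

for t in range(time_steps):
    x_pert = x.copy()
    x_pert[0, t, :] += 0.1  # small perturbation at time step t
    delta = np.abs(model.predict(x_pert) - baseline).mean()
    print("time step %d: mean output change %.5f" % (t, delta))

A time step whose perturbation barely changes the output is one the network has effectively learned to ignore.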

Hope this helps :-)


Source: https://habr.com/ru/post/1015835/
