Use hidden states instead of outputs in Keras LSTM

I want to use the attention mechanism of Yang et al. I found a working implementation of a custom layer that uses this attention mechanism here. Instead of using the output values of my LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

I would like to use the hidden states of the LSTM:

my_lstm = LSTM(128, input_shape=(a, b), return_state=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

But I get the error:

TypeError: can only concatenate tuple (not "int") to tuple

I tried this in combination with return_sequences, but everything I have tried so far has failed. How can I modify the returned tensors so that I can use them the way I would use the returned output sequences?

Thanks!

2 answers

I think your confusion may be related to the Keras documentation, which is a bit unclear.

return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
return_state: Boolean. Whether to return the last state in addition to the output.

The documentation on return_state is particularly confusing, because it implies that hidden states are different from outputs, when they are in fact one and the same. For an LSTM this gets a little muddier, because in addition to the hidden (output) state there is also the cell state. We can confirm this by looking at the LSTM step function in the Keras source code:

class LSTM(Recurrent):
    def step(...):
        ...
        return h, [h, c]

The return signature of this step function is output, states. So we can see that the hidden state h is actually the output, and for the states we get both the hidden state h and the cell state c. That is why the Wikipedia article you link to uses the terms "hidden" and "output" interchangeably.
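To see the same thing from the user side, here is a minimal sketch (the unit count and the input dimensions a and b are illustrative placeholders) that unpacks what a Keras LSTM layer returns when return_state=True and checks that the output and the hidden state h carry the same values:

# A minimal sketch (layer size and input dimensions are illustrative) of what
# an LSTM layer hands back when return_state=True: the output plus both states.
from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

a, b = 10, 8                              # timesteps, features (placeholders)
inputs = Input(shape=(a, b))

# With return_sequences=False and return_state=True the call returns
# [last output, last hidden state h, last cell state c].
last_out, state_h, state_c = LSTM(128, return_state=True)(inputs)
model = Model(inputs, [last_out, state_h, state_c])

out, h, c = model.predict(np.random.rand(1, a, b))
print(np.allclose(out, h))                # True: the output *is* the hidden state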

Looking a little closer at the paper you linked, it seems to me that your initial implementation is what you want:

my_lstm = LSTM(128, input_shape=(a, b), return_sequences=True)
my_lstm = AttentionWithContext()(my_lstm)
out = Dense(2, activation='softmax')(my_lstm)

This will pass the hidden state at every time step to your attention layer. The only scenario where you would be out of luck is the one where you actually want to pass the cell state from each time step to your attention layer (which is what I thought you wanted initially), but I don't think that is what you want. The paper you linked actually uses a GRU layer, which has no notion of a cell state, and whose step function also returns the hidden state as the output:

class GRU(Recurrent):
    def step(...):
        ...
        return h, [h]

So the paper is almost certainly referring to hidden states (a.k.a. outputs), not cell states.
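If you want to convince yourself of this numerically, a quick sketch along these lines (layer sizes and shapes are illustrative, using a GRU to match the paper) shows that the last element of the returned sequence is exactly the state the layer hands back:

# A quick check (illustrative shapes) that the sequence returned with
# return_sequences=True is just the hidden state at every time step:
# its last element equals the final state returned with return_state=True.
from keras.layers import Input, GRU
from keras.models import Model
import numpy as np

a, b = 10, 8                                    # timesteps, features
inputs = Input(shape=(a, b))
seq, last_h = GRU(64, return_sequences=True, return_state=True)(inputs)
model = Model(inputs, [seq, last_h])

seq_vals, h_vals = model.predict(np.random.rand(1, a, b))
print(np.allclose(seq_vals[:, -1, :], h_vals))  # True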


Just to add one point to Nicole's answer:

If we use the combination return_state=True and return_sequences=True in an LSTM, then the first returned tensor will be the hidden state for every time step (the full sequence), while the second will be the hidden state at the last time step only (a single vector), followed by the last cell state.
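For concreteness, here is a small sketch (sizes are illustrative) of the three tensors you get back with both flags set:

# A sketch (illustrative sizes) of the three tensors an LSTM returns when both
# flags are set: the full sequence of hidden states, then the last hidden
# state, then the last cell state.
from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

a, b = 10, 8                               # timesteps, features
inputs = Input(shape=(a, b))
seq, last_h, last_c = LSTM(128, return_sequences=True, return_state=True)(inputs)
model = Model(inputs, [seq, last_h, last_c])

seq_vals, h_vals, c_vals = model.predict(np.random.rand(1, a, b))
print(seq_vals.shape)   # (1, 10, 128): hidden state at every time step
print(h_vals.shape)     # (1, 128): hidden state of the last step only
print(c_vals.shape)     # (1, 128): cell state of the last step only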


Source: https://habr.com/ru/post/1270799/
