What is the difference between bidirectional LSTM and LSTM?

Can someone explain this? I know that a bidirectional LSTM has a forward and a backward pass, but what is the advantage of this over a unidirectional LSTM?

What is each of them better suited for?

+17
4 answers

At its core, an LSTM preserves information from inputs that have already passed through it using its hidden state.

A unidirectional LSTM only preserves information from the past, because the only inputs it has seen are from the past.

Using a bidirectional LSTM will run your inputs in two ways: one from the past to the future and one from the future to the past. What differs this approach from a unidirectional one is that in the LSTM that runs backwards you preserve information from the future, and by using the two hidden states combined you are able, at any point in time, to preserve information from both the past and the future.
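To make that concrete, here is a minimal NumPy sketch of the idea (an illustration only, not how any library implements it; run_rnn, fwd_step and bwd_step are hypothetical placeholders for a recurrent cell update):

import numpy as np

def run_rnn(inputs, step_fn, hidden_size):
    # Apply the cell update at every time step and collect the hidden states.
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:
        h = step_fn(x, h)          # any recurrent update, e.g. an LSTM cell
        states.append(h)
    return np.stack(states)        # shape: (timesteps, hidden_size)

def bidirectional(inputs, fwd_step, bwd_step, hidden_size):
    # Forward pass: the state at step t summarizes inputs[0..t] (the past).
    h_fwd = run_rnn(inputs, fwd_step, hidden_size)
    # Backward pass over the reversed sequence, flipped back afterwards,
    # so that h_bwd[t] summarizes inputs[t..T-1] (the future).
    h_bwd = run_rnn(inputs[::-1], bwd_step, hidden_size)[::-1]
    # Every time step now carries context from both directions.
    return np.concatenate([h_fwd, h_bwd], axis=-1)   # (timesteps, 2 * hidden_size)

Concatenation is the default in Keras' Bidirectional wrapper; summing or averaging the two directions is also common.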

What each of them is better suited for is a complicated question, but BiLSTMs show very good results because they can understand the context better. I will try to explain with an example.

Suppose we are trying to predict the next word in a sentence. At a high level, what a unidirectional LSTM will see is

The boys went ....

and it will try to predict the next word from this context alone. With a bidirectional LSTM you are able to see information further down the road, for example:

Forward LSTM:

The boys went ...

Backward LSTM:

... and then they left the pool

You can see that by using the information from the future it could be easier for the network to understand what the next word is.

+29

Adding to Bluesummer's answer, here is how you would implement a bidirectional LSTM from scratch, without calling the BiLSTM module. This may better contrast the difference between unidirectional and bidirectional LSTMs: as you can see, we merge two LSTMs to create a bidirectional one.

You can merge the outputs of the forward and backward LSTMs using any of {'sum', 'mul', 'concat', 'ave'}.

# Forward LSTM (Keras 1.x style API).
left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
              forget_bias_init='one', return_sequences=True, activation='tanh',
              inner_activation='sigmoid', input_shape=(99, 13)))

# Backward LSTM: go_backwards=True reads the same input in reverse.
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

# Merge the forward and backward outputs element-wise ('sum').
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
# The same input is fed to both branches of the merged model.
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches,
          validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)
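For comparison, roughly the same construction can be written with the Bidirectional wrapper in current Keras. This is only a sketch that reuses hidden_units and nb_classes from the snippet above; merge_mode plays the role of the old Merge mode:

from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, TimeDistributed, Dense, Activation

model = Sequential()
# merge_mode takes over from the old Merge layer: 'sum', 'mul', 'concat' or 'ave'
model.add(Bidirectional(LSTM(hidden_units, return_sequences=True),
                        merge_mode='sum', input_shape=(99, 13)))
# TimeDistributed(Dense(...)) replaces the old TimeDistributedDense
model.add(TimeDistributed(Dense(nb_classes)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')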
+1

Another use case for bidirectional LSTMs may be classifying words in a text. They can see the past and future context of each word and are much better suited to classify it.
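As a rough sketch of what such a word-level classifier (sequence tagger) could look like in Keras; the vocabulary size, number of tags and sequence length below are made-up illustrative values:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional, TimeDistributed, Dense

vocab_size, n_tags, maxlen = 10000, 17, 50   # illustrative values only

model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=maxlen))
# return_sequences=True keeps one output per word instead of one per sentence
model.add(Bidirectional(LSTM(64, return_sequences=True)))
# the same Dense classifier is applied to every time step, i.e. to every word
model.add(TimeDistributed(Dense(n_tags, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')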

0

Compared to an LSTM, a BLSTM or BiLSTM has two networks: one with access to past information in the forward direction and another with access to the future in the reverse direction (wiki).

A new class Bidirectional is added as per the official docs here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 10)))

and the activation function can be added as follows:

model = Sequential()
model.add(Bidirectional(LSTM(num_channels, implementation=2, recurrent_activation='sigmoid'),
                        input_shape=(input_length, input_dim)))

A complete example using IMDB data looks like this. These are the results after 4 epochs:

Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
17465344/17464789 [==============================] - 4s 0us/step
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 78s 3ms/step - loss: 0.4219 - acc: 0.8033 - val_loss: 0.2992 - val_acc: 0.8732
Epoch 2/4
25000/25000 [==============================] - 82s 3ms/step - loss: 0.2315 - acc: 0.9106 - val_loss: 0.3183 - val_acc: 0.8664
Epoch 3/4
25000/25000 [==============================] - 91s 4ms/step - loss: 0.1802 - acc: 0.9338 - val_loss: 0.3645 - val_acc: 0.8568
Epoch 4/4
25000/25000 [==============================] - 92s 4ms/step - loss: 0.1398 - acc: 0.9509 - val_loss: 0.3562 - val_acc: 0.8606

BiLSTM or BLSTM

import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb

n_unique_words = 10000  # cut texts after this number of words
maxlen = 200
batch_size = 128

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=n_unique_words)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
y_train = np.array(y_train)
y_test = np.array(y_test)

model = Sequential()
model.add(Embedding(n_unique_words, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=[x_test, y_test])
0

Source: https://habr.com/ru/post/1265933/

