I am using Keras 1.0.1 and I am trying to add an attention layer on top of an LSTM. This is what I have so far, but it does not work.
    input_ = Input(shape=(input_length, input_dim))
    lstm = GRU(self.HID_DIM, input_dim=input_dim, input_length=input_length, return_sequences=True)(input_)
    att = TimeDistributed(Dense(1)(lstm))
    att = Reshape((-1, input_length))(att)
    att = Activation(activation="softmax")(att)
    att = RepeatVector(self.HID_DIM)(att)
    merge = Merge([att, lstm], "mul")
    hid = Merge("sum")(merge)
    last = Dense(self.HID_DIM, activation="relu")(hid)
The network should apply an LSTM over the input sequence. Each hidden state of the LSTM should then be fed through a fully connected layer, and a softmax should be applied over the resulting per-timestep scores. The softmax weights are replicated across each hidden dimension and multiplied element-wise with the LSTM hidden states. The resulting vectors should then be averaged over time.
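To make the intended computation concrete, here is a rough NumPy sketch of what I mean (the dimensions and weights are placeholders, not my actual values):

    import numpy as np

    # Toy dimensions (placeholders)
    input_length, hid_dim = 5, 4

    # Pretend these are the recurrent hidden states, one row per timestep
    h = np.random.randn(input_length, hid_dim)      # (timesteps, hid_dim)

    # Dense(1) applied per timestep -> one scalar score per timestep
    w, b = np.random.randn(hid_dim), 0.0
    scores = h.dot(w) + b                           # (timesteps,)

    # Softmax over timesteps -> attention weights
    alpha = np.exp(scores) / np.exp(scores).sum()   # (timesteps,)

    # Weight each hidden state by its attention weight, then average over time
    weighted = h * alpha[:, None]                   # (timesteps, hid_dim)
    context = weighted.mean(axis=0)                 # (hid_dim,)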
EDIT: The following compiles, but I'm not sure it is doing what I think it should do.
    input_ = Input(shape=(input_length, input_dim))
    lstm = GRU(self.HID_DIM, input_dim=input_dim, input_length=input_length, return_sequences=True)(input_)  # (batch, input_length, HID_DIM)
    att = TimeDistributed(Dense(1))(lstm)                   # (batch, input_length, 1)
    att = Flatten()(att)                                    # (batch, input_length)
    att = Activation(activation="softmax")(att)             # attention weights over timesteps
    att = RepeatVector(self.HID_DIM)(att)                   # (batch, HID_DIM, input_length)
    att = Permute((2, 1))(att)                              # (batch, input_length, HID_DIM)
    mer = merge([att, lstm], "mul")                         # weighted hidden states
    hid = AveragePooling1D(pool_length=input_length)(mer)   # (batch, 1, HID_DIM)
    hid = Flatten()(hid)                                    # (batch, HID_DIM)
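To check the output shape, I use a minimal self-contained sketch along these lines (assuming Keras 1.x; the dimensions are placeholders and self. is dropped for brevity):

    import numpy as np
    from keras.layers import (Input, Dense, GRU, TimeDistributed, Flatten,
                              Activation, RepeatVector, Permute, merge,
                              AveragePooling1D)
    from keras.models import Model

    # Placeholder dimensions, only for the shape check
    input_length, input_dim, hid_dim = 5, 3, 4

    # Same stack as above
    input_ = Input(shape=(input_length, input_dim))
    lstm = GRU(hid_dim, return_sequences=True)(input_)
    att = TimeDistributed(Dense(1))(lstm)
    att = Flatten()(att)
    att = Activation("softmax")(att)
    att = RepeatVector(hid_dim)(att)
    att = Permute((2, 1))(att)
    mer = merge([att, lstm], mode="mul")
    hid = AveragePooling1D(pool_length=input_length)(mer)
    hid = Flatten()(hid)

    model = Model(input=input_, output=hid)
    model.compile(optimizer="adam", loss="mse")
    model.summary()

    # One attention-averaged vector per sample: expected shape (2, hid_dim)
    x = np.random.randn(2, input_length, input_dim)
    print(model.predict(x).shape)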