I am building a model that converts a string to another string using recurrent layers (GRU). I have tried both Dense and TimeDistributed(Dense) as the last-but-one layer, but I don't understand the difference between the two when using return_sequences=True, especially since they seem to have the same number of parameters.
My simplified model is as follows:
InputSize = 15
MaxLen = 64
HiddenSize = 16
inputs = keras.layers.Input(shape=(MaxLen, InputSize))
x = keras.layers.recurrent.GRU(HiddenSize, return_sequences=True)(inputs)
x = keras.layers.TimeDistributed(keras.layers.Dense(InputSize))(x)
predictions = keras.layers.Activation('softmax')(x)
Network Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 64, 15) 0
_________________________________________________________________
gru_1 (GRU) (None, 64, 16) 1536
_________________________________________________________________
time_distributed_1 (TimeDist (None, 64, 15) 255
_________________________________________________________________
activation_1 (Activation) (None, 64, 15) 0
=================================================================
This makes sense to me, since my understanding of TimeDistributed is that it applies the same layer at every time step, so the Dense layer has 16 * 15 + 15 = 255 parameters (weights + biases).
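To convince myself that this count is independent of the sequence length, I wrote a small numpy sketch of my mental model of TimeDistributed (this is not how Keras implements it, just the idea of re-using one (16, 15) kernel and one (15,) bias at all 64 time steps):

import numpy as np

HiddenSize, InputSize, MaxLen = 16, 15, 64

kernel = np.random.randn(HiddenSize, InputSize)   # 16 * 15 = 240 weights
bias = np.random.randn(InputSize)                 # plus 15 biases -> 255 parameters

gru_output = np.random.randn(MaxLen, HiddenSize)  # one sample of the GRU output, shape (64, 16)

# Apply the very same kernel and bias at every time step
per_step = np.stack([gru_output[t] @ kernel + bias for t in range(MaxLen)])

print(per_step.shape)           # (64, 15)
print(kernel.size + bias.size)  # 255, independent of MaxLen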
However, if I switch to a simple Dense layer:
inputs = keras.layers.Input(shape=(MaxLen, InputSize))
x = keras.layers.recurrent.GRU(HiddenSize, return_sequences=True)(inputs)
x = keras.layers.Dense(InputSize)(x)
predictions = keras.layers.Activation('softmax')(x)
I still end up with only 255 parameters:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 64, 15) 0
_________________________________________________________________
gru_1 (GRU) (None, 64, 16) 1536
_________________________________________________________________
dense_1 (Dense) (None, 64, 15) 255
_________________________________________________________________
activation_1 (Activation) (None, 64, 15) 0
=================================================================
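One quick check I tried (assuming both variants can be given identical weights via get_weights/set_weights, since their weight shapes look the same) was to compare them directly on the same input; I would expect identical outputs if they really are equivalent:

import numpy as np
import keras

inputs = keras.layers.Input(shape=(64, 16))  # same shape as the GRU output
dense_model = keras.models.Model(inputs, keras.layers.Dense(15)(inputs))
td_model = keras.models.Model(inputs, keras.layers.TimeDistributed(keras.layers.Dense(15))(inputs))

# Both models hold just a (16, 15) kernel and a (15,) bias
td_model.set_weights(dense_model.get_weights())

x = np.random.randn(2, 64, 16)
print(np.allclose(dense_model.predict(x), td_model.predict(x)))  # I'd expect True if they are equivalent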
It seems, then, that Dense() only looks at the last dimension of its input. If so, what is the difference between Dense and TimeDistributed(Dense)?
Looking at https://github.com/fchollet/keras/blob/master/keras/layers/core.py, it does seem that Dense only uses the last dimension of the input shape to size its kernel:
def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    self.kernel = self.add_weight(shape=(input_dim, self.units),
and it uses keras.dot to apply the kernel:
def call(self, inputs):
    output = K.dot(inputs, self.kernel)
The documentation of keras.dot suggests that it works fine on n-dimensional tensors. So I suppose this means that Dense() is, in effect, applied at every time step. If so, the question remains: what does TimeDistributed() add in this case?
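As a sanity check on that reading of K.dot (assuming the TensorFlow backend here), I contracted a 3D tensor of the GRU-output shape against a single 2D kernel; the result matches a plain per-time-step matrix product:

import numpy as np
from keras import backend as K

x = np.random.randn(2, 64, 16).astype('float32')  # (batch, time, features), like the GRU output
w = np.random.randn(16, 15).astype('float32')     # a single Dense kernel

y = K.eval(K.dot(K.constant(x), K.constant(w)))
print(y.shape)  # (2, 64, 15)

# Same result as applying the kernel along the last axis at every time step
print(np.allclose(y, np.tensordot(x, w, axes=(2, 0)), atol=1e-5))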