TimeDistributed (Dense) vs Dense in Keras - Same number of parameters

I am creating a model that converts a string to another string using repeating layers (GRU). I tried both Dense and TimeDistributed (Dense) as the last but one layer, but I don’t understand the difference between them when using return_sequences = True, especially since they seem to have the same number of parameters.

My simplified model is as follows:

InputSize = 15
MaxLen = 64
HiddenSize = 16

inputs = keras.layers.Input(shape=(MaxLen, InputSize))
x = keras.layers.recurrent.GRU(HiddenSize, return_sequences=True)(inputs)
x = keras.layers.TimeDistributed(keras.layers.Dense(InputSize))(x)
predictions = keras.layers.Activation('softmax')(x)

Network Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 64, 15)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 64, 16)            1536      
_________________________________________________________________
time_distributed_1 (TimeDist (None, 64, 15)            255       
_________________________________________________________________
activation_1 (Activation)    (None, 64, 15)            0         
=================================================================

This makes sense to me, since my understanding of TimeDistributed is that it applies the same level at all time points, so the Dense layer has 16 * 15 + 15 = 255 parameters (weight + offset).

However, if I switch to a simple Dense layer:

inputs = keras.layers.Input(shape=(MaxLen, InputSize))
x = keras.layers.recurrent.GRU(HiddenSize, return_sequences=True)(inputs)
x = keras.layers.Dense(InputSize)(x)
predictions = keras.layers.Activation('softmax')(x)

I have only 255 parameters:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 64, 15)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 64, 16)            1536      
_________________________________________________________________
dense_1 (Dense)              (None, 64, 15)            255       
_________________________________________________________________
activation_1 (Activation)    (None, 64, 15)            0         
=================================================================

, , Dense() . , Dense TimeDistributed (Dense).

. https://github.com/fchollet/keras/blob/master/keras/layers/core.py, , Dense :

def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    self.kernel = self.add_weight(shape=(input_dim, self.units),

keras.dot :

def call(self, inputs):
    output = K.dot(inputs, self.kernel)

keras.dot , n- . , , Dense() . , , TimeDistributed() .

+4
1

TimeDistributedDense GRU/LSTM. , . ( , ).

, return_sequences = False, Dense . , RNN . return_sequences = True, Dense- , TimeDistributedDense.

, , "return_sequences = False", Dense . , , Y [Batch_size, InputSize], , .

from keras.models import Sequential
from keras.layers import Dense, Activation, TimeDistributed
from keras.layers.recurrent import GRU
import numpy as np

InputSize = 15
MaxLen = 64
HiddenSize = 16

OutputSize = 8
n_samples = 1000

model1 = Sequential()
model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
model1.add(TimeDistributed(Dense(OutputSize)))
model1.add(Activation('softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')


model2 = Sequential()
model2.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
model2.add(Dense(OutputSize))
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')

model3 = Sequential()
model3.add(GRU(HiddenSize, return_sequences=False, input_shape=(MaxLen, InputSize)))
model3.add(Dense(OutputSize))
model3.add(Activation('softmax'))
model3.compile(loss='categorical_crossentropy', optimizer='rmsprop')

X = np.random.random([n_samples,MaxLen,InputSize])
Y1 = np.random.random([n_samples,MaxLen,OutputSize])
Y2 = np.random.random([n_samples, OutputSize])

model1.fit(X, Y1, batch_size=128, nb_epoch=1)
model2.fit(X, Y1, batch_size=128, nb_epoch=1)
model3.fit(X, Y2, batch_size=128, nb_epoch=1)

print(model1.summary())
print(model2.summary())
print(model3.summary())

model1 model2 ( ), model3 - .

+3

Source: https://habr.com/ru/post/1679537/


All Articles