I am working on a regression problem with Keras + TensorFlow, and I found something interesting. 1) Here are two models that are actually the same, except that the first one uses a globally defined optimizer:
optimizer = Adam()

def OneHiddenLayer_Model():
    model = Sequential()
    model.add(Dense(300 * inputDim, input_dim=inputDim, kernel_initializer='normal', activation=activationFunc))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model
def OneHiddenLayer_Model2():
    model = Sequential()
    model.add(Dense(300 * inputDim, input_dim=inputDim, kernel_initializer='normal', activation=activationFunc))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer=Adam())
    return model
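The only difference between the two functions is that OneHiddenLayer_Model always compiles with the same global Adam instance, so whatever state that instance has accumulated (step counter, moment estimates) is still there when the next model compiles with it. This is not Keras code, just a minimal NumPy sketch of Adam's update rule for one parameter, showing that the same gradient produces a different step size from a fresh state than from a warm state (the warm-state numbers here are made up for illustration):

```python
import numpy as np

def adam_step(grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for a single scalar parameter; returns the step
    # taken and the updated optimizer state (m, v, t).
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    return step, m, v, t

grad = 0.5

# Fresh optimizer state -- what a newly created Adam() starts from:
step_fresh, *_ = adam_step(grad, m=0.0, v=0.0, t=0)

# Warm state, as if the optimizer had already taken many updates --
# roughly what a reused global optimizer could carry into a later fit:
step_warm, *_ = adam_step(grad, m=0.4, v=0.3, t=10000)

print(step_fresh, step_warm)  # the two step sizes differ
```

So even with identical initial weights and identical data, a reused optimizer does not behave like a fresh one.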
2) Then I train with two schemes on the same data sets (training set (scaleX, Y), test set (scaleTestX, testY)).
2.1) Scheme 1: two consecutive fits with the first model
numpy.random.seed(seed)
model = OneHiddenLayer_Model()
model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=250, batch_size=numBatch, verbose=0)
numpy.random.seed(seed)
model = OneHiddenLayer_Model()
history = model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=500, batch_size=numBatch, verbose=0)
predictY = model.predict(scaleX)
predictTestY = model.predict(scaleTestX)
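Re-seeding NumPy before each construction does replay the same random sequence, which is why the two models are expected to start from identical weights (assuming the initializers draw from NumPy's RNG); what numpy.random.seed does not reset is the state inside the shared optimizer object. A quick check of the replay behavior:

```python
import numpy as np

seed = 7
np.random.seed(seed)
a = np.random.normal(size=3)

np.random.seed(seed)   # re-seeding replays the exact same sequence
b = np.random.normal(size=3)

assert np.allclose(a, b)  # identical draws after re-seeding
```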
2.2) Scheme 2: a single fit with the second model
numpy.random.seed(seed)
model = OneHiddenLayer_Model2()
history = model.fit(scaleX, Y, validation_data=(scaleTestX, testY), epochs=500, batch_size=numBatch, verbose=0)
predictY = model.predict(scaleX)
predictTestY = model.predict(scaleTestX)
3) Finally, the results for each scheme are shown below (model loss history → prediction on scaleX → prediction on scaleTestX).
3.1) Scheme 1
(figures omitted: loss history and predictions for Scheme 1)
3.2) Scheme 2 (with 500 epochs)
(figures omitted: loss history and predictions for Scheme 2, 500 epochs)
3.3) another test with Scheme 2, with epochs = 1000
(figures omitted: loss history and predictions for Scheme 2, 1000 epochs)
As the figures show, Scheme 1 gives better results than Scheme 2, even when Scheme 2 is trained for more epochs.
My question: why does Scheme 1 perform better than Scheme 2? Thanks!