I'm trying to use deep learning to predict income from 15 self-reported attributes from a dating site.
We get rather strange results: our validation data gets better accuracy and lower loss than our training data, and this holds across different sizes of the hidden layer. This is our model:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

# Sweep over hidden-layer sizes (seed, X, Y and the LossHistory callback
# are defined earlier in our script).
for hl1 in [250, 200, 150, 100, 75, 50, 25, 15, 10, 7]:
    def baseline_model():
        model = Sequential()
        model.add(Dense(hl1, input_dim=299, kernel_initializer='normal',
                        activation='relu',
                        kernel_regularizer=regularizers.l1_l2(0.001)))
        model.add(Dropout(0.5, seed=seed))
        model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
        model.compile(loss='categorical_crossentropy', optimizer='adamax',
                      metrics=['accuracy'])
        return model

    history_logs = LossHistory()
    model = baseline_model()
    history = model.fit(X, Y, validation_split=0.3, shuffle=False, epochs=50,
                        batch_size=10, verbose=2, callbacks=[history_logs])
And this is an example of the resulting accuracy and loss curves: [accuracy plot] and [loss plot].
We tried removing the regularization and dropout, which, as expected, resulted in overfitting (training accuracy ~85%). We even tried drastically reducing the learning rate, with similar results.
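One detail that may be relevant: Keras computes the per-epoch training metrics as a running average over batches with dropout active, while validation metrics are computed at the end of the epoch with dropout disabled. Below is a minimal NumPy sketch (a toy linear model with made-up data, not our network) of how dropout noise alone can push the training-mode loss above the evaluation-mode loss on the very same data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: fixed weights and a convex (squared) loss. The model fits the
# targets exactly when no dropout is applied.
X = rng.normal(size=(200, 20))
w = rng.normal(size=20)
y = X @ w

def loss(h):
    return np.mean((h @ w - y) ** 2)

# Evaluation mode: no dropout, so the loss here is exactly 0.
eval_loss = loss(X)

# Training mode: inverted dropout on the inputs (keep probability 0.5),
# averaged over many random masks, mimicking the per-batch training metric.
p = 0.5
train_losses = []
for _ in range(100):
    mask = rng.random(X.shape) < p
    train_losses.append(loss(X * mask / p))
train_loss = float(np.mean(train_losses))

print(eval_loss, train_loss)  # training-mode loss is strictly larger
```

So even with identical data and weights, the metric reported for "training" can look worse than the one reported for "validation" whenever dropout (or an L1/L2 penalty counted only at train time) is in play.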
Has anyone seen similar results?