I built a Keras Sequential model with 35,000 input samples and 20 predictors. The distribution of the test output classes is:
- Class_0 = 5.6%
- Class_1 = 7.7%
- Class_2 = 35.6%
- Class_3 = 45.7%
- Class_4 = 5.4%
After converting the outputs to a binary class matrix with np_utils.to_categorical, the training accuracy is about 54%. When I run the model on the test data (15,000 samples), every prediction (100%) falls in the same class, class_3, which is the most frequent class. What is the reason for this bias, with not a single prediction for the other classes? How can I make the model sensitive to the less frequent classes and improve accuracy, especially when a class makes up only a small share of the training data, such as 1-3%?
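from keras.models import Sequential
from keras.layers import Dense, Dropout

# 20 predictors in, 5 softmax class probabilities out
model = Sequential()
model.add(Dense(40, input_dim=20, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='relu'))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# X: predictor matrix with 20 columns; Y: one-hot labels from np_utils.to_categorical
model.fit(X, Y, epochs=500, verbose=1)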
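For context, one commonly suggested way to counter this kind of imbalance is to weight each class inversely to its frequency. The following is only a minimal sketch of that idea, not something from the setup above: the helper name `inverse_frequency_weights` is hypothetical, and the synthetic labels merely imitate the percentages listed earlier rather than the real data.

```python
import numpy as np

def inverse_frequency_weights(y_onehot):
    """Hypothetical helper: map class index -> n_samples / (n_classes * class_count)."""
    n_samples, n_classes = y_onehot.shape
    counts = y_onehot.sum(axis=0)
    # Avoid division by zero if a class never appears in the labels
    counts = np.maximum(counts, 1)
    weights = n_samples / (n_classes * counts)
    return {i: float(w) for i, w in enumerate(weights)}

# Synthetic labels (assumption): 1000 samples roughly following the
# 5.6 / 7.7 / 35.6 / 45.7 / 5.4 percent split quoted in the question
labels = np.repeat(np.arange(5), [56, 77, 356, 457, 54])
Y_demo = np.eye(5)[labels]  # one-hot matrix, same layout as to_categorical output

class_weight = inverse_frequency_weights(Y_demo)
for cls, w in class_weight.items():
    print(f"class_{cls}: weight {w:.3f}")
```

A dictionary of this shape can be passed as the `class_weight` argument of `model.fit`, so that mistakes on the rare classes cost more than mistakes on class_3. Whether that alone changes the predictions here is untested.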