TensorFlow classification with an extremely unbalanced dataset

I am using TensorFlow's LinearClassifier, and also a DNN, to classify a two-class dataset.

However, the problem is that the dataset contains 96% positive examples and only 4% negative ones, and my program always predicts positive. Of course, this way I achieve 96% accuracy, but the result is meaningless.

What is a good way to handle this situation?

+5
4 answers

You can try changing the cost function so that a false positive (predicting positive on one of the rare negative examples) is penalized more heavily than a false negative.
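A minimal sketch of this idea, using plain NumPy rather than any specific TensorFlow API: the function name `weighted_log_loss` and the weight ratio (roughly 96/4 ≈ 24, so both classes contribute equally) are illustrative choices, not from the original answer.

```python
import numpy as np

def weighted_log_loss(y_true, y_pred, w_neg=24.0, w_pos=1.0, eps=1e-7):
    """Log loss where errors on the rare negative class cost more.

    With 96% positives and 4% negatives, a weight ratio around
    96 / 4 = 24 makes the two classes contribute equally on average.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # weight each example by the cost attached to its true class
    weights = np.where(y_true == 1, w_pos, w_neg)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(weights * losses))

y_true = np.array([1, 1, 1, 0])
y_pred = np.array([0.99, 0.99, 0.99, 0.9])  # "always positive" model
# the single misclassified negative now dominates the loss
print(weighted_log_loss(y_true, y_pred))
```

The same weighting can be applied in TensorFlow by passing per-example weights into the loss; the point is only that the gradient signal from the 4% class stops being drowned out.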

+4

Here are some of the simplest things you can try:

  • You can create minibatches that sample the two classes equally, and then recalibrate the model's output probabilities at test time.
  • You can reweight the examples to upweight the negatives.
  • You can use hinge loss instead of log loss, which may be more robust to unbalanced data, since it receives no gradient once an example is correctly classified outside the margin.
  • You can explore other loss functions that penalize the different types of errors asymmetrically.
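The first bullet, balanced minibatch sampling, can be sketched as follows; `balanced_minibatch` and the batch size are illustrative names, not from the answer, and the sketch uses NumPy instead of a TensorFlow input pipeline.

```python
import numpy as np

def balanced_minibatch(X, y, batch_size=8, rng=None):
    """Sample a minibatch with equal numbers of positives and negatives.

    Draws batch_size // 2 indices from each class with replacement, so
    the rare negative class appears as often as the common positive one.
    """
    rng = np.random.default_rng(rng)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    half = batch_size // 2
    idx = np.concatenate([
        rng.choice(pos_idx, half, replace=True),
        rng.choice(neg_idx, half, replace=True),
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]

# 96 positives, 4 negatives, one feature per example
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.array([1] * 96 + [0] * 4)
Xb, yb = balanced_minibatch(X, y, batch_size=8, rng=0)
print(yb.mean())  # 0.5: half of every batch is negative
```

Because training then sees a 50/50 class prior rather than the true 96/4 one, the predicted probabilities need to be recalibrated back toward the real prior at test time, as the answer notes.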
+3

You can train an autoencoder on the negative examples you have (if there are enough of them), and then generate new examples with a sampling method such as variational Bayes or Markov chain Monte Carlo. This way you can increase the number of negative samples and move toward a more balanced dataset.
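A very crude NumPy sketch of the idea: a one-layer linear autoencoder trained on stand-in negative data, with new samples produced by perturbing latent codes and decoding. This is a simplification of my own; the answer proposes proper sampling (variational Bayes or MCMC), and all variable names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for the scarce negative examples: 40 samples, 5 features
X_neg = rng.normal(size=(40, 5))

# a one-layer linear autoencoder: encode to 2 dims, decode back to 5
W_enc = rng.normal(scale=0.1, size=(5, 2))
W_dec = rng.normal(scale=0.1, size=(2, 5))

lr = 0.01
for _ in range(500):
    Z = X_neg @ W_enc          # latent codes
    X_hat = Z @ W_dec          # reconstruction
    err = X_hat - X_neg        # reconstruction error
    # gradient steps on the mean squared reconstruction error
    W_dec -= lr * Z.T @ err / len(X_neg)
    W_enc -= lr * X_neg.T @ (err @ W_dec.T) / len(X_neg)

# generate synthetic negatives by perturbing latent codes and decoding,
# a crude stand-in for sampling with variational Bayes or MCMC
Z = X_neg @ W_enc
Z_new = Z + rng.normal(scale=0.1, size=Z.shape)
X_synth = Z_new @ W_dec
print(X_synth.shape)  # (40, 5): one synthetic negative per original
```

In practice you would repeat the sampling step many times to grow the negative class toward the size of the positive one.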

+2

You can check this paper for different sampling methods that mitigate the class imbalance problem: http://www.machinelearning.org/proceedings/icml2007/papers/62.pdf . Simple random oversampling of the minority class usually works well.
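Random oversampling of the minority class is a few lines of NumPy; the function name `random_oversample` is an illustrative choice, not from the paper.

```python
import numpy as np

def random_oversample(X, y, minority=0, rng=None):
    """Duplicate random minority-class rows until both classes are equal."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    # draw with replacement as many minority rows as there are majority rows
    extra = rng.choice(min_idx, size=len(maj_idx), replace=True)
    idx = np.concatenate([maj_idx, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.array([1] * 96 + [0] * 4)   # 96% positive, 4% negative
X_bal, y_bal = random_oversample(X, y, minority=0, rng=0)
print(y_bal.mean())  # 0.5: the resampled set is balanced
```

Oversample only the training split; duplicating minority rows before splitting would leak copies of the same example into the test set.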

0

Source: https://habr.com/ru/post/1239374/
