Ivan, to apply a GA to document classification correctly:
- You need to reduce the problem to a system of components that can be evolved.
- You cannot train a GA to classify documents using a single document.
So, the steps you described are on the right track, but I will give you some improvements:
- Have enough training data: you need a set of documents that are already classified and diverse enough to cover the range of documents that you are likely to encounter.
- Train your GA to correctly classify a subset of these documents, i.e. the training data set.
- At each generation, test your best specimen against a separate validation data set, and stop training if the validation accuracy starts to decrease.
So what you want to do is:
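The split into training and validation data could look roughly like this. It is a sketch under assumptions: the 20% validation fraction, the fixed seed, and the helper name `split_dataset` are illustrative choices, not something prescribed above.

```python
import random

def split_dataset(labeled_docs, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (training, validation)."""
    docs = list(labeled_docs)
    random.Random(seed).shuffle(docs)  # seeded so the split is reproducible
    n_val = max(1, int(len(docs) * validation_fraction))
    return docs[n_val:], docs[:n_val]

# Placeholder labeled documents, only to show the shapes involved.
labeled = [(f"document {i}", "even" if i % 2 == 0 else "odd") for i in range(10)]
training, validation = split_dataset(labeled)
print(len(training), len(validation))
```

The important property is that the validation documents are never used for selection during training, so a drop in validation accuracy is a usable signal of overfitting.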
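    prevValidationFitness = default;
    currentValidationFitness = default;
    bestGA = default;

    while (currentValidationFitness.IsBetterThan(prevValidationFitness))
    {
        prevValidationFitness = currentValidationFitness;
        // Evolve one more generation on the training set, then score its best
        // specimen on the validation set (updating currentValidationFitness and bestGA).
    }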
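One possible reading of that loop as runnable Python is sketched below. Everything specific in it is an assumption made for illustration: documents are already reduced to word-count vectors, the classifier is a simple linear rule (`predict`), and the selection, crossover, and mutation settings are arbitrary. A generation cap is added so the loop always terminates.

```python
import random

def predict(genome, counts):
    """Hypothetical linear rule: category 1 if the weighted sum is positive."""
    return 1 if sum(w * x for w, x in zip(genome, counts)) > 0 else 0

def accuracy(genome, data):
    return sum(predict(genome, x) == y for x, y in data) / len(data)

def evolve(training, validation, n_genes, pop_size=30, max_generations=200, seed=0):
    """Evolve weight vectors; stop when validation accuracy declines."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_genome, best_val = None, -1.0
    prev_val = -1.0
    for _ in range(max_generations):
        # Rank the population by accuracy on the training set only.
        population.sort(key=lambda g: accuracy(g, training), reverse=True)
        champion = population[0]
        current_val = accuracy(champion, validation)
        if current_val > best_val:
            best_genome, best_val = list(champion), current_val
        if current_val < prev_val:
            break  # validation accuracy started to decline: stop training
        prev_val = current_val
        # Next generation: keep the top half, refill with mutated crossovers.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)      # requires n_genes >= 2
            child = a[:cut] + b[cut:]            # one-point crossover
            child[rng.randrange(n_genes)] += rng.gauss(0, 0.3)  # point mutation
            children.append(child)
        population = parents + children
    return best_genome, best_val

# Synthetic stand-in data: category 1 when the first count exceeds the second.
data_rng = random.Random(1)
def make_data(n):
    rows = []
    for _ in range(n):
        x = [data_rng.randint(0, 5) for _ in range(3)]
        rows.append((x, 1 if x[0] > x[1] else 0))
    return rows

training, validation = make_data(60), make_data(20)
best_ga, val_acc = evolve(training, validation, n_genes=3)
print(val_acc)
```

The sketch keeps a copy of the specimen with the highest validation accuracy seen so far, so the generation that triggers the stop does not overwrite the result.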
Whenever you receive a new document (one that has not been previously classified), you can now classify it with your best GA:
category = bestGA.Classify(document);
This is not the be-all and end-all solution, but it should give you a decent start. Regards, Kiril