What is the correct way to measure whether a machine learning model is overfitting?

I understand the intuitive meaning of overfitting and underfitting. Now, given a specific machine learning model that learns from training data, how can you determine whether it has overfit or underfit the data? Is there a quantitative way to measure these factors?

Can we look at the error and say whether the model is overfitting or underfitting?

+4
4 answers

You do not look at the error on the training data, only at the error on the validation data.

The usual way to check is to try out various model complexities and see how the error varies with complexity. The curve usually has a typical shape: at first the error improves quickly, then it saturates (this is where the good models are), and then it starts to get worse again, not because the model is better but because it is overfitting. You want to be at the low-complexity end of the plateau, the simplest model that still generalizes reasonably well.
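A minimal sketch of the complexity-vs-error curve described above, assuming scikit-learn and a synthetic regression task with polynomial degree as the complexity knob (the data and degree range are illustrative, not from the answer):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)

degrees = range(1, 12)  # model complexity: polynomial degree (illustrative)
model = make_pipeline(PolynomialFeatures(), LinearRegression())

train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",
    param_range=list(degrees),
    scoring="neg_mean_squared_error",
    cv=5,
)

# Convert negative MSE back to error and average over the CV folds.
train_err = -train_scores.mean(axis=1)
val_err = -val_scores.mean(axis=1)

for d, tr, va in zip(degrees, train_err, val_err):
    print(f"degree={d:2d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
# Training error keeps shrinking with degree, while validation error drops,
# plateaus, and then rises again -- the overfitting region.
```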

+1

I believe the easiest way is to have two datasets: training data and validation data. You train the model on the training data and check whether its fit on the training data is close to its fit on the validation data. When the model's fit keeps improving on the training data but not on the validation data, you are overfitting.
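A hedged sketch of this two-dataset check, assuming scikit-learn: hold out a validation set and watch whether the training fit pulls ahead of the validation fit as the model keeps learning (here, over boosting iterations of a gradient-boosted classifier chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(n_estimators=300, random_state=0)
model.fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting stage, so we can
# compare the training fit with the validation fit as training proceeds.
for i, (p_tr, p_val) in enumerate(zip(model.staged_predict(X_tr),
                                      model.staged_predict(X_val)), start=1):
    if i % 50 == 0:
        print(f"stage {i:3d}: "
              f"train acc={accuracy_score(y_tr, p_tr):.3f}  "
              f"val acc={accuracy_score(y_val, p_val):.3f}")
# If training accuracy keeps climbing while validation accuracy stalls or
# drops, the model is overfitting.
```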

+8

The usual way, I think, is known as cross-validation. The idea is to split the training set into several parts, known as folds, then hold out one at a time for evaluation while training on the remaining ones.

This, of course, does not measure actual overfitting or underfitting directly, but if you can vary the complexity of the model, for example by changing the regularization term, you can find the optimal point. As far as I know, this only applies to training and testing.
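A minimal sketch of this k-fold idea, assuming scikit-learn and ridge regression with the regularization strength as the tuning knob (the alpha values are only illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    # Train on k-1 folds and evaluate on the held-out fold, for each fold.
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             scoring="neg_mean_squared_error", cv=cv)
    print(f"alpha={alpha:7.2f}  mean CV MSE={-scores.mean():.2f}")
# The alpha with the lowest average cross-validated error marks the
# optimal point on the regularization/complexity axis.
```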

+4

The existing answers are not strictly wrong, but they are incomplete. Yes, you need a validation set, but the important point is that you do not simply look at the model error on the validation set and try to minimize it. That will eventually lead to overfitting as well, because you would effectively be fitting the validation set that way. The right approach is not to minimize the error on your sets, but to make the error independent of which training and validation sets you use. If the error on the validation set is significantly different (it does not matter whether it is worse or better), then the model is overfit. And of course this should be done with cross-validation, where you train on one random subset and then validate on another random subset.
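A sketch of this point, under the assumption that the gap between training and validation error across random splits is what signals overfitting (the models and depths below are only illustrative, not from the answer):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=0)
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)

for depth in [2, 5, None]:  # None lets the tree grow until it memorizes
    res = cross_validate(DecisionTreeRegressor(max_depth=depth, random_state=0),
                         X, y, cv=cv,
                         scoring="neg_mean_squared_error",
                         return_train_score=True)
    train_mse = -res["train_score"].mean()
    val_mse = -res["test_score"].mean()
    print(f"max_depth={str(depth):4s}  train MSE={train_mse:8.1f}  "
          f"val MSE={val_mse:8.1f}  gap={val_mse - train_mse:8.1f}")
# A large gap between training and validation error that depends on the
# particular split is a sign of overfitting; a model whose errors agree
# across random splits generalizes better even if its raw validation
# error is not the minimum.
```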

0

Source: https://habr.com/ru/post/1434285/
