In general, the goal of CV is NOT hyperparameter optimization. The goal is to evaluate the performance of the model-building procedure.
The main train/test split is conceptually identical to a 1-fold CV (with a non-standard split size, as opposed to the 1/K test fraction in k-fold CV). The advantage of performing more splits (i.e., k > 1 CV) is getting more information about the estimate of the generalization error: you obtain both the error estimate and the statistical uncertainty of that estimate. There is an excellent discussion of this on CrossValidated (start with the links added to the question, which cover the same question but are worded differently). It covers nested cross-validation and is not at all trivial, but once you wrap your head around the concept as a whole, it will help you in various non-trivial situations. The idea to take away is this: the goal of CV is to evaluate the performance of the model-building procedure.
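As a minimal illustration of that point (the dataset and model below are placeholders, not part of the original answer), a k-fold CV gives you one score per fold, so you can report both an average error estimate and its spread:

```python
# Minimal sketch (placeholder data/model): k-fold CV yields one error estimate per fold,
# so you get both a mean and a measure of its uncertainty.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")  # estimate + uncertainty
```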
Keeping this idea in mind, how would one approach hyperparameter tuning in general (not only in LightGBM)?
- You want to train a model with each candidate set of hyperparameters on some data and evaluate each variant of the model on an independent (validation) set. Then you pick the best hyperparameters, i.e., the ones that give the best score on the metric of your choice.
- This can be done with a simple train/test split. But the estimated performance, and therefore the choice of the optimal hyperparameters, might just be a fluctuation of that particular split.
- Therefore, you can evaluate each of these models with a statistically more reliable estimate by averaging over several train/test splits, i.e., k-fold CV (see the sketch after this list).
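Here is a rough sketch of that selection step; the model and the candidate hyperparameter values are hypothetical placeholders, not something prescribed by the answer:

```python
# Hypothetical candidate hyperparameter sets and model (placeholders for illustration).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = [
    {"n_estimators": 100, "max_depth": 2},
    {"n_estimators": 100, "max_depth": 4},
    {"n_estimators": 300, "max_depth": 2},
]

results = []
for params in candidates:
    # Score each candidate by its average over several train/test splits (5-fold CV)
    # instead of by a single split.
    scores = cross_val_score(GradientBoostingClassifier(**params), X, y, cv=5)
    results.append((scores.mean(), params))

best_score, best_params = max(results, key=lambda r: r[0])
print(best_score, best_params)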
You can then go one step further and keep an extra hold-out set that was split off before the hyperparameter optimization was launched, and evaluate the selected best model on this set to estimate the final generalization error. And you can go yet another step further: instead of a single test sample, use an outer CV loop, which brings us to nested cross-validation.
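A minimal sketch of nested cross-validation with scikit-learn (the estimator and grid are again illustrative assumptions): the inner search picks hyperparameters, the outer loop estimates how well that whole procedure generalizes.

```python
# Nested CV sketch: the inner GridSearchCV selects hyperparameters, the outer
# cross_val_score estimates the generalization error of the whole selection procedure.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"max_depth": [2, 4], "n_estimators": [100, 300]}  # illustrative grid
inner_search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3)

# Each outer fold reruns the full inner search, then scores the chosen model
# on data never seen during hyperparameter selection.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```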
Technically, lightgbm.cv() only lets you evaluate performance on a k-fold split with fixed model parameters. For hyperparameter tuning you need to run it in a loop, providing different parameter sets, record the average performance for each, and pick the best set of parameters after the loop finishes. This interface is different from sklearn, which gives you complete functionality for doing hyperparameter optimization in a CV loop. Personally, I would recommend using the sklearn API of lightgbm. It is just a wrapper around the native lightgbm.train() functionality, so it is not slower, but it lets you use the full sklearn toolkit, which makes your life a lot easier.
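To make that concrete, here is a hedged sketch of both routes. The data, candidate values and grid are illustrative assumptions, and the exact key names returned by lightgbm.cv() vary slightly across lightgbm versions, so the code looks them up dynamically:

```python
# Route 1: manual loop over candidate parameter dicts around lightgbm.cv().
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
dtrain = lgb.Dataset(X, label=y)

candidates = [{"num_leaves": 15, "learning_rate": 0.1},   # illustrative values only
              {"num_leaves": 31, "learning_rate": 0.05}]

best = None
for extra in candidates:
    params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1, **extra}
    cv_results = lgb.cv(params, dtrain, num_boost_round=200, nfold=5, seed=42)
    # Result keys look like '<metric>-mean' / '<metric>-stdv' (exact prefix differs
    # between lightgbm versions), so find the mean key dynamically.
    mean_key = next(k for k in cv_results if k.endswith("-mean"))
    score = min(cv_results[mean_key])  # best (lowest) mean log-loss over boosting rounds
    if best is None or score < best[0]:
        best = (score, extra)
print("lightgbm.cv() choice:", best)

# Route 2: the sklearn API, where GridSearchCV drives the CV loop for you.
grid = GridSearchCV(
    lgb.LGBMClassifier(),
    param_grid={"num_leaves": [15, 31], "learning_rate": [0.05, 0.1]},
    cv=5,
    scoring="neg_log_loss",
)
grid.fit(X, y)
print("sklearn API choice:", grid.best_params_)
```

With the sklearn wrapper you also get RandomizedSearchCV, pipelines, and nested CV (as sketched earlier) essentially for free.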