Instead of recording the precision and recall values after each fold, store the predictions on the test samples after each fold. Then collect all the test (i.e., out-of-bag) predictions and compute precision and recall.
With one complete run of k-fold cross-validation, the predictor makes one and only one prediction for each sample. If you have n samples, you should end up with n test predictions.
(Note: these predictions are different from training predictions, because the predictor makes a prediction for each sample without having seen it before.)
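A minimal sketch of this pooling, using only the Python standard library. The toy threshold "classifier", the data, and all function names are made up for illustration; any real model would take the place of `fit_threshold`.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy stand-in for a classifier: midpoint between the two class means.
    Assumes both classes are present in the training data."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def out_of_fold_scores(xs, ys, k=5, seed=0):
    """Return one held-out score per sample from a single k-fold run."""
    oof = [None] * len(xs)
    for test_idx in kfold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            oof[i] = xs[i] - thr  # score = signed distance from the threshold
    return oof

xs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.6, 0.3, 0.75]
ys = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
scores = out_of_fold_scores(xs, ys, k=5)

# Precision and recall are computed once, on the pooled predictions.
pred = [1 if s > 0 else 0 for s in scores]
tp = sum(1 for p, y in zip(pred, ys) if p == 1 and y == 1)
fp = sum(1 for p, y in zip(pred, ys) if p == 1 and y == 0)
fn = sum(1 for p, y in zip(pred, ys) if p == 0 and y == 1)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Every sample falls into the test fold exactly once, so `scores` has exactly n entries and none of them comes from a model that saw that sample during training.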
Unless you use leave-one-out cross-validation, k-fold cross-validation generally requires randomly partitioning the data. Ideally, you would do repeated (and stratified) k-fold cross-validation. However, combining precision-recall curves from different rounds is not straightforward, because, unlike with ROC, you cannot use simple linear interpolation between precision-recall points (see Davis and Goadrich 2006).
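To make "repeated and stratified" concrete, here is a rough standard-library sketch of how the fold assignments could be generated. The function names are hypothetical, and the simple round-robin dealing per class is an assumption that keeps class proportions similar across folds without balancing fold sizes perfectly.

```python
import random

def stratified_folds(labels, k, rng):
    """Assign each sample index to one of k folds, class by class."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class's samples round-robin so each fold gets its share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def repeated_stratified_folds(labels, k, n_repeats, seed=0):
    """Return n_repeats independent stratified partitions of the data."""
    rng = random.Random(seed)
    return [stratified_folds(labels, k, rng) for _ in range(n_repeats)]

labels = [1] * 4 + [0] * 16  # skewed toy labels: 20% positives
rounds = repeated_stratified_folds(labels, k=4, n_repeats=3)
```

Each element of `rounds` is one complete partition, so each round yields its own full set of n held-out predictions.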
I personally calculated AUC-PR using the Davis-Goadrich method for interpolation in PR space (followed by numerical integration) and compared the classifiers using the AUC-PR estimates from repeated stratified 10-fold cross-validation.
For a nice plot, I showed a representative PR curve from one of the cross-validation rounds.
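For reference, the interpolation in PR space can be sketched as below. This is an illustrative standard-library version of the Davis and Goadrich scheme, in which the false-positive count is interpolated linearly as the true-positive count increases one step at a time; the function names are hypothetical, labels are assumed to be 0/1, and extending the curve back to recall 0 at the precision of the first point is an assumption of this sketch.

```python
def pr_counts(labels, scores):
    """Return cumulative (TP, FP) at each distinct threshold, high to low."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, tp, fp = [], 0, 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only at the
        # end of each run of equal scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((tp, fp))
    return points

def auc_pr(labels, scores):
    """Area under the PR curve with interpolation in (TP, FP) count space."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    curve = []  # list of (recall, precision)
    prev_tp, prev_fp = 0, 0
    for tp, fp in pr_counts(labels, scores):
        if tp == prev_tp:
            # Only false positives were added: recall is unchanged.
            if tp > 0:
                curve.append((tp / n_pos, tp / (tp + fp)))
        else:
            # FP grows linearly with each additional true positive.
            slope = (fp - prev_fp) / (tp - prev_tp)
            for x in range(1, tp - prev_tp + 1):
                t = prev_tp + x
                f = prev_fp + slope * x
                curve.append((t / n_pos, t / (t + f)))
        prev_tp, prev_fp = tp, fp
    # Assumption: start the curve at recall 0 with the first precision value.
    curve.insert(0, (0.0, curve[0][1]))
    # Trapezoidal integration over the densely interpolated points.
    area = 0.0
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area
```

Calling `auc_pr` on the pooled held-out predictions of each repetition gives one AUC-PR estimate per round, which can then be compared across classifiers.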
Of course, there are many other ways to evaluate the performance of a classifier, depending on the nature of your data set.
For example, if the proportion of (binary) labels in your dataset is not skewed (i.e., approximately 50-50), you can use a simpler ROC analysis with cross-validation:
Collate the predictions from each fold and generate the ROC curves (as before), collect all the TPR-FPR points (i.e., take the union of all the TPR-FPR tuples), and then plot the combined set of points, possibly with smoothing. Optionally, compute AUC-ROC using simple linear interpolation and the composite trapezoidal rule for numerical integration.
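A small standard-library sketch of the ROC variant. The function names and toy data are made up for illustration, labels are assumed to be 0/1 with both classes present, and computing AUC per round while using the union of points only for plotting is a choice of this sketch.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from the highest threshold to the lowest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos  # assumes both classes are present
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score so ties share a threshold.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Composite trapezoidal rule over linearly interpolated ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Pooled held-out predictions from two hypothetical cross-validation rounds.
round_a = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
round_b = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])

# Union of all TPR-FPR tuples, sorted for plotting (and optional smoothing).
combined = sorted(set(round_a) | set(round_b))

# One AUC-ROC estimate per round.
aucs = [auc_trapezoid(round_a), auc_trapezoid(round_b)]
```

Linear interpolation is valid here because a straight line between two ROC points is achievable by randomly mixing the two corresponding thresholds, which is not the case in PR space.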