Changing SMOTE parameters in caret k-fold cross-validation for classification

I have a classification problem with a highly imbalanced class to predict (e.g. a binary target variable split 90% / 10%).

To deal with this problem, I want to use the SMOTE method to oversample the minority class. However, as I read here ( http://www.marcoaltini.com/blog/dealing-with-imbalanced-data-undersampling-oversampling-and-proper-cross-validation ), it is best to apply SMOTE inside the k-fold loop, to avoid overfitting.

Since I use the caret package for my analysis, I am following this link ( http://topepo.imtqy.com/caret/sampling.html ). I follow most of it, except the last part, which explains how to change the SMOTE parameters:

smotest <- list(name = "SMOTE with more neighbors!",
            func = function (x, y) {
              library(DMwR)
              dat <- if (is.data.frame(x)) x else as.data.frame(x)
              dat$.y <- y
              dat <- SMOTE(.y ~ ., data = dat, k = 10)
              list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
                   y = dat$.y)
              },
            first = TRUE)

I just don't get it. Could someone explain? Say I want to set the SMOTE parameters perc.over, k and perc.under, how would I do this?

Many thanks.

EDIT:

Actually, I realized that maybe I can just add these parameters inside the SMOTE call in the function above, which would give something like:

smotest <- list(name = "SMOTE with more neighbors!",
            func = function (x, y) {
              library(DMwR)
              dat <- if (is.data.frame(x)) x else as.data.frame(x)
              dat$.y <- y
              dat <- SMOTE(.y ~ ., data = dat, k = 10, perc.over = 1200, perc.under = 100)
              list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
                   y = dat$.y)
              },
            first = TRUE)
1 answer

I'm not sure I understand what it is you don't understand, but here is an attempt to clarify what this piece of code does.

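The smotest object is a list, which is the form the sampling argument of trainControl expects for a custom sampler. The first element, name, is just a label. The second, func, is the function that does the sampling. The third, first, is a logical value that says whether the sampling should happen first, before any pre-processing, or after it.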

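func is essentially a wrapper around SMOTE. Its first 3 lines load the package and build a data.frame in the form SMOTE needs. Line 4 calls SMOTE, which takes a formula and a data.frame rather than separate x and y. The final expression, ending on line 6, splits the result back into x and y, which is the form trainControl expects.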
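The reshaping steps around the SMOTE call can be tried in isolation. The following is only a sketch on made-up toy data: the SMOTE call itself is left out (marked in a comment) so that it runs in base R without DMwR, and the values of x and y are hypothetical.

```r
# Toy inputs in the shape caret passes to func:
# predictors x and a factor outcome y (values are made up).
x <- data.frame(a = c(1.2, 3.4, 5.6, 7.8), b = c(10, 20, 30, 40))
y <- factor(c("no", "no", "no", "yes"))

# Lines 2-3 of func: make sure x is a data.frame, then attach y as column .y
dat <- if (is.data.frame(x)) x else as.data.frame(x)
dat$.y <- y

# Line 4 of func would resample here, e.g.
#   dat <- SMOTE(.y ~ ., data = dat, k = 10)
# It is skipped in this sketch, so dat is returned unchanged.

# Lines 5-6 of func: split back into predictors and outcome
out <- list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
            y = dat$.y)

str(out)
```

Note that the fixed-string match drops every column whose name contains ".y", so a predictor with a (hypothetical) name such as age.years would be removed as well.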

And as for your edit: yes, that is how you do it, just add the parameters inside the call to SMOTE.
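If several parameter settings need to be tried, one option is to build the list with a small helper. This is a sketch, not something from the caret documentation: make_smote_sampler is a hypothetical name, its defaults mirror what I believe are the DMwR::SMOTE defaults, and the trainControl settings (10-fold CV) are purely illustrative. DMwR is only needed once train actually runs the sampler.

```r
# Hypothetical helper: returns a custom sampler list for trainControl.
make_smote_sampler <- function(perc.over = 200, perc.under = 200, k = 5) {
  # Evaluate the arguments now so the closure captures fixed values
  force(perc.over); force(perc.under); force(k)
  list(name = "SMOTE with custom parameters",
       func = function(x, y) {
         library(DMwR)  # assumed to be installed when resampling runs
         dat <- if (is.data.frame(x)) x else as.data.frame(x)
         dat$.y <- y
         dat <- SMOTE(.y ~ ., data = dat, k = k,
                      perc.over = perc.over, perc.under = perc.under)
         list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
              y = dat$.y)
       },
       first = TRUE)
}

# Same settings as in the edited question
smotest <- make_smote_sampler(perc.over = 1200, perc.under = 100, k = 10)

# Hand the sampler to caret so it is applied inside each resample
# (guarded so the sketch also runs where caret is not installed).
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", number = 10,
                              sampling = smotest)
}
```

The resulting ctrl would then be passed to train through its trControl argument, so that SMOTE is run on the training portion of each fold rather than on the full data set.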


Source: https://habr.com/ru/post/1617107/