I have a classification problem with a highly imbalanced target class (e.g. predicting a binary variable split 90% / 10%).
To deal with this, I want to use the SMOTE method to oversample the minority class. However, as I read here ( http://www.marcoaltini.com/blog/dealing-with-imbalanced-data-undersampling-oversampling-and-proper-cross-validation ), it is best to apply SMOTE inside the k-fold cross-validation loop, to avoid overfitting.
Since I use the caret package for my analysis, I referred to this link ( http://topepo.imtqy.com/caret/sampling.html ). I understand most of it, except for the last part, which explains how to change the SMOTE parameters:
    smotest <- list(name = "SMOTE with more neighbors!",
                    func = function (x, y) {
                      library(DMwR)
                      dat <- if (is.data.frame(x)) x else as.data.frame(x)
                      dat$.y <- y
                      dat <- SMOTE(.y ~ ., data = dat, k = 10)
                      list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
                           y = dat$.y)
                    },
                    first = TRUE)
I just don't get it. Could anyone explain? Say I want to set the SMOTE parameters perc.over, k and perc.under; how would I do this?
Many thanks.
EDIT:
Actually, I realized that maybe I can just add these parameters to the SMOTE call in the function above, which would give something like:
    smotest <- list(name = "SMOTE with more neighbors!",
                    func = function (x, y) {
                      library(DMwR)
                      dat <- if (is.data.frame(x)) x else as.data.frame(x)
                      dat$.y <- y
                      dat <- SMOTE(.y ~ ., data = dat, k = 10, perc.over = 1200, perc.under = 100)
                      list(x = dat[, !grepl(".y", colnames(dat), fixed = TRUE)],
                           y = dat$.y)
                    },
                    first = TRUE)
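
To sanity-check values like these, here is a rough sketch of the class sizes I would expect SMOTE to return, if I read the DMwR documentation correctly: perc.over / 100 synthetic cases are generated per minority case, and perc.under / 100 majority cases are sampled per synthetic case. The helper `smote_counts` and the fold size are my own illustration, not part of DMwR or caret, and the arithmetic assumes perc.over is a multiple of 100.

```r
# Hypothetical helper (not part of DMwR or caret): expected class counts
# after DMwR::SMOTE, assuming perc.over is a multiple of 100.
smote_counts <- function(n_minority, perc.over, perc.under) {
  # Synthetic minority cases: perc.over / 100 per original minority case.
  synthetic <- n_minority * perc.over / 100
  # Majority cases kept: perc.under / 100 per synthetic case.
  majority <- synthetic * perc.under / 100
  c(minority = n_minority + synthetic, majority = majority)
}

# Assumed example: one training fold of 900 rows at 90% / 10%,
# i.e. 810 majority and 90 minority cases before sampling.
counts <- smote_counts(n_minority = 90, perc.over = 1200, perc.under = 100)
print(counts)
```

With perc.under = 100 the two classes should come out roughly balanced, since the majority sample is matched to the number of synthetic cases. As far as I understand the caret page linked above, the `smotest` list is then passed as `trainControl(sampling = smotest)`, so the sampling is applied to each resample rather than to the whole data set.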