K-fold cross validation in dplyr?

Hadley Wickham suggested that bootstrapping could be done with the dplyr package; his proposal was improved on and then implemented in the broom package. Can k-fold cross validation be implemented the same way?

I think the first step (selecting the training groups) is very simple:

crossvalidate <- function (df, k = 5) {
  n <- nrow(df)
  # Randomly assign each row to one of k folds of (nearly) equal size
  idx <- sample(rep_len(1:k, n))
  # Group i holds the training rows for fold i: every row not in fold i
  attr(df, "indices") <- lapply(1:k, function(i) which(idx != i))
  attr(df, "drop") <- TRUE
  # Training-set size for each fold
  attr(df, "group_sizes") <- nrow(df) - unclass(table(idx))
  attr(df, "biggest_group_size") <- max(attr(df, "group_sizes"))
  attr(df, "labels") <- data.frame(replicate = 1:k)
  attr(df, "vars") <- list(quote(replicate))
  class(df) <- c("grouped_df", "tbl_df", "tbl", "data.frame")
  df
}

But I can't find any documentation for attr(, "indices") that would tell me whether I can somehow use the "other" indices, the ones not selected for training, as the test group indices. Any ideas?
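For reference, the held-out rows can be derived directly from the fold labels, without going through the grouped_df attributes. The following is only a minimal base R sketch: n, the seed, and the names train_idx and test_idx are illustrative assumptions, not part of dplyr or broom.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 23                           # hypothetical number of rows
k <- 5
idx <- sample(rep_len(1:k, n))    # same fold assignment as in crossvalidate()

# Training rows for fold i: everything outside fold i
train_idx <- lapply(1:k, function(i) which(idx != i))
# Test rows for fold i: the "other" rows, i.e. those inside fold i
test_idx  <- lapply(1:k, function(i) which(idx == i))

# Equivalent route: take the complement of each training set
test_idx2 <- lapply(train_idx, function(tr) setdiff(seq_len(n), tr))
```

Each row lands in exactly one test set, so the k test sets together cover rows 1 to n once.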

2 answers

k-fold cross validation in a dplyr pipeline, using modelr (see https://rpubs.com/dgrtwo/cv-modelr):

library(ISLR)
library(dplyr)
library(purrr)
library(modelr)
library(broom)
library(tidyr)

set.seed(1)

models <- Smarket %>%
  select(Today, Lag1:Lag5) %>%
  crossv_kfold(k = 20) %>%
  mutate(model = map(train, ~ lm(Today ~ ., data = .)))

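# Predict on each held-out fold, then stack the predictions into one data frame.
# The list-column is named (pred) before unnesting so that this also runs on
# tidyr versions where unnest() expects column names.
predictions <- models %>%
  mutate(pred = map2(model, test, ~ augment(.x, newdata = as.data.frame(.y)))) %>%
  unnest(pred)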

predictions %>%
  summarize(MSE = mean((Today - .fitted) ^ 2),
            MSEIntercept = mean((Today - mean(Today))^2))
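For comparison, the same quantities (pooled out-of-fold MSE against an intercept-only baseline) can be sketched in base R alone. This is an illustration under assumptions, not the linked post's code: it uses the built-in mtcars data and an arbitrary formula mpg ~ wt + hp in place of Smarket, and the names fold and oof are made up for the example.

```r
set.seed(1)                                   # arbitrary seed
k <- 5
fold <- sample(rep_len(1:k, nrow(mtcars)))    # random fold label for each row

# For each fold: fit on the other folds, predict the held-out rows
oof <- do.call(rbind, lapply(1:k, function(i) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  data.frame(fold = i, mpg = test$mpg,
             .fitted = predict(fit, newdata = test))
}))

# Pooled out-of-fold error next to the error of always predicting the mean
c(MSE          = mean((oof$mpg - oof$.fitted)^2),
  MSEIntercept = mean((oof$mpg - mean(oof$mpg))^2))
```

If MSE is not clearly below MSEIntercept, the model predicts held-out data no better than the overall mean.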

5-fold CV with dplyr, with folds assigned within each level of group_var:

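# group_var is the stratification column; ID_var is a character string
# naming a column that uniquely identifies each row
df_fold <- df %>%
  group_by(group_var) %>%
  sample_frac(1) %>%                                # shuffle rows within each group
  mutate(fold = rep(1:5, length.out = n())) %>%     # deal out fold labels 1..5
  ungroup()

for (i in 1:5) {
  val <- df_fold %>% filter(fold == i)              # validation rows for fold i
  tr  <- df_fold %>% anti_join(val, by = ID_var)    # all remaining rows
  # fit the model on tr and evaluate it on val here
}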
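To make the loop concrete, here is a small runnable sketch of the same pattern that also fits a model in each fold. Everything specific in it is an assumption for illustration: mtcars as the data, cyl standing in for group_var, a row-number column id standing in for ID_var, and lm(mpg ~ wt) as the model.

```r
library(dplyr)

set.seed(1)                                         # arbitrary seed
df <- mtcars %>% mutate(id = row_number())          # hypothetical ID column

df_fold <- df %>%
  group_by(cyl) %>%                                 # cyl stands in for group_var
  sample_frac(1) %>%                                # shuffle rows within each group
  mutate(fold = rep(1:5, length.out = n())) %>%
  ungroup()

rmse <- numeric(5)
for (i in 1:5) {
  val <- df_fold %>% filter(fold == i)              # validation rows
  tr  <- df_fold %>% anti_join(val, by = "id")      # training rows
  fit <- lm(mpg ~ wt, data = tr)
  rmse[i] <- sqrt(mean((val$mpg - predict(fit, newdata = val))^2))
}
rmse                                                # one validation RMSE per fold
```

Because the fold labels are dealt out within each group, every fold receives a share of each cyl level.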

Source: https://habr.com/ru/post/1661713/

