How to limit runtime but save output in R?

I am trying to limit an analysis's execution time while keeping what the analysis has already computed. In my case, I run xgb.cv (from the xgboost R package) and I want to save all completed iterations up to the point where the analysis reaches 10 seconds (or n seconds / minutes / hours).

I tried the approach mentioned in this thread, but it stops once it reaches 10 seconds without saving the previously completed iterations.

Here is my code:

    require(xgboost)
    require(R.utils)

    data(iris)
    train.model <- model.matrix(Sepal.Length ~ ., iris)
    dtrain <- xgb.DMatrix(data = train.model, label = iris$Sepal.Length)

    evalerror <- function(preds, dtrain) {
      labels <- getinfo(dtrain, "label")
      err <- sqrt(sum((log(preds) - log(labels))^2) / length(labels))
      return(list(metric = "error", value = err))
    }

    xgb_grid <- list(eta = 0.05,
                     max_depth = 5,
                     subsample = 0.7,
                     gamma = 0.3,
                     min_child_weight = 1)

    fit_boost <- tryCatch(
      expr = {
        evalWithTimeout({
          xgb.cv(data = dtrain,
                 nrounds = 10000,
                 objective = "reg:linear",
                 eval_metric = evalerror,
                 early_stopping_rounds = 300,
                 print_every_n = 100,
                 params = xgb_grid,
                 colsample_bytree = 0.7,
                 nfold = 5,
                 prediction = TRUE,
                 maximize = FALSE)
        }, timeout = 10)
      },
      TimeoutException = function(ex) cat("Timeout. Skipping.\n")
    )

and the output is

    # Error in dim.xgb.DMatrix(x) : reached CPU time limit

Thanks!

1 answer

Edit - a little closer to what you want:

Wrap the whole thing in R's capture.output() function. This saves the entire evaluation printout as an R object. Again, I think you are looking for something more, but at least the result is local and malleable. Syntax:

    fit_boost <- capture.output(
      tryCatch(expr = { evalWithTimeout({...}) })
    )

    > fit_boost
    [1] "[1]\ttrain-error:2.033160+0.006109\ttest-error:2.034180+0.017467 "
    ...
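Those captured strings can be parsed back into numbers. A minimal sketch, assuming the line format shown above (the iteration number in brackets, then mean+sd pairs for train and test error); the helper name parse_line is made up for illustration:

```r
# Two example lines in the format that capture.output() returns (assumed
# from the printout above).
captured <- c(
  "[1]\ttrain-error:2.033160+0.006109\ttest-error:2.034180+0.017467",
  "[101]\ttrain-error:0.045297+0.000396\ttest-error:0.060047+0.001849"
)

# Hypothetical helper: turn one printed line into a one-row data frame.
parse_line <- function(x) {
  iter <- as.integer(sub("^\\[([0-9]+)\\].*", "\\1", x))
  # Pull out every decimal number: train mean, train sd, test mean, test sd.
  nums <- as.numeric(regmatches(x, gregexpr("[0-9]+\\.[0-9]+", x))[[1]])
  data.frame(iter        = iter,
             train_error = nums[1],
             train_sd    = nums[2],
             test_error  = nums[3],
             test_sd     = nums[4])
}

log_df <- do.call(rbind, lapply(captured, parse_line))
log_df
```

From there the cross-validation history is an ordinary data frame you can plot or filter.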

Original answer:

You can also use sink(). Just wrap the cross-validation call like this:

    sink("evaluationLog.txt")

    fit_boost <- tryCatch(
      expr = {
        evalWithTimeout({
          xgb.cv(data = dtrain,
                 nrounds = 10000,
                 objective = "reg:linear",
                 eval_metric = evalerror,
                 early_stopping_rounds = 300,
                 print_every_n = 100,
                 params = xgb_grid,
                 colsample_bytree = 0.7,
                 nfold = 5,
                 prediction = TRUE,
                 maximize = FALSE)
        }, timeout = 10)
      },
      TimeoutException = function(ex) cat("Timeout. Skipping.\n")
    )

    sink()

The sink() at the end would normally return output to the console, but in this case it will not, because an error occurs. Still, once you run this, you can open evaluationLog.txt and voilà:

    [1]   train-error:2.033217+0.003705  test-error:2.032427+0.012808
    Multiple eval metrics are present. Will use test_error for early stopping.
    Will train until test_error hasn't improved in 300 rounds.
    [101] train-error:0.045297+0.000396  test-error:0.060047+0.001849
    [201] train-error:0.042085+0.000852  test-error:0.059798+0.002382
    [301] train-error:0.041117+0.001032  test-error:0.059733+0.002701
    [401] train-error:0.040340+0.001170  test-error:0.059481+0.002973
    [501] train-error:0.039988+0.001145  test-error:0.059469+0.002929
    [601] train-error:0.039698+0.001028  test-error:0.059416+0.003018

This, of course, is not perfect. I assume you want to perform further operations on these values, and a text log is not the best format for that, but converting it into something more manageable is not a tall order. I have not yet found a way to save the actual xgb.cv$evaluation_log object before the timeout. This is a very good question.
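On that last point, one workaround is to keep partial state in an object that lives outside the timed expression, so it survives the interrupt. A minimal base-R sketch (not xgboost-specific) using setTimeLimit(), which is the same mechanism behind the "reached ... time limit" error above; a similar idea might be applied via a custom callback passed to xgb.cv's callbacks argument:

```r
# Partial results accumulate in an object OUTSIDE the timed expression,
# so everything computed before the time limit is preserved.
results <- list()

tryCatch({
  setTimeLimit(elapsed = 0.2, transient = TRUE)  # abort after ~0.2 s
  for (i in 1:1e6) {
    results[[length(results) + 1]] <- i^2  # each finished iteration is saved
  }
}, error = function(e) cat("Timeout. Skipping.\n"))

setTimeLimit(elapsed = Inf)  # clear the limit

length(results)  # > 0: the iterations completed before the limit are intact
```

The timed loop is aborted by an error, exactly as in the question, but because `results` was never inside the aborted expression, the completed iterations remain available afterwards.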


Source: https://habr.com/ru/post/1272135/

