H2O AutoML uses the H2O algorithms (e.g. RF, GBM) under the hood, so if you can't get good models with those directly, you will run into the same problems with AutoML. I'm not sure I would call this overfitting, especially since your models are simply not succeeding at predicting the response.
My recommendation is to log-transform your response variable; that is a useful thing to do when you have a skewed response. In the future, H2O AutoML will try to detect a skewed response automatically and take the log, but that is not a feature of the current version (H2O 3.16.*).
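To see what the log transform buys you, here is a minimal plain-Python sketch (outside H2O, with made-up numbers and a hand-rolled sample-skewness helper) showing that a strongly right-skewed response becomes roughly symmetric after taking logs:

```python
import math

def skewness(xs):
    """Sample skewness: third standardized moment (biased estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# A strongly right-skewed response, as you might see with prices or counts
raw = [1, 2, 3, 5, 10, 20, 50, 100, 500]
logged = [math.log(y) for y in raw]

print(skewness(raw))     # large and positive (long right tail)
print(skewness(logged))  # much closer to 0 (roughly symmetric)
```

Tree ensembles like RF/GBM can fit a skewed target, but a roughly symmetric response usually makes the squared-error loss they optimize much better behaved.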
Here is a bit more detail in case you are not familiar with this process. First create a new column, e.g. log_response, as follows, and use that as the response when training (in RF, GBM, or AutoML):
train[,"log_response"] <- h2o.log(train[,response])
Caveats: If you have zeros in your response, you should use h2o.log1p() instead. Make sure not to include the original response in your predictors. In your case you don't need to change anything, because you already specify your predictors explicitly via the predictors vector.
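As a quick plain-Python illustration of why log1p matters when zeros are present (the same idea as H2O's h2o.log1p(); the toy values here are invented, and the exact inverse of log1p is expm1, i.e. exp(v) - 1):

```python
import math

ys = [0.0, 0.5, 3.0]                          # response containing a zero: log(0) is undefined
shifted = [math.log1p(y) for y in ys]         # log(1 + y), well-defined at 0
restored = [math.expm1(v) for v in shifted]   # exp(v) - 1 undoes log1p exactly

print(shifted[0])   # 0.0 -- zero maps cleanly to zero
print(restored)     # recovers the original values
```

If you train on a log1p-transformed response, remember to back-transform predictions with exp(pred) - 1 rather than a plain exp().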
Keep in mind that when you log-transform the response, your predictions and model metrics will be on the log scale. So you may need to convert your predictions back to the original scale, for example:
model <- h2o.randomForest(x = predictors, y = "log_response", training_frame = train, validation_frame = valid)
log_pred <- h2o.predict(model, test)
pred <- h2o.exp(log_pred)
That gives you predictions, but if you also want to see the metrics, you will have to compute them with the h2o.make_metrics() function using the new predictions, rather than retrieving the metrics stored in the model.
perf <- h2o.make_metrics(predicted = pred, actual = test[,response])
h2o.mse(perf)
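The metric computation itself is just the usual formula applied to the back-transformed predictions. Here is a plain-Python sketch with made-up actuals and log-scale predictions, mirroring what the MSE from h2o.make_metrics() amounts to:

```python
import math

actual = [2.0, 10.0, 50.0]                                  # test-set response, original scale
log_pred = [math.log(2.2), math.log(9.0), math.log(55.0)]   # model output, log scale
pred = [math.exp(v) for v in log_pred]                      # back-transform first

# mean squared error on the ORIGINAL scale
mse = sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)
print(round(mse, 2))  # -> 8.68
```

The key point: back-transform with exp() before computing the metric, otherwise you are measuring error on the log scale, which is not comparable to your earlier models.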
You can try this with RF, as I showed above, or with GBM, or with AutoML (which should give better performance than a single RF or GBM).
Hope this helps improve the performance of your models!