Recently I have been running some experiments comparing Python XGBoost and LightGBM. LightGBM is a relatively new algorithm, and it is said to beat XGBoost in both speed and accuracy.
This is the LightGBM GitHub repository. These are the LightGBM Python API docs, where you will find the Python functions you can call. The model can be used either through the native LightGBM interface or through the LightGBM scikit-learn wrapper.
This is the XGBoost Python API I use. As you can see, it has a data structure very similar to the LightGBM Python API above.
Here is what I tried:
- If you use the `train()` method in both XGBoost and LightGBM, then yes, LightGBM is faster and more accurate. But this method has no cross-validation.
- If you use the `cv()` method in both algorithms for cross-validation, I did not find a way to make it return a set of optimal parameters.
- If you use scikit-learn's `GridSearchCV()` with `XGBClassifier` and `LGBMClassifier`, it works for `XGBClassifier`, but for `LGBMClassifier` it runs forever.
Here are my code examples when using GridSearchCV() with both classifiers:
XGBClassifier with GridSearchCV
```python
param_set = {'n_estimators': [50, 100, 500, 1000]}
gsearch = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,
        n_estimators=100,
        max_depth=5,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        nthread=7,
        objective='binary:logistic',
        scale_pos_weight=1,
        seed=410),
    param_grid=param_set,
    scoring='roc_auc',
    n_jobs=7,
    iid=False,
    cv=10)
xgb_model2 = gsearch.fit(features_train, label_train)
xgb_model2.grid_scores_, xgb_model2.best_params_, xgb_model2.best_score_
```
This works very well for XGBoost and finishes in only a few seconds.
LightGBM with GridSearchCV
```python
param_set = {'n_estimators': [20, 50]}
gsearch = GridSearchCV(
    estimator=LGBMClassifier(
        boosting_type='gbdt',
        num_leaves=30,
        max_depth=5,
        learning_rate=0.1,
        n_estimators=50,
        max_bin=225,
        subsample_for_bin=0.8,  # note: per the docs this expects an integer sample count, not a fraction
        objective=None,
        min_split_gain=0,
        min_child_weight=5,
        min_child_samples=10,
        subsample=1,
        subsample_freq=1,
        colsample_bytree=1,
        reg_alpha=1,
        reg_lambda=0,
        seed=410,
        nthread=7,
        silent=True),
    param_grid=param_set,
    scoring='roc_auc',
    n_jobs=7,
    iid=False,
    cv=10)
lgb_model2 = gsearch.fit(features_train, label_train)
lgb_model2.grid_scores_, lgb_model2.best_params_, lgb_model2.best_score_
```
However, for LightGBM this has been running all morning and still has not produced anything. I use the same dataset for both classifiers; it contains 30,000 records.
I have 2 questions:
- If we just use the `cv()` method, is there still a way to get the optimal set of parameters from it?
- Do you know why `GridSearchCV()` does not work with LightGBM? Does this only happen to me, or has it happened to others?