Why does GridSearchCV spend more than 50% of its time in {method 'acquire' of 'thread.lock' objects}?

I recently set up some of my machine learning pipelines and wanted to make use of my multi-core processor, so I ran cross-validation with the n_jobs=-1 parameter. I also profiled it, and the result surprised me: the function at the top was

 {method 'acquire' of 'thread.lock' objects} 

I wasn't sure whether this was my own mistake, caused by the operations I perform in the Pipeline, so I ran a small experiment:

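    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    pp = Pipeline([('svc', SVC())])
    cv = GridSearchCV(pp, {'svc__C': [1, 100, 200]}, n_jobs=-1, cv=2, refit=True)
    %prun cv.fit(np.random.rand(10000, 100), np.random.randint(0, 5, 10000))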

Output:

    2691 function calls (2655 primitive calls) in 74.005 seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        83   43.819    0.528   43.819    0.528 {method 'acquire' of 'thread.lock' objects}
         1   30.112   30.112   30.112   30.112 {sklearn.svm.libsvm.fit}

What causes this behavior, and is there a way to speed it up?

1 answer

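The profiler only shows what the main process is doing, while its child processes do all the actual work. The main process spends most of its time blocked in lock acquire, waiting for the workers' results; the single sklearn.svm.libsvm.fit call it does record is most likely the final refit=True fit on the full data, which runs in the main process. In this case, passing verbose=2 to GridSearchCV will probably tell you more than %prun.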
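The effect described in the answer can be reproduced with the standard library alone. This is a minimal sketch, not code from the original post: `slow_task`, `profile_parent` and `self_time_by_name` are hypothetical names, `time.sleep` stands in for an SVC fit, and `concurrent.futures.ProcessPoolExecutor` stands in for the worker pool GridSearchCV uses internally. cProfile is enabled only in the parent, so the children's work never shows up under its own name; what the parent records is mostly time spent blocked while collecting results.

```python
import cProfile
import multiprocessing
import pstats
import time
from concurrent.futures import ProcessPoolExecutor


def slow_task(seconds):
    # Hypothetical stand-in for one cross-validation fit; runs in a child process.
    time.sleep(seconds)
    return seconds


def profile_parent(n_tasks=4, seconds=0.2, n_workers=2):
    """Profile only the parent process while child processes do the work."""
    # Assumption: prefer "fork" where the platform offers it, so the children
    # do not need to re-import this module to find slow_task.
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)

    profiler = cProfile.Profile()
    profiler.enable()
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
        # Collecting results makes the parent wait on a lock until each child finishes.
        results = list(pool.map(slow_task, [seconds] * n_tasks))
    profiler.disable()
    return results, pstats.Stats(profiler)


def self_time_by_name(stats):
    # stats.stats maps (file, line, name) -> (prim_calls, calls, tottime, cumtime, callers);
    # sum the self time (tottime) per function name.
    totals = {}
    for (_, _, name), (_, _, tottime, _, _) in stats.stats.items():
        totals[name] = totals.get(name, 0.0) + tottime
    return totals


if __name__ == "__main__":
    results, stats = profile_parent()
    stats.sort_stats("tottime").print_stats(5)
```

On CPython the printed profile typically lists `{method 'acquire' of '_thread.lock' objects}` (the Python 3 spelling of `thread.lock`) near the top, mirroring the question's output, while `slow_task` does not appear at all because it never ran in the profiled process.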


Source: https://habr.com/ru/post/958394/

