I recently set up some machine learning pipelines and wanted to take advantage of my multi-core processor, so I ran cross-validation with n_jobs=-1. When I profiled the run, I was surprised to see that the top function was:
{method 'acquire' of 'thread.lock' objects}
I was not sure whether this was my own mistake, caused by the operations I perform in the Pipeline, so I ran a small experiment:
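# Single-step pipeline, so nothing but SVC can be responsible
pp = Pipeline([('svc', SVC())])
# n_jobs=-1 uses all cores; 3 values of C x 2 folds
cv = GridSearchCV(pp, {'svc__C': [1, 100, 200]}, n_jobs=-1, cv=2, refit=True)
# Profile the fit on random data (IPython magic); sizes must be integers
%prun cv.fit(np.random.rand(10000, 100), np.random.randint(0, 5, 10000))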
Output:
2691 function calls (2655 primitive calls) in 74.005 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    83   43.819    0.528   43.819    0.528 {method 'acquire' of 'thread.lock' objects}
     1   30.112   30.112   30.112   30.112 {sklearn.svm.libsvm.fit}
What is the reason for this behavior, and is there a way to speed it up?
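For what it's worth, here is a rough sanity check on the numbers in the profile output above. It is only a sketch and rests on an assumption I have not confirmed: that the profiler sees only the parent process, so the cross-validation fits running in the joblib workers show up as time blocked on the lock, and the single libsvm.fit call is the final fit triggered by refit=True. The variable names are made up for illustration.

```python
# Figures copied from the profile output above.
total_seconds = 74.005
lock_acquire_seconds = 43.819   # parent blocked in thread.lock acquire
libsvm_fit_seconds = 30.112     # the one fit recorded in the parent

# Grid from the experiment: 3 values of C, cv=2 folds.
n_candidates = len([1, 100, 200])
n_folds = 2
cv_fits = n_candidates * n_folds  # fits dispatched by the grid search

# Assumption (not confirmed by the profile itself): those fits run in
# worker processes that the profiler does not instrument, and only the
# refit=True fit runs in the profiled parent process.
fits_seen_by_profiler = 1

accounted = lock_acquire_seconds + libsvm_fit_seconds
print("cross-validation fits:", cv_fits)
print("fits seen by profiler:", fits_seen_by_profiler)
print("share of run time in the two top entries: {:.1%}".format(accounted / total_seconds))
print("share spent waiting on the lock: {:.1%}".format(lock_acquire_seconds / total_seconds))
```

If that reading is right, the two entries cover almost the entire run, and the lock time would be the parent waiting for the workers rather than overhead inside the Pipeline itself.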