Mini-batch training of a scikit-learn classifier where I provide the mini-batches

I have a very large dataset that cannot be loaded into memory.

I want to use this dataset as the training set for a scikit-learn classifier, for example LogisticRegression.

Is it possible to perform mini-batch training of a scikit-learn classifier, where I provide the mini-batches myself?

+4
2 answers

I believe that some classifiers in sklearn have a partial_fit method. This method lets you pass mini-batches of data to the classifier, so that a gradient descent step is performed for each mini-batch. You simply load a mini-batch from disk, pass it to partial_fit, release the mini-batch from memory, and repeat.

If you specifically want to do this for logistic regression, you will want to use SGDClassifier, which fits a logistic regression model when loss='log' (renamed to loss='log_loss' in scikit-learn 1.1; the old name was removed in 1.3).

You simply pass the features and labels for your mini-batch to partial_fit the same way you would to fit:

clf.partial_fit(X_minibatch, y_minibatch)
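For a fuller picture, here is a minimal sketch of the whole loop, not a definitive implementation. The generator iter_minibatches is a hypothetical stand-in for whatever reads your chunks from disk; it produces synthetic data here so that the example runs on its own. Note that partial_fit needs the full list of class labels through the classes argument on the first call, because a single mini-batch may not contain every class. The loss name is chosen by version, since 'log' was renamed to 'log_loss' in scikit-learn 1.1.

```python
import numpy as np
import sklearn
from sklearn.linear_model import SGDClassifier

N_FEATURES = 5
# Fixed "true" weights used only to generate synthetic labels (assumption).
TRUE_W = np.random.default_rng(0).normal(size=N_FEATURES)


def iter_minibatches(n_batches=20, batch_size=100, seed=1):
    """Hypothetical stand-in for reading chunks from disk.

    Yields (X, y) pairs one mini-batch at a time, so only one
    mini-batch is held in memory at any moment.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, N_FEATURES))
        y = (X @ TRUE_W > 0).astype(int)
        yield X, y


# The logistic loss is called 'log_loss' from scikit-learn 1.1 onward
# and 'log' in older releases.
major, minor = (int(p) for p in sklearn.__version__.split(".")[:2])
loss_name = "log_loss" if (major, minor) >= (1, 1) else "log"

clf = SGDClassifier(loss=loss_name, random_state=0)
classes = np.array([0, 1])  # every label that can ever appear

for X_minibatch, y_minibatch in iter_minibatches():
    # classes is required on the first call; passing the same
    # array on later calls is allowed.
    clf.partial_fit(X_minibatch, y_minibatch, classes=classes)

# Evaluate on a fresh mini-batch that was not used for training.
X_test, y_test = next(iter_minibatches(n_batches=1, seed=123))
accuracy = clf.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

With real data you would replace iter_minibatches with your own reader, for example one that iterates over pandas.read_csv(..., chunksize=...) and splits each chunk into features and labels.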

Update

The dask-ml library, which is built on top of dask, is also worth a look: it can work with estimators that implement partial_fit.

+7

Source: https://habr.com/ru/post/1688120/

