scikit-learn RandomForest MemoryError

I am trying to run scikit-learn's random forest algorithm on the MNIST handwritten digits dataset. While training the model, the system runs into a MemoryError. Please tell me what I should do to fix this problem.

Machine specs: Intel Core 2 Duo with 4 GB RAM

The dataset has shape (60000, 784). The complete error, as shown in the Linux terminal, is as follows:

>   File "./reducer.py", line 53, in <module>
>     main()
>   File "./reducer.py", line 38, in main
>     clf = clf.fit(data,labels) #training the algorithm
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 202, in fit
>     for i in xrange(n_jobs))
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 409, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 295, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 101, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 73, in _parallel_build_trees
>     sample_mask=sample_mask, X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 476, in fit
>     X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 357, in _build_tree
>     np.argsort(X.T, axis=1).astype(np.int32).T)
>   File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 680, in argsort
>     return argsort(axis, kind, order)
> MemoryError
+4
3 answers

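Either set n_jobs=1 or upgrade to a newer version of scikit-learn. Older releases fit the trees in parallel using multiprocessing, which means that the data (X and y) has to be copied to each worker process. Newer releases use threads instead, so the workers share memory.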
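A minimal sketch of the n_jobs=1 suggestion, assuming a reasonably recent scikit-learn. The random data, the array sizes and n_estimators=10 are illustrative stand-ins, not values from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: random values with MNIST's 784 columns (hypothetical, not the real dataset)
rng = np.random.RandomState(0)
data = rng.rand(500, 784)
labels = rng.randint(0, 10, size=500)

# n_jobs=1 builds the trees one at a time in the current process
clf = RandomForestClassifier(n_estimators=10, n_jobs=1, random_state=0)
clf = clf.fit(data, labels)

print(clf.predict(data[:5]).shape)
```

If this runs on a small slice but the full (60000, 784) array still fails, the limit is probably the RAM itself rather than the parallel setup.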

+3

[...] the scikit-learn dev team [...] .ensemble [...]

In scikit-learn 0.16.1 [...] X, y [...].

[...] RandomForestRegressor() [...].

In 0.16.1 [...] 2% [...] n_jobs = 1 [...] { 2, 3, ... }

[...] scikit-learn, @glouppe (2014-[...], [...] 0.15.0) [...] the R-based RandomForest.

[...] 25+ [...] np.asfortranarray(...) [...] scikit-learn [...].

What to do?

[...] FeatureSET [...]:

  • [...] max_features [...].
  • [...] O/S mkswap + swapon [...] 1.

[...] .set_params( n_jobs = -1 ).fit( X, y ) [...] RandomForestRegressor() [...] .predict( X_observed ) [...].

[...] (0.17.0).

[...] .set_params( n_jobs = 1 ).predict( X_observed ) [...] .predict()
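A minimal sketch of how the two calls quoted in this answer could fit together. It assumes the intent is to train on all cores with n_jobs = -1 and then switch to n_jobs = 1 before predicting; the random data and n_estimators=10 are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data; X_observed plays the role of the new samples
rng = np.random.RandomState(0)
X = rng.rand(200, 20)
y = rng.rand(200)
X_observed = rng.rand(10, 20)

model = RandomForestRegressor(n_estimators=10, random_state=0)

# set_params returns the estimator itself, so the calls can be chained
model.set_params(n_jobs=-1).fit(X, y)

# Switch to a single job before predicting
predictions = model.set_params(n_jobs=1).predict(X_observed)
print(predictions.shape)
```

Changing n_jobs after fitting does not retrain the forest; it only affects how later calls are parallelised.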

+2

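Try the latest version (0.19) of scikit-learn. The changelog (bug fixes section) includes this entry:

 Fixed excessive memory usage in prediction for random forests estimators. #8672 by Mike Benfield.

You can install that version with: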

pip3 install scikit-learn==0.19.0
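To check which version is actually installed afterwards, something like the following should work. The variable name has_prediction_fix is just an illustrative label, and the check only compares the major and minor version numbers:

```python
import sklearn

# Compare only the major and minor components of the installed version
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
has_prediction_fix = (major, minor) >= (0, 19)

print(sklearn.__version__, has_prediction_fix)
```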
0

Source: https://habr.com/ru/post/1536851/

