SKlearn SGD Partial Fit

What am I doing wrong here? I have a large dataset that I want to fit incrementally, using partial_fit with scikit-learn's SGDClassifier.

I do the following:

from sklearn.linear_model import SGDClassifier
import pandas as pd

chunksize = 5
clf2 = SGDClassifier(loss='log', penalty="l2")

for train_df in pd.read_csv("train.csv", chunksize=chunksize, iterator=True):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    clf2.partial_fit(X, Y)

I get an error:

Traceback (most recent call last):
  File "/predict.py", line 48, in <module>
    sys.exit(0 if main() else 1)
  File "/predict.py", line 44, in main
    predict()
  File "/predict.py", line 38, in predict
    clf2.partial_fit(X, Y)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 512, in partial_fit
    coef_init=None, intercept_init=None)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 349, in _partial_fit
    _check_partial_fit_first_call(self, classes)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 297, in _check_partial_fit_first_call
    raise ValueError("classes must be passed on the first call "
ValueError: classes must be passed on the first call to partial_fit.


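Note that the classifier does not know the set of classes in advance, so on the first call you have to pass them, for example with np.unique(target), where target is the class column. Because you are reading the data in chunks, you need to make sure the first chunk contains every possible class label. With that in place, the following works: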

import numpy as np

for train_df in pd.read_csv("train.csv", chunksize=chunksize, iterator=True):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    # np.unique(Y) must yield the same full label set on every chunk
    clf2.partial_fit(X, Y, classes=np.unique(Y))
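
With a chunk size as small as 5, a single chunk can easily miss one of the labels, so computing the classes per chunk is fragile. A minimal sketch of a sturdier variant is to fix the label set once and pass the same array on every call. This assumes "clicked" is a binary 0/1 column, which the question does not confirm. `ALL_CLASSES` and `make_chunks` are hypothetical names, and the synthetic chunks only stand in for the `pd.read_csv(..., chunksize=...)` iterator. The `loss` argument is left at its default here because the name of the logistic loss differs between scikit-learn versions ('log' in older releases, 'log_loss' in newer ones).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Assumption: "clicked" is a binary 0/1 label, so the full label set is known up front.
ALL_CLASSES = np.array([0, 1])


def make_chunks(n_chunks=20, chunksize=5, n_features=3, seed=0):
    """Yield synthetic (X, y) chunks standing in for the CSV chunk iterator."""
    rng = np.random.default_rng(seed)
    for i in range(n_chunks):
        X = rng.normal(size=(chunksize, n_features))
        if i == 0:
            # The first chunk deliberately contains only class 0.
            y = np.zeros(chunksize, dtype=int)
        else:
            y = (X[:, 0] > 0).astype(int)
        yield X, y


clf = SGDClassifier(penalty="l2", random_state=0)

for X, y in make_chunks():
    # The same fixed label set is passed on every call, so it does not
    # matter which labels happen to appear in a given chunk.
    clf.partial_fit(X, y, classes=ALL_CLASSES)
```

If the labels are not known in advance, one option is a cheap first pass over the label column alone to collect the unique values before training.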

Source: https://habr.com/ru/post/1669313/

