Can I safely persist a scikit-learn model by assigning `coef_` and other estimated attributes?

scikit-learn suggests using pickle to save a model. However, they note the limitations of pickle when it comes to different versions of scikit-learn or Python. (See also this question on Stack Overflow.)
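For reference, the pickle route they recommend is just this (a minimal sketch, assuming a fitted estimator `lr` like the one below; the file name is arbitrary):

import pickle

# save the fitted estimator ...
with open('model.pkl', 'wb') as f:
    pickle.dump(lr, f)

# ... and load it back (same scikit-learn/Python versions assumed)
with open('model.pkl', 'rb') as f:
    lr_restored = pickle.load(f)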

In many machine learning approaches, only a few parameters are learned from large data sets. These estimated parameters are stored in attributes with a trailing underscore, e.g. `coef_`.

Now my question is: can a model be persisted by saving these estimated attributes and assigning them to a fresh instance later? Is this approach safe for all scikit-learn estimators, or are there potential side effects (e.g. private variables that also need to be set) for some estimators?

It seems to work for logistic regression, as shown in the following example:

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split
iris = datasets.load_iris()
tt_split = train_test_split(iris.data, iris.target, test_size=0.4)
X_train, X_test, y_train, y_test = tt_split

# Here we train the logistic regression
lr = LogisticRegression(class_weight='balanced')
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))     # prints 0.95

# Persisting
params = lr.get_params()
coef = lr.coef_
intercept = lr.intercept_
# classes_ is not documented as a public member,
# but it is not explicitly private either (no leading underscore)
classes = lr.classes_
lr.n_iter_  # this is metadata; no need to persist it


# Now we "load" the classifier by assigning the saved attributes
lr2 = LogisticRegression()
lr2.set_params(**params)
lr2.coef_ = coef
lr2.intercept_ = intercept
lr2.classes_ = classes
print(lr2.score(X_test, y_test))  # prints the same: 0.95
1 answer

Setting the estimated attributes alone is not enough, at least not in the general case for all estimators.

I know of at least one example where this can fail. `LinearDiscriminantAnalysis.transform()` uses the private attribute `_max_components`:

def transform(self, X):
    # ... code omitted
    return X_new[:, :self._max_components]

This private attribute is not among the `__init__()` parameters; it is set only inside `.fit()`, so restoring the public estimated attributes alone would miss it.
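To make the failure concrete, here is a minimal sketch that restores only the public fitted attributes and then calls `transform()`:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

X = [[1, 2, 3], [1, 2, 1], [4, 5, 6], [9, 9, 9]]
y = [1, 2, 1, 2]
lda = LDA().fit(X, y)

# restore only the public fitted attributes into a fresh instance
lda2 = LDA()
lda2.set_params(**lda.get_params())
for attr in ('coef_', 'intercept_', 'classes_', 'means_', 'scalings_', 'xbar_'):
    setattr(lda2, attr, getattr(lda, attr))

lda2.transform(X)  # AttributeError: _max_components was only set in fit()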

A safer approach, therefore, is to persist everything in the estimator's `.__dict__`. Example:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA().fit([[1, 2, 3], [1, 2, 1], [4, 5, 6], [9, 9, 9]], [1, 2, 1, 2])
lda.__dict__
# {'_max_components': 1,
#  'classes_': array([1, 2]),
#  'coef_': array([[ -9.55555556,  21.55555556,  -9.55555556]]),
#  'explained_variance_ratio_': array([ 1.]),
#  'intercept_': array([-15.77777778]),
#  'means_': array([[ 2.5,  3.5,  4.5],
#         [ 5. ,  5.5,  5. ]]),
#  'n_components': None,
#  'priors': None,
#  'priors_': array([ 0.5,  0.5]),
#  'scalings_': array([[-2.51423299],
#         [ 5.67164186],
#         [-2.51423299]]),
#  'shrinkage': None,
#  'solver': 'svd',
#  'store_covariance': False,
#  'tol': 0.0001,
#  'xbar_': array([ 3.75,  4.5 ,  4.75])}
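Restoring is then a wholesale copy of that dictionary into a fresh instance. A minimal sketch (the names `state` and `lda2` are mine, for illustration):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

X = [[1, 2, 3], [1, 2, 1], [4, 5, 6], [9, 9, 9]]
y = [1, 2, 1, 2]
lda = LDA().fit(X, y)

state = dict(lda.__dict__)  # snapshot of all fitted state, public and private

lda2 = LDA()                # "load": fresh instance, restore the snapshot wholesale
lda2.__dict__.update(state)

assert np.allclose(lda.transform(X), lda2.transform(X))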

If you persist these dictionary entries, you can restore an equivalent estimator later without guessing which attributes, public or private, it needs. For a portable format, see Scikit-learn Persistence JSON Serialization.
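A minimal sketch of that idea (the encoding scheme here is my own, not necessarily the linked article's): ndarray values become lists for JSON, and are rebuilt on load:

import json
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA().fit([[1, 2, 3], [1, 2, 1], [4, 5, 6], [9, 9, 9]], [1, 2, 1, 2])

# encode: ndarrays become lists; remember which keys held arrays
state = lda.__dict__
array_keys = [k for k, v in state.items() if isinstance(v, np.ndarray)]
payload = json.dumps({k: v.tolist() if k in array_keys else v
                      for k, v in state.items()})

# decode: rebuild the arrays and restore into a fresh estimator
lda2 = LDA()
lda2.__dict__.update({k: np.array(v) if k in array_keys else v
                      for k, v in json.loads(payload).items()})

In a real persistence layer you would store `array_keys` (and the array dtypes) alongside the payload.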

Bear in mind, though, that this still ties you to a specific version of scikit-learn: private attributes may be renamed, added, or removed between releases, so it shares pickle's cross-version caveat.
