How to save scaling options for later use

Question

I want to use the sklearn.preprocessing.scale function that scikit-learn offers to center the dataset that I will use to train an SVM classifier.

How can I save the standardization parameters so that I can also apply them to the data that I want to classify?

I know that I can use StandardScaler, but can I somehow serialize it to a file so that I do not have to fit it to my data every time I want to run the classifier?

+10

python scikit-learn normalization standardized

LetsPlayYahtzee Mar 11 '16 at 16:04

3 answers

Pickling is usually a bad idea, at least in production, so I use a different approach:

    import numpy as np

    # scaler is a fitted instance of MinMaxScaler
    scaler_data_ = np.array([scaler.data_min_, scaler.data_max_])
    np.save("my_scaler.npy", scaler_data_)

    # some unscaled X
    Xreal = np.array([1.9261148646249848, 0.7327923702472628, 118, 1083])
    scaler_data_ = np.load("my_scaler.npy")
    Xmin, Xmax = scaler_data_[0], scaler_data_[1]
    Xscaled = (Xreal - Xmin) / (Xmax - Xmin)
    Xscaled
    # -> array([0.63062502, 0.35320565, 0.15144766, 0.69116555])
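The manual formula above reproduces MinMaxScaler only when the scaler was created with its default feature_range=(0, 1). A minimal sketch to check the round trip, assuming made-up training data and a temporary file location (both are placeholders, not part of the answer):

```python
import os
import tempfile

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data, for illustration only.
X_train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]])
scaler = MinMaxScaler().fit(X_train)  # default feature_range=(0, 1)

# Save the per-feature minimum and maximum in one array.
path = os.path.join(tempfile.mkdtemp(), "my_scaler.npy")
np.save(path, np.array([scaler.data_min_, scaler.data_max_]))

# Later: reload and scale new data without the fitted scaler object.
X_min, X_max = np.load(path)
X_new = np.array([[1.0, 20.0], [3.0, 40.0]])
X_scaled = (X_new - X_min) / (X_max - X_min)

# The manual result should agree with the fitted scaler.
matches = np.allclose(X_scaled, scaler.transform(X_new))
```

Note that a feature that is constant in the training data makes Xmax - Xmin zero, so the manual formula would divide by zero for that column and needs a guard.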

0

Artem Zhukov Aug 14 '18 at 16:20

Scale with StandardScaler:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaler.fit(data)
    scaled_data = scaler.transform(data)

Save mean_ and var_ for later use:

    means = scaler.mean_
    vars = scaler.var_

(You can print the values and copy-paste them, which also lets you use them in other tools, or save them to disk using np.save.)

Later, apply the saved parameters:

    def scale_data(array, means=means, stds=vars ** 0.5):
        return (array - means) / stds

    scale_new_data = scale_data(new_data)
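As a rough sketch of the save-to-disk route, both arrays can go into a single .npz file with np.savez. The file name, the toy data, and the keys "mean" and "var" are placeholders chosen for illustration:

```python
import os
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the real training set.
data = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
scaler = StandardScaler().fit(data)

# Store both parameter arrays in one file.
path = os.path.join(tempfile.mkdtemp(), "scaler_params.npz")
np.savez(path, mean=scaler.mean_, var=scaler.var_)

# Later, possibly in another process: reload the parameters.
with np.load(path) as params:
    means, stds = params["mean"], np.sqrt(params["var"])

def scale_data(array, means=means, stds=stds):
    """Standardize array with the saved per-feature mean and std."""
    return (array - means) / stds

new_data = np.array([[2.5, 250.0]])
scaled_new_data = scale_data(new_data)
```

The scaler also exposes scale_, which already holds the per-feature standard deviations (with zero-variance features set to 1), so saving scale_ instead of var_ avoids both the square root and a division by zero on constant features.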

0

Ioannis nasios Aug 19 '19 at 10:36

Ami tavory · Accepted Answer · 2016-03-11T16:22:57+0000

I think the best way is to pickle the scaler after fitting it, as this is the most generic option. Perhaps you will later create a pipeline composed of both a feature extractor and a scaler. By pickling a (possibly compound) stage, you make things more generic. The sklearn documentation on model persistence discusses how to do this.
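A minimal sketch of that idea, assuming a toy dataset and a temporary file path (both invented for illustration): the scaler and an SVM classifier are combined in one pipeline, which is pickled after fitting and reloaded later.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class data, for illustration only.
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]])
y_train = np.array([0, 0, 1, 1])

# Fit the scaler and the classifier together as one compound stage.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# Pickle the fitted pipeline.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later: load it and classify new data; the stored scaling is applied
# automatically before the classifier sees the input.
with open(path, "rb") as f:
    restored = pickle.load(f)

X_new = np.array([[0.5, 0.5], [9.5, 9.5]])
predictions = restored.predict(X_new)
```

Pickle files should only be loaded from a trusted source, and ideally with the same scikit-learn version that wrote them.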

Having said that, you can query sklearn.preprocessing.StandardScaler for the fitted parameters:

    scale_ : ndarray, shape (n_features,)
        Per-feature relative scaling of the data.
        New in version 0.17: scale_ is recommended instead of the deprecated std_.
    mean_ : array of floats with shape [n_features]
        The mean value for each feature in the training set.

The following short excerpt illustrates this:

    from sklearn import preprocessing
    import numpy as np

    s = preprocessing.StandardScaler()
    s.fit(np.array([[1., 2, 3, 4]]).T)

    >>> s.mean_, s.scale_
    (array([ 2.5]), array([ 1.11803399]))
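To make the meaning of these two attributes concrete, here is a small check on the same four values. It is only a sketch: it shows that scale_ equals the population standard deviation (ddof=0) and that the transform is subtraction of mean_ followed by division by scale_.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0, 3.0, 4.0]]).T
s = StandardScaler().fit(X)

# scale_ is the per-feature standard deviation computed with ddof=0.
population_std = X.std(axis=0)
same_scale = np.allclose(s.scale_, population_std)

# transform(X) is the same as subtracting mean_ and dividing by scale_.
manual = (X - s.mean_) / s.scale_
same_transform = np.allclose(manual, s.transform(X))
```

Because these two arrays fully determine the transform, storing them is enough to rescale new data later.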
