Save MinMaxScaler model in sklearn

I use the MinMaxScaler model in sklearn to normalize my data:

    training_set = np.random.rand(4, 4) * 10
    training_set
    [[ 6.01144787,  0.59753007,  2.0014852 ,  3.45433657],
     [ 6.03041646,  5.15589559,  6.64992437,  2.63440202],
     [ 2.27733136,  9.29927394,  0.03718093,  7.7679183 ],
     [ 9.86934288,  7.59003904,  6.02363739,  2.78294206]]

    scaler = MinMaxScaler()
    scaler.fit(training_set)
    scaler.transform(training_set)
    [[ 0.49184811,  0.        ,  0.29704831,  0.15972182],
     [ 0.4943466 ,  0.52384506,  1.        ,  0.        ],
     [ 0.        ,  1.        ,  0.        ,  1.        ],
     [ 1.        ,  0.80357559,  0.9052909 ,  0.02893534]]

Now I want to use the same scaler to normalize the test set:

    [[ 8.31263467,  7.99782295,  0.02031658,  9.43249727],
     [ 1.03761228,  9.53173021,  5.99539478,  4.81456067],
     [ 0.19715961,  5.97702519,  0.53347403,  5.58747666],
     [ 9.67505429,  2.76225253,  7.39944931,  8.46746594]]

But I do not want to call scaler.fit() with the training data every time. Is there a way to save the scaler and load it later from a different file?
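Roughly, this is the workflow I am after (just a sketch; I am assuming the test array above is stored in a variable called test_set):

    # fit once on the training data
    scaler = MinMaxScaler()
    scaler.fit(training_set)

    # ... save the fitted scaler somehow ...

    # later, in another file: load the saved scaler and apply it directly,
    # without calling fit() again
    test_scaled = scaler.transform(test_set)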

5 answers

So I'm not really an expert on this, but from a little research and a few useful links, I think pickle and sklearn.externals.joblib will be your friends here.

pickle lets you save a model, or "dump" it to a file.

I think this link is also useful. It talks about model persistence. What you want to try is something like:

    # could use: import pickle... however, let's do something else
    from sklearn.externals import joblib

    # joblib is more efficient than pickle for things like large numpy arrays,
    # which sklearn models often have

    # then just 'dump' your model
    joblib.dump(clf, 'my_dope_model.pkl')
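And to get the model back later, joblib.load reads it from the same file (a small sketch using the same file name as above):

    # 'load' your model back when you need it
    clf = joblib.load('my_dope_model.pkl')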

Here is where you can learn more about the sklearn externals.

Let me know if this doesn't help, or if I'm misunderstanding something about your model.


Even better than pickle (which creates much larger files than this method), you can use the built-in sklearn tool:

    from sklearn.externals import joblib

    scaler_filename = "scaler.save"
    joblib.dump(scaler, scaler_filename)

    # And now to load...
    scaler = joblib.load(scaler_filename)

You can use pickle to save the scaler:

    import pickle

    scalerfile = 'scaler.sav'
    pickle.dump(scaler, open(scalerfile, 'wb'))

Load it back:

    import pickle

    scalerfile = 'scaler.sav'
    scaler = pickle.load(open(scalerfile, 'rb'))
    test_scaled_set = scaler.transform(test_set)
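A small variation on the same idea: the bare open() calls above never close their file handles, so a with block is slightly safer (same file and variable names, just a sketch):

    import pickle

    scalerfile = 'scaler.sav'

    # write the fitted scaler to disk
    with open(scalerfile, 'wb') as f:
        pickle.dump(scaler, f)

    # read it back and reuse it without refitting
    with open(scalerfile, 'rb') as f:
        scaler = pickle.load(f)

    test_scaled_set = scaler.transform(test_set)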

Just a note that sklearn.externals.joblib is deprecated and has been replaced by the standalone joblib package, which can be installed with pip install joblib:

    import joblib

    joblib.dump(my_scaler, 'scaler.pkl')
    my_scaler = joblib.load('scaler.pkl')

Docs for joblib.dump() and joblib.load().


The best way to do this is to create an ML pipeline as follows:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.externals import joblib

    pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())
    model = pipeline.fit(X_train, y_train)

Now you can save it to a file:

    joblib.dump(model, 'filename.mod')

Later you can load it back like this:

    model = joblib.load('filename.mod')
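The nice part of this design is that the loaded pipeline applies the scaler and the model together, so on new data you only call predict (a sketch; X_test is assumed to hold your raw, unscaled test data):

    # the pipeline scales X_test with the saved MinMaxScaler, then predicts
    predictions = model.predict(X_test)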

Source: https://habr.com/ru/post/1014637/

