Can someone explain StandardScaler to me?

I cannot understand the StandardScaler page in the sklearn documentation.

Can anyone explain this to me in simple words?

+39
8 answers

The idea behind StandardScaler is that it transforms your data so that its distribution has a mean of 0 and a standard deviation of 1. For each feature, the sample mean is subtracted from every value, and the result is then divided by the standard deviation of that feature over the whole data set.
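To make the description above concrete, here is a minimal sketch that applies z = (x - mean) / std by hand with NumPy and compares the result with StandardScaler. The array X is made-up illustration data, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data: 4 samples (rows), 2 features (columns)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Manual standardization, column by column: z = (x - mean) / std
mean = X.mean(axis=0)
std = X.std(axis=0)  # population standard deviation (ddof=0)
z_manual = (X - mean) / std

# The same transformation done by scikit-learn
z_sklearn = StandardScaler().fit_transform(X)

# Prints whether the two results agree up to floating-point tolerance
print(np.allclose(z_manual, z_sklearn))
```

Note that the comparison relies on the population standard deviation (NumPy's default, ddof=0), which is what StandardScaler uses as well.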

+47

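The main idea is to normalize/standardize your features (mean = 0 and standard deviation = 1).

I assume you have a matrix X in which each row is a sample (observation) and each column is a feature (variable); this is the input scikit-learn expects, with X of shape [number_of_samples, number_of_features].

StandardScaler() standardizes the features (each column of X, individually!) so that each column/feature/variable has mean = 0 and standard deviation = 1.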


Example:

from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(data)
[[0 0]
 [0 0]
 [1 1]
 [1 1]]

print(scaled_data)
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]

Verify that the mean of each feature (column) is 0:

scaled_data.mean(axis = 0)
array([0., 0.])

Verify that the standard deviation of each feature (column) is 1:

scaled_data.std(axis = 0)
array([1., 1.])


+34

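StandardScaler performs standardization. A data set usually contains variables on different scales. For example, an Employee data set might contain an AGE column with values in the range 20-70 and a SALARY column with values in the range 10000-80000.
Because these two columns are on different scales, they are standardized to a common scale before a machine learning model is built.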
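A small sketch of the AGE/SALARY case may help. The four employee records below are invented for illustration; only the column ranges (20-70 and 10000-80000) come from the answer.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical employee records: column 0 is AGE, column 1 is SALARY
employees = np.array([
    [20, 10000],
    [35, 30000],
    [50, 60000],
    [70, 80000],
], dtype=float)

scaled = StandardScaler().fit_transform(employees)

# Range (max - min) of each column before and after scaling
range_before = employees.max(axis=0) - employees.min(axis=0)
range_after = scaled.max(axis=0) - scaled.min(axis=0)

print(range_before)  # the two columns differ by orders of magnitude
print(range_after)   # the two columns are now on a comparable scale
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, neither column dominates simply because its raw numbers are larger, which matters for models that are sensitive to feature magnitude.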

+11

, , . . , , , 0.

+7

, , , . , . , . , .

import pandas as pd
import scipy.stats as ss
from sklearn.preprocessing import StandardScaler


data = [[1, 1, 1, 1, 1], [2, 5, 10, 50, 100], [3, 10, 20, 150, 200], [4, 15, 40, 200, 300]]

df = pd.DataFrame(data, columns=['N0', 'N1', 'N2', 'N3', 'N4']).astype('float64')

sc_X = StandardScaler()
# fit_transform returns a NumPy array, not a DataFrame
scaled = sc_X.fit_transform(df)

# Describe each scaled column separately
num_cols = scaled.shape[1]
for i in range(num_cols):
    col = scaled[:, i]
    col_stats = ss.describe(col)
    print(col_stats)

DescribeResult(nobs=4, minmax=(-1.3416407864998738, 1.3416407864998738), mean=0.0, variance=1.3333333333333333, skewness=0.0, kurtosis=-1.3599999999999999)
DescribeResult(nobs=4, minmax=(-1.2828087129930659, 1.3778315806221817), mean=-5.551115123125783e-17, variance=1.3333333333333337, skewness=0.11003776770595125, kurtosis=-1.394993095506219)
DescribeResult(nobs=4, minmax=(-1.155344148338584, 1.53471088361394), mean=0.0, variance=1.3333333333333333, skewness=0.48089217736510326, kurtosis=-1.1471008824318165)
DescribeResult(nobs=4, minmax=(-1.2604572012883055, 1.2668071116222517), mean=-5.551115123125783e-17, variance=1.3333333333333333, skewness=0.0056842140599118185, kurtosis=-1.6438177182479734)
DescribeResult(nobs=4, minmax=(-1.338945389819976, 1.3434309690153527), mean=5.551115123125783e-17, variance=1.3333333333333333, skewness=0.005374558840039456, kurtosis=-1.3619131970819205)
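
The variance of 1.333 in the output above, rather than 1, is most likely a matter of convention: scipy.stats.describe reports the sample variance (ddof=1 by default), while StandardScaler divides by the population standard deviation (ddof=0). With 4 samples the two differ by a factor of n / (n - 1) = 4/3. A minimal sketch, using the first two columns of the data above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First two columns (N0, N1) of the example data
data = np.array([[1, 1], [2, 5], [3, 10], [4, 15]], dtype=float)
scaled = StandardScaler().fit_transform(data)

n = scaled.shape[0]
var_population = scaled.var(axis=0, ddof=0)  # divides by n
var_sample = scaled.var(axis=0, ddof=1)      # divides by n - 1

print(var_population)
print(var_sample)
print(n / (n - 1))
```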
+5

After applying StandardScaler(), each column of X will have a mean of 0 and a standard deviation of 1.

.

: , (. sklearn).

+3

, , . .

>>> import numpy as np
>>> data = [[6, 2], [4, 2], [6, 4], [8, 2]]
>>> a = np.array(data)

>>> np.std(a, axis=0)
array([1.41421356, 0.8660254 ])

>>> np.mean(a, axis=0)
array([6. , 2.5])

>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> scaler.fit(data)
>>> print(scaler.mean_)

>>> # X_changed = (X - μ) / σ, where σ is the standard deviation and μ is the mean
>>> z = scaler.transform(data)
>>> z

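As the output shows, the mean is [6. , 2.5] and the standard deviation is [1.41421356, 0.8660254].

The value at position (0,1) is 2; standardized: (2 - 2.5)/0.8660254 = -0.57735027

The value at position (1,0) is 4; standardized: (4 - 6)/1.41421356 = -1.414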
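
The two hand calculations above can be checked against the fitted scaler. This is a sketch that assumes the same data as in the session above; mean_ and scale_ are the attributes in which StandardScaler stores the per-column mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[6, 2], [4, 2], [6, 4], [8, 2]], dtype=float)
scaler = StandardScaler().fit(data)
z = scaler.transform(data)

# Repeat the hand calculation for positions (0,1) and (1,0)
by_hand_01 = (data[0, 1] - scaler.mean_[1]) / scaler.scale_[1]
by_hand_10 = (data[1, 0] - scaler.mean_[0]) / scaler.scale_[0]

# Compare with the values produced by transform()
print(by_hand_01, z[0, 1])
print(by_hand_10, z[1, 0])
```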


Note: a mean of -2.77555756e-17 after standardization is very close to 0.


+3

Source: https://habr.com/ru/post/1661681/

