Python pandas: best way to normalize data?

Question


I have a large pandas DataFrame with approximately 80 columns. Each column reports daily traffic statistics for one website (columns are websites).

Since I don’t want to work with raw traffic statistics, I would like to normalize all my columns (except the first, which is the date), either to the range 0 to 1 or (even better) 0 to 100.

 Date         A       B       ...
 10/10/2010   100.0   402.0   ...
 11/10/2010   250.0   800.0   ...
 12/10/2010   800.0   2000.0  ...
 13/10/2010   400.0   1800.0  ...

That said, I am unsure which normalization to apply: min-max scaling or z-score normalization (standardization)? Some of my columns have strong outliers. An example would be great. Unfortunately, I cannot provide the complete data.


python-3.x pandas normalization

Rnaldinho Oct 22 '16 at 21:18


1 answer

User191919 · Answer 1 · 2016-10-22T21:45:12+0000

First, move the Date column to the index.

 dates = df.pop('Date')
 df.index = dates

Then either use z-score normalization:

 df1 = (df - df.mean())/df.std()

or min-max scaling:

 df2 = (df-df.min())/(df.max()-df.min())
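As a minimal sketch, both formulas can be tried on the four sample rows from the question. The column names A and B come from that sample; `set_index` is used here as an alternative way of moving Date to the index, and multiplying the min-max result by 100 is one assumed way to get the 0 to 100 range the question asks for.

```python
import pandas as pd

# Sample rows taken from the question; A and B stand in for two websites.
df = pd.DataFrame({
    "Date": ["10/10/2010", "11/10/2010", "12/10/2010", "13/10/2010"],
    "A": [100.0, 250.0, 800.0, 400.0],
    "B": [402.0, 800.0, 2000.0, 1800.0],
})

# Move the Date column to the index so only numeric columns remain.
df = df.set_index("Date")

# z-score: each column ends up with mean 0 and sample standard deviation 1.
z_scored = (df - df.mean()) / df.std()

# Min-max: each column is mapped onto [0, 1]; multiply by 100 for [0, 100].
min_max = (df - df.min()) / (df.max() - df.min())
min_max_100 = min_max * 100

print(z_scored.round(3))
print(min_max_100.round(1))
```

Both operations work column by column, because `df.mean()`, `df.std()`, `df.min()` and `df.max()` each return one value per column and pandas aligns them with the column labels.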

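I would recommend z-score normalization, because min-max scaling is highly sensitive to outliers: a single extreme value defines the maximum (or minimum) and compresses all other values into a narrow part of the range.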
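To illustrate how an outlier affects min-max scaling, here is a small sketch. The first four values are column A from the question; the 50000.0 spike is a hypothetical value added purely for illustration, and `min_max` is a helper name introduced here.

```python
import pandas as pd

def min_max(s):
    """Scale a Series onto [0, 1] using its own minimum and maximum."""
    return (s - s.min()) / (s.max() - s.min())

# Column A from the question.
typical = pd.Series([100.0, 250.0, 800.0, 400.0])

# Same values plus one hypothetical traffic spike (not from the source data).
with_outlier = pd.concat([typical, pd.Series([50000.0])], ignore_index=True)

scaled_typical = min_max(typical)
scaled_outlier = min_max(with_outlier)

# Without the spike the values spread over the whole [0, 1] range;
# with it, the original four values are squeezed close to 0.
print(scaled_typical.round(3).tolist())
print(scaled_outlier.round(3).tolist())
```

The z-score is affected as well, since the outlier shifts the mean and inflates the standard deviation, but its output is not pinned to a fixed range whose endpoints the outlier determines.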
