Replace outliers with column quantile in Pandas dataframe

I have a dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list('AB'))

        A   B
    0  92  65
    1  61  97
    2  17  39
    3  70  47
    4  56   6

Here are the 5% quantiles of each column:

    down_quantiles = df.quantile(0.05)

    A    24.8
    B    12.6

And here is the mask for values that are below the quantiles:

    outliers_low = (df < down_quantiles)

           A      B
    0  False  False
    1  False  False
    2   True  False
    3  False  False
    4  False   True

I want to replace every value in df that is lower than its column's quantile with that quantile. I can do it like this:

    df[outliers_low] = np.nan
    df.fillna(down_quantiles, inplace=True)

          A     B
    0  92.0  65.0
    1  61.0  97.0
    2  24.8  39.0
    3  70.0  47.0
    4  56.0  12.6

But surely there is a more elegant way. How can I do this without fillna? Thanks.

1 answer

You can use the DataFrame.mask() method. Wherever the mask is True, the value is replaced with the corresponding value from the other Series; passing axis=1 aligns that Series with the DataFrame's column names.

    df.mask(outliers_low, down_quantiles, axis=1)
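For a reproducible check, here is a minimal sketch of the same call. It assumes the fixed values printed in the question instead of random data; the name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# Fixed data copied from the question so the result is reproducible.
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})

down_quantiles = df.quantile(0.05)   # Series indexed by column name (A, B)
outliers_low = df < down_quantiles   # boolean DataFrame, True where a value is too low

# Where the mask is True, take the value from down_quantiles for that column.
# mask() returns a new DataFrame; df itself is left unchanged.
result = df.mask(outliers_low, down_quantiles, axis=1)
print(result)
```

Because the replacement values are floats, the affected columns come back as floats, just as in the fillna version from the question.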



Another option is to use the DataFrame.where() method after inverting your boolean mask with the tilde operator (~).

    df.where(~outliers_low, down_quantiles, axis=1)
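The sketch below runs the where() variant on the fixed values from the question. It also compares it with DataFrame.clip(), which is not part of the original answer but should give the same result when the goal is simply a per-column lower bound. The names `via_where` and `via_clip` are illustrative.

```python
import numpy as np
import pandas as pd

# Fixed data copied from the question so the result is reproducible.
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)
outliers_low = df < down_quantiles

# where() keeps values where the condition is True, so the mask is inverted.
via_where = df.where(~outliers_low, down_quantiles, axis=1)

# Alternative (an assumption, not from the answer): clip each column
# at its own lower bound by aligning the Series on the columns.
via_clip = df.clip(lower=down_quantiles, axis=1)

print(via_where)
print(np.allclose(via_where, via_clip))
```

Note that clip() only fits when the replacement value is the threshold itself; mask() and where() also cover cases where the replacement differs from the cutoff.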



Source: https://habr.com/ru/post/1014248/

