How to find values below (or above) the average

Question


As you can see from the following summary, the count for September (01/09/2016, 1542677) is well below the monthly average.

from io import StringIO  # Python 3; on Python 2 this was: from StringIO import StringIO

import pandas as pd

myst = """01/01/2016  8781262
01/02/2016  8958598
01/03/2016  8787628
01/04/2016  9770861
01/05/2016  8409410
01/06/2016  8924784
01/07/2016  8597500
01/08/2016  6436862
01/09/2016  1542677
"""
u_cols = ['month', 'count']

# Fields are separated by whitespace, so split on any run of spaces or tabs
df = pd.read_csv(StringIO(myst), sep=r'\s+', names=u_cols)

Is there a mathematical formula that captures this admittedly vague notion of a value being "way too low or too high"?

This is easy if I define a fixed limit (e.g. 9 or 10%). But I want the script to work this out for me and return the value if the difference between the lowest and the second-lowest value is more than 5%. In this case, it should return the count for September.
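The rule described above (flag the minimum only when it sits more than 5% below the next-smallest value) can be sketched directly. This is a minimal sketch, assuming "difference" means the gap relative to the second-lowest value; the helper name lowest_if_gap and its min_gap parameter are hypothetical, not from the question.

```python
from io import StringIO

import pandas as pd

myst = """01/01/2016  8781262
01/02/2016  8958598
01/03/2016  8787628
01/04/2016  9770861
01/05/2016  8409410
01/06/2016  8924784
01/07/2016  8597500
01/08/2016  6436862
01/09/2016  1542677
"""
df = pd.read_csv(StringIO(myst), sep=r"\s+", names=["month", "count"])


def lowest_if_gap(df, col="count", min_gap=0.05):
    """Hypothetical helper: return the row with the smallest value if it lies
    more than `min_gap` (a fraction of the second-smallest value) below it."""
    two_smallest = df.nsmallest(2, col)  # sorted ascending by `col`
    lowest = two_smallest[col].iloc[0]
    second = two_smallest[col].iloc[1]
    gap = (second - lowest) / second  # relative gap, assumed definition
    if gap > min_gap:
        return two_smallest.iloc[[0]]  # keep it as a one-row DataFrame
    return df.iloc[0:0]  # empty frame with the same columns


print(lowest_if_gap(df))
```

Here the gap between September and August is far larger than 5%, so the September row is returned; note that this rule only ever inspects the minimum and says nothing about unusually high values.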

+4

python pandas dataframe

shantanuo Oct 10 '16 at 4:09


3 answers

, " ", , Outlier ( ),

, ; , , .

:

, , , , , .

, , , , .

, , , , (. this , python).

. ( ) , "" (. postoverflow post python).

, 0% , . , ( ) - , . . , 2013 , , ,

, , . , , . , , Python , script (, Google).

, , :

: , , , ( ), , , , , .

: , , , : . " ", , , , , , - . ML, Andrew Ng .

, !

+2

fr_andres Oct 10 '16 at 5:18

Use the interquartile range (IQR, see Wikipedia): the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

Outliers are values that fall below Q1 - k * IQR or above Q3 + k * IQR.

You can choose the constant k based on knowledge of your domain (the common choice is 1.5).

Given the data, a filter in pandas might look like this:

# Quartiles of the count column
q1, q3 = df["count"].quantile([0.25, 0.75])
iqr = q3 - q1
k = 1.5  # conventional choice; adjust for your domain
lo = q1 - k * iqr
up = q3 + k * iqr
# Keep only the rows strictly inside the fences (drops the outliers)
df_filtered = df.loc[(df["count"] > lo) & (df["count"] < up), :]
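To see which months the IQR fences actually reject, the complementary mask can be applied instead of the filter. This is a small self-contained sketch that rebuilds the question's data; it assumes pandas' default linear interpolation for quantile.

```python
from io import StringIO

import pandas as pd

myst = """01/01/2016  8781262
01/02/2016  8958598
01/03/2016  8787628
01/04/2016  9770861
01/05/2016  8409410
01/06/2016  8924784
01/07/2016  8597500
01/08/2016  6436862
01/09/2016  1542677
"""
df = pd.read_csv(StringIO(myst), sep=r"\s+", names=["month", "count"])

k = 1.5  # common default for the fence multiplier
q1, q3 = df["count"].quantile([0.25, 0.75])  # linear interpolation (pandas default)
iqr = q3 - q1
lo, up = q1 - k * iqr, q3 + k * iqr

# Rows on or outside the fences: the complement of the filter above
outliers = df[(df["count"] <= lo) | (df["count"] >= up)]
print(outliers)
```

With only nine points, k = 1.5 gives fences of 7636349 and 9697845, so April and August are flagged along with September. A larger k narrows the selection: with k = 3 only August and September remain.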

+2

bn2302 Oct 10 '16 at 5:42


piRSquared · Accepted Answer · 2016-10-10T05:15:02+0000

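I'd use a z-score, which measures how many standard deviations a value lies from the mean. A common cutoff is 2: for normally distributed data, only about 5% of values lie more than 2 standard deviations from the mean.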
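The "2 standard deviations is roughly 5%" rule of thumb assumes approximately normal data. The exact two-sided tail probability can be checked with the standard library's complementary error function; two_sided_tail is a hypothetical helper name used only for illustration.

```python
import math


def two_sided_tail(k):
    """Probability that a standard normal variable lies more than k sd from 0."""
    # For Z ~ N(0, 1): P(|Z| > k) = erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2))


print(round(two_sided_tail(2), 4))  # 0.0455
```

So a cutoff of 2 leaves about 4.6% in the tails, and a cutoff of 1.96 corresponds to exactly 5%.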

Define zscore:

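import numpy as np

def zscore(s):
    # Number of (population, ddof=0) standard deviations each value lies from the mean
    return (s - np.mean(s)) / np.std(s)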

Apply it to the count column:

zscore(df['count'])

0    0.414005
1    0.488906
2    0.416694
3    0.831981
4    0.256946
5    0.474624
6    0.336390
7   -0.576197
8   -2.643349
Name: count, dtype: float64

The last row (September) lies about 2.6 standard deviations below the mean.

Take the absolute value with abs and compare it to 2 with gt:

zscore(df['count']).abs().gt(2)

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
Name: count, dtype: bool

This boolean mask can be used to select rows.

Select the outliers, or keep everything else:

df[zscore(df['count']).abs().gt(2)]

df[zscore(df['count']).abs().le(2)]
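Since the question asks about values below (or above) the average separately, the signed z-score can be compared against -2 and +2 instead of taking the absolute value. This is a self-contained sketch that uses pandas' std(ddof=0) so it matches np.std's population default used in zscore.

```python
from io import StringIO

import pandas as pd

myst = """01/01/2016  8781262
01/02/2016  8958598
01/03/2016  8787628
01/04/2016  9770861
01/05/2016  8409410
01/06/2016  8924784
01/07/2016  8597500
01/08/2016  6436862
01/09/2016  1542677
"""
df = pd.read_csv(StringIO(myst), sep=r"\s+", names=["month", "count"])

# Population z-score (ddof=0), same as np.std's default
z = (df["count"] - df["count"].mean()) / df["count"].std(ddof=0)

below = df[z.lt(-2)]  # more than 2 sd below the average
above = df[z.gt(2)]   # more than 2 sd above the average
print(below)
print(above)
```

For this data only September falls below the cutoff and nothing exceeds it on the high side.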
