Data cleansing that requires iterating over pandas.DataFrame 3 rows at a time

Question

I have several large data sets of sensor readings in which a reading is sometimes 0. The heuristic is quite simple: if the previous reading and the next reading are both non-zero, I assume a sensor failure and replace the reading with the midpoint of its two neighbours.

There are legitimate cases where the readings of the sensors can be 0, so just looking at 0s is not an option.

So far I have come up with the following way of cleaning it:

data["x+1"] = data["x"].shift(1)
data["x+2"] = data["x"].shift(2)

res = data[["x", "x+1", "x+2"]].apply( 
  lambda x : (x[0] + x[2])/2 
             if ((x[0] > 0) and (x[1] <= 0) and (x[2] > 0) ) 
             else x[1], axis=1 )

data["x"] = res.shift(-1)

This works in principle, and I prefer it to iterating over three shifted copies of the data frame, as follows:

for row1, row2, row3 in zip( data.iterrows(), data.shift(1).iterrows(), data.shift(2).iterrows() ):
       ...

, . , apply ().

Another approach:

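import numpy as np

data.loc[data["x"] == 0, "x"] = np.nan
data["x"] = data["x"].ffill(limit=1)   # fill at most one NaN after each valid reading
data["x"] = data["x"].fillna(0)        # assign the result; fillna is not in-place by default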
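
To make the behaviour of this forward-fill variant concrete, here is a small sketch on made-up readings (the sample values are an assumption for illustration, not taken from the data sets above). It uses `Series.mask` and `Series.ffill(limit=1)` as equivalents of the three steps.

```python
import pandas as pd

# Hypothetical readings: one isolated zero and one run of three zeros.
s = pd.Series([1.0, 0.0, 3.0, 0.0, 0.0, 0.0, 2.0])

masked = s.mask(s == 0)          # zeros become NaN
filled = masked.ffill(limit=1)   # fill at most one NaN after each valid value
result = filled.fillna(0)        # whatever is still NaN goes back to 0

print(result.tolist())           # [1.0, 1.0, 3.0, 3.0, 0.0, 0.0, 2.0]
```

The isolated zero becomes 1.0 (the previous value, not the midpoint 2.0), and the first zero of the longer run is overwritten as well, so this variant only approximates the heuristic.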

, , , ( NaN, , , NaN)

, , . awk , , python .

python pandas nan moving-average sliding-window

MB. Dec 31 '16 at 2:40

Psidom · Accepted Answer · 2016-12-31T02:50:24+0000

You can use where:

preV = data['x'].shift(1)
nexT = data['x'].shift(-1)
data['x'] = data['x'].where((data['x'] > 0) | (preV <= 0) | (nexT <= 0), (preV + nexT)/2)
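
For readers unfamiliar with `Series.where`: it keeps a value where the condition is True and takes the replacement where the condition is False. A minimal sketch on toy values (assumed for illustration only):

```python
import pandas as pd

s = pd.Series([5, 0, 7])
keep = s > 0              # True marks the values to keep
out = s.where(keep, -1)   # positions where keep is False take the replacement

print(out.tolist())       # [5, -1, 7]
```

This is why the condition passed to `where` above describes the rows to leave alone, that is, the negation of "non-positive reading between two positive neighbours".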

Example data:

data = pd.DataFrame({"x": [1,2,3,0,0,2,3,0,4,2,0,0,0,1]})

Result:

0     1.0
1     2.0
2     3.0
3     0.0
4     0.0
5     2.0
6     3.0
7     3.5              # 0 gets replaced here
8     4.0
9     2.0
10    0.0
11    0.0
12    0.0
13    1.0
Name: x, dtype: float64

Alternatively, using loc:

data.loc[(data['x'] <= 0) & (preV > 0) & (nexT > 0), "x"] = (preV + nexT)/2
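
The mask-based form can be wrapped in a small helper. The following is only a sketch: the function name `fix_isolated_zeros` is hypothetical, and the cast to float is an assumption made so that a fractional midpoint fits the column. Because comparisons against NaN evaluate to False, a zero in the first or last row has no complete pair of neighbours and is left unchanged.

```python
import pandas as pd

def fix_isolated_zeros(s: pd.Series) -> pd.Series:
    """Replace a non-positive value with the mean of its two neighbours
    when both neighbours are positive (hypothetical helper)."""
    s = s.astype(float)    # the midpoint may be fractional
    prev = s.shift(1)      # previous reading, NaN for the first row
    nxt = s.shift(-1)      # next reading, NaN for the last row
    # Comparisons against NaN are False, so the first and last rows never match.
    dropout = (s <= 0) & (prev > 0) & (nxt > 0)
    return s.mask(dropout, (prev + nxt) / 2)

data = pd.DataFrame({"x": [1, 2, 3, 0, 0, 2, 3, 0, 4, 2, 0, 0, 0, 1]})
data["x"] = fix_isolated_zeros(data["x"])
print(data["x"].tolist())
```

On the example data this replaces only the zero at index 7 with 3.5; runs of two or more zeros are kept as they are.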
