My sample code is as follows:
import pandas as pd
dictx = {'col1': [1, 'nan', 'nan', 'nan', 5, 'nan', 7, 'nan', 9, 'nan', 'nan', 'nan', 13],
         'col2': [20, 'nan', 'nan', 'nan', 22, 'nan', 25, 'nan', 30, 'nan', 'nan', 'nan', 25],
         'col3': [15, 'nan', 'nan', 'nan', 10, 'nan', 14, 'nan', 13, 'nan', 'nan', 'nan', 9]}
df = pd.DataFrame(dictx).astype(float)
I am trying to interpolate only some of the segments that contain NaN values.
For context, I'm trying to track bus speeds using GPS data provided by the city (São Paulo, Brazil). The data is sparse, and some stretches carry no information at all. For some of those stretches I know the buses are actually stopped (at dawn, for example), but they also come in as NaN.
What I need:
I experimented with the parameters of DataFrame.interpolate() (limit and limit_direction), but they fell short. If I use df.interpolate(limit=2), it interpolates not only the gaps I want filled but also part of the gaps that should stay empty. So I need to interpolate only the gaps that are no longer than the limit, and leave longer gaps untouched.
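To make the problem concrete, here is a small sketch on a single column with the same gap pattern as col1 above. The names `s` and `partial` are only for illustration and are not part of my real code.

```python
import numpy as np
import pandas as pd

# Same gap pattern as col1: a 3-NaN gap, two 1-NaN gaps, then another 3-NaN gap.
s = pd.Series([1, np.nan, np.nan, np.nan, 5, np.nan, 7, np.nan, 9,
               np.nan, np.nan, np.nan, 13], dtype=float)

# limit=2 fills at most 2 consecutive NaNs in each gap (forward direction by
# default), so the 3-NaN gaps end up partially filled instead of left alone.
partial = s.interpolate(limit=2)
print(partial)
```

With this call, rows 1 and 2 get values and only row 3 stays NaN, which is exactly what I do not want for the long gaps.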
Desired output:
Out[7]:
    col1   col2   col3
0    1.0  20.00  15.00
1    nan    nan    nan
2    nan    nan    nan
3    nan    nan    nan
4    5.0  22.00  10.00
5    6.0  23.50  12.00
6    7.0  25.00  14.00
7    8.0  27.50  13.50
8    9.0  30.00  13.00
9    nan    nan    nan
10   nan    nan    nan
11   nan    nan    nan
12  13.0  25.00   9.00
The logic I'm trying to apply is to find the NaN values, compute the difference between their indices, build a temporary dataframe (dataframe_temp) to interpolate, and then add only that to another one, building up dataframe_final. But this has proven difficult to achieve because NaN == NaN returns False.
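A rough sketch of that idea, under some assumptions: NaN is detected with `isna()` rather than `==`, runs of consecutive NaNs are labelled and measured, and long runs are blanked out again after a normal interpolation. The helper name `interpolate_short_gaps` and the `max_gap` parameter are hypothetical, and this swaps the temporary-dataframe approach for a boolean mask; it may not be the best way to do it.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(df: pd.DataFrame, max_gap: int) -> pd.DataFrame:
    """Linearly interpolate only the NaN runs of at most `max_gap` rows."""
    out = df.copy()
    for col in df.columns:
        # NaN never compares equal to itself, so use isna() to find it.
        is_nan = df[col].isna()
        # A new run id starts every time the NaN / non-NaN status flips.
        run_id = (is_nan != is_nan.shift()).cumsum()
        # Number of NaNs in each run (0 for runs of real data).
        run_len = is_nan.groupby(run_id).transform("sum")
        long_gap = is_nan & (run_len > max_gap)
        # Interpolate every interior gap, then blank the long ones again.
        filled = df[col].interpolate(limit_area="inside")
        out[col] = filled.mask(long_gap)
    return out


nan = np.nan
df = pd.DataFrame({
    "col1": [1, nan, nan, nan, 5, nan, 7, nan, 9, nan, nan, nan, 13],
    "col2": [20, nan, nan, nan, 22, nan, 25, nan, 30, nan, nan, nan, 25],
    "col3": [15, nan, nan, nan, 10, nan, 14, nan, 13, nan, nan, nan, 9],
}, dtype=float)

result = interpolate_short_gaps(df, max_gap=1)
print(result)
```

With `max_gap=1`, the single-row gaps at rows 5 and 7 are filled, while the three-row gaps at rows 1 to 3 and 9 to 11 stay NaN.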