Find the daily observation closest to a specific time for irregularly spaced data

I have a pandas time series like

Out[110]:
Time
2014-09-19 21:59:14    55.975
2014-09-19 21:56:08    55.925
2014-09-19 21:53:05    55.950
2014-09-19 21:50:29    55.950
2014-09-19 21:50:03    55.925
2014-09-19 21:47:00    56.150
2014-09-19 21:53:57    56.225
2014-09-19 21:40:51    56.225
2014-09-19 21:37:50    56.300
2014-09-19 21:34:46    56.300
2014-09-19 21:31:41    56.350
2014-09-19 21:30:08    56.500
2014-09-19 21:28:39    56.375
2014-09-19 21:25:34    56.350
2014-09-19 21:22:32    56.400
2014-09-19 21:19:27    56.325
2014-09-19 21:16:25    56.325
2014-09-19 21:13:21    56.350
2014-09-19 21:10:18    56.425
2014-09-19 21:07:13    56.475
Name: Spread, dtype: float64

The series spans long periods (from several months to several years), so each day contains a very large number of observations. For every day, I want to pick the observation closest to a specific time, for example 4:00 p.m.

My approach so far has been

eodsearch = pd.DataFrame(df['Date'] + datetime.timedelta(hours=16))

eod = df.iloc[df.index.get_loc(eodsearch['Date'] ,method='nearest')]

which currently gives me an error

"Cannot convert input [Time Date, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp"

I also saw that get_loc accepts a tolerance, so it would be great if I could set it to, say, 30 minutes.

Any tips on why my code is not working or how to fix it?


Preparation:

import pandas as pd
from pandas.tseries.offsets import Hour

df.sort_index(inplace=True)  # reindex and merge_asof both need a sorted index
# Create a lookup dataframe with one timestamp per calendar day, at 16:00
d = pd.DataFrame(dict(Time=df.index.normalize().unique() + Hour(16)))

(i): Using reindex, which looks in both directions: (+/- 30 minutes on either side)

# Find values in original within +/- 30 minute interval of lookup 
df.reindex(d['Time'], method='nearest', tolerance=pd.Timedelta('30Min'))
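
As a minimal self-contained sketch of the same reindex lookup, the snippet below uses a small made-up Series (the `obs` name and its values are hypothetical, not the asker's data). A day with nothing within 30 minutes of 16:00 comes back as NaN instead of borrowing a distant observation.

```python
import pandas as pd

# Hypothetical irregular observations over two days (values are made up).
obs = pd.Series(
    [55.9, 56.1, 56.3, 57.0, 57.2],
    index=pd.to_datetime([
        "2014-09-18 15:50:00",
        "2014-09-18 16:05:00",
        "2014-09-18 21:00:00",
        "2014-09-19 12:00:00",  # nothing near 16:00 on this day
        "2014-09-19 21:59:14",
    ]),
    name="Spread",
).sort_index()

# One lookup timestamp per calendar day, at 16:00.
targets = obs.index.normalize().unique() + pd.Timedelta(hours=16)

# Nearest observation in either direction, but only within 30 minutes.
eod = obs.reindex(targets, method="nearest", tolerance=pd.Timedelta("30min"))
print(eod)
```

Note that the result is labelled with the 16:00 lookup timestamps, not with the times at which the matched observations were actually recorded.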

(ii): Using merge_asof against the unique dates taken from the original DF: (backward lookup only)

# Find values in original within 30 minute interval of lookup (backwards)
pd.merge_asof(d, df.reset_index(), on='Time', tolerance=pd.Timedelta('30Min'))
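
By default merge_asof only looks backward, so an observation at 16:10 is never chosen for a 16:00 lookup. A possible variation, sketched here on made-up data (the frames and values are hypothetical), is to pass direction="nearest" to get a symmetric match like the reindex approach:

```python
import pandas as pd

# Hypothetical observations, already sorted by time (values are made up).
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2017-01-01 15:40", "2017-01-01 16:10", "2017-01-02 15:55"]
    ),
    "observation": [50.0, 51.0, 52.0],
})
# One lookup row per day at 16:00.
d = pd.DataFrame({"Time": pd.to_datetime(["2017-01-01 16:00", "2017-01-02 16:00"])})

tol = pd.Timedelta("30min")

# Default direction="backward": last observation at or before 16:00.
backward = pd.merge_asof(d, df, on="Time", tolerance=tol)

# direction="nearest": closest observation on either side of 16:00.
nearest = pd.merge_asof(d, df, on="Time", tolerance=tol, direction="nearest")

print(backward)
print(nearest)
```

On the first day the backward lookup picks the 15:40 row, while the nearest lookup picks the 16:10 row because it is only 10 minutes away.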

(iii): Selecting a +/- 30-minute window by time of day:

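Index.get_loc operates on a single label, so an entire Series cannot be passed to it directly. That is what the error in the question is reporting.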
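
For completeness, the vectorised counterpart of get_loc is Index.get_indexer, which accepts many labels at once together with method and tolerance. To my knowledge the method and tolerance arguments of get_loc were removed in pandas 2.0, so this is also the route to take on recent versions. A minimal sketch on made-up data (the frame and values are hypothetical):

```python
import pandas as pd

# Hypothetical observations with a sorted DatetimeIndex (values are made up).
df = pd.DataFrame(
    {"observation": [1.0, 2.0, 3.0, 4.0]},
    index=pd.DatetimeIndex(
        ["2017-01-01 15:45", "2017-01-01 16:20",
         "2017-01-02 09:00", "2017-01-03 16:10"],
        name="Time",
    ),
)

# One lookup timestamp per calendar day, at 16:00.
targets = df.index.normalize().unique() + pd.Timedelta(hours=16)

# Integer position of the nearest row for each target; -1 means no row
# lies within the tolerance.
pos = df.index.get_indexer(targets, method="nearest", tolerance=pd.Timedelta("30min"))

# Keep only the days that had a match, then select those rows by position.
eod = df.iloc[pos[pos >= 0]]
print(eod)
```

Unlike reindex, this keeps the original timestamps of the matched rows, which is useful when you need to know how far from 16:00 each observation really was.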

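DatetimeIndex.indexer_between_time, which returns the positions of all rows whose time of day lies between start_time and end_time on every day, is better suited to this. (Both endpoints are included by default.)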


# Tolerance of +/- 30 minutes from 16:00:00
df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")]
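
The window selection returns every row between 15:30 and 16:30, not just the closest one per day. One way to narrow it down, sketched here on made-up data (the frame, values and helper names are hypothetical), is to compute each row's distance from 16:00 and keep the minimum within each calendar day:

```python
import pandas as pd

# Hypothetical observations (values are made up).
df = pd.DataFrame(
    {"observation": [50.0, 51.0, 52.0, 53.0, 54.0]},
    index=pd.DatetimeIndex(
        ["2017-01-01 15:35", "2017-01-01 16:05", "2017-01-01 18:00",
         "2017-01-02 15:50", "2017-01-02 16:25"],
        name="Time",
    ),
)

# All rows within +/- 30 minutes of 16:00, on every day.
window = df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")]

# Absolute distance, in seconds, of each row from 16:00 on its own day.
target = window.index.normalize() + pd.Timedelta(hours=16)
seconds = (window.index - target).total_seconds().to_numpy()
dist = pd.Series(abs(seconds), index=window.index)

# For each calendar day, the timestamp of the row with the smallest distance.
closest_labels = dist.groupby(window.index.normalize()).idxmin()
closest = window.loc[closest_labels.to_numpy()]
print(closest)
```

Days with no observation inside the window simply do not appear in the result.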

Sample data used for the examples:

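import numpy as np
import pandas as pd

# 200 observations, 20 minutes apart ('min' replaces the deprecated 'T' alias)
idx = pd.date_range('1/1/2017', periods=200, freq='20min', name='Time')
np.random.seed(42)
df = pd.DataFrame(dict(observation=np.random.uniform(50,60,200)), idx)
# Shuffle indices
df = df.sample(frac=1., random_state=42)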

Check:

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 200 entries, 2017-01-02 07:40:00 to 2017-01-02 10:00:00
Data columns (total 1 columns):
observation    200 non-null float64
dtypes: float64(1)
memory usage: 3.1 KB

Source: https://habr.com/ru/post/1669605/