Vector way to request date and price data

Question:

Overview:

I am looking for a vectorized way to get a first date that sees a certain state. A condition is found when the price in dfDays is > target price specified in dfWeeks.target . This condition must be removed after the targetโ€™s date of installation.

Is there a way to do the following time series analysis, with apply or similarly, in a vectorized form in Pandas?


Data:

Create freq='D' test data file

 np.random.seed(seed=1) rng = pd.date_range('1/1/2000', '2000-07-31',freq='D') weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),)) ts2 = pd.Series(weeks ,index=rng) dfDays = pd.DataFrame({'price':ts2}) 

Now create a remarketing freq='1W-Mon' dataframe

 dfWeeks = dfDays.resample('1W-Mon').first() dfWeeks['target'] = (dfWeeks['price'] + .5).round(2) 

Use reindex to align the index like on df:

 dfWeeks = dfWeeks.reindex(dfDays.index) 

So dfWeeks is a data frame containing the target values โ€‹โ€‹that we will use

  dfWeeks.dropna().head() price target 2000-01-03 1.851533 2.35 2000-01-10 1.625595 2.13 2000-01-17 1.855813 2.36 2000-01-24 2.130619 2.63 2000-01-31 2.756487 3.26 

If we focus on the first goal from dfWeeks

 match = dfDays[dfDays.price >= dfWeeks.target.loc['2000-01-03']] 

The first match in the past is so invalid, so entry 2000-01-12 is the first valid match:

 match.head() price 2000-01-02 2.449039 2000-01-12 2.379882 2000-01-14 2.759891 2000-01-16 2.350821 2000-01-21 2.607467 

enter image description here

Is there a way to do this using apply or similar to target entries in dfWeeks vectorized way?

Desired conclusion:

  price target target_hit 2000-01-03 1.851533 2.35 2000-01-12 2000-01-10 1.625595 2.13 2000-01-12 2000-01-17 1.855813 2.36 2000-01-21 2000-01-24 2.130619 2.63 2000-01-25 2000-01-31 2.756487 3.26 nan 
+5
source share
2 answers

pandas

 def find_match(x): match = dfDays.query('index > @x.name & price >= @x.target') if not match.empty: return match.index[0] dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1)) 

numpy

 timing = dfWeeks.index.values[:, None] < dfDays.index.values target_hit = dfWeeks.target.values[:, None] <= dfDays.price.values matches = timing & target_hit got_match = matches.any(1) first = matches.argmax(1)[got_match] dfWeeks.loc[got_match, 'target_hit'] = dfDays.index.values[first] dfWeeks 

both give

  price target target_hit 2000-01-03 1.851533 2.35 2000-01-12 2000-01-10 1.625595 2.13 2000-01-12 2000-01-17 1.855813 2.36 2000-01-21 2000-01-24 2.130619 2.63 2000-01-25 2000-01-31 2.756487 3.26 NaT 2000-02-07 1.859582 2.36 2000-02-09 2000-02-14 1.066028 1.57 2000-02-15 2000-02-21 1.912350 2.41 2000-03-09 2000-02-28 1.446907 1.95 2000-02-29 2000-03-06 2.408524 2.91 2000-03-28 2000-03-13 2.337675 2.84 2000-03-17 2000-03-20 2.620561 3.12 NaT 2000-03-27 2.770113 3.27 NaT 2000-04-03 2.930735 3.43 NaT 2000-04-10 1.834030 2.33 2000-04-12 2000-04-17 2.068304 2.57 2000-04-19 2000-04-24 2.391067 2.89 2000-05-11 2000-05-01 2.518262 3.02 NaT 2000-05-08 1.085764 1.59 2000-05-10 2000-05-15 1.579992 2.08 2000-05-16 2000-05-22 2.619997 3.12 NaT 2000-05-29 1.269047 1.77 2000-05-31 2000-06-05 1.171789 1.67 2000-06-06 2000-06-12 2.175277 2.68 2000-06-20 2000-06-19 1.338879 1.84 2000-06-20 2000-06-26 2.977574 3.48 NaT 2000-07-03 1.160680 1.66 2000-07-04 2000-07-10 2.615366 3.12 NaT 2000-07-17 2.478080 2.98 NaT 2000-07-24 2.899562 3.40 NaT 2000-07-31 2.220492 2.72 NaT 
+5
source

Just added time for requests. The current one really shines here.

Can anyone confirm the results by testing them on their computer.

 import pandas as pd import numpy as np np.random.seed(seed=1) rng = pd.date_range('1/1/2000', '2000-07-31',freq='D') weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),)) ts2 = pd.Series(weeks ,index=rng) dfDays = pd.DataFrame({'price':ts2}) dfWeeks = dfDays.resample('1W-Mon').first() dfWeeks['target'] = (dfWeeks['price'] + .5).round(2) 

pandas

 %%timeit def find_match(x): match = dfDays.query('index > @x.name & price >= @x.target') if not match.empty: return match.index[0] dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1)) 

10 loops, best of 3: 66 ms per loop

Numpy

 %timeit timing = dfWeeks.index.values[:, None] < dfDays.index.values target_hit = dfWeeks.target.values[:, None] <= dfDays.price.values matches = timing & target_hit got_match = matches.any(1) first = matches.argmax(1)[got_match] dfWeeks.loc[got_match, 'target_hit'] = dfDays.index.values[first] dfWeeks 

The slowest run took 4.10 times longer than the fastest. It could, it could mean that the intermediate result is cached. 1000 loops, best of 3: 999 ยตs per loop

+1
source

Source: https://habr.com/ru/post/1263492/


All Articles