Are you sure you need a faster method? Your current approach is not that slow. The following changes may simplify your code, but they won't necessarily make it faster:
Step 1: Take a random sample (with replacement) from the list of data files
rand_stocks = np.random.randint(0, len(data), size=batch_size)
You can think of rand_stocks as an array of indexes into your list of data series. Its size is already the batch size, which eliminates the need for the while loop and the comparison on line 156.
That is, you can iterate over rand_stocks and access the stocks as follows:
for idx in rand_stocks:
    stock = data[idx]
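As a quick check that this samples with replacement (duplicate indexes are allowed), you can draw a batch from a small stand-in list — the ticker names here are purely illustrative:

```python
import numpy as np

np.random.seed(0)  # only to make this demonstration reproducible
data = ["AAPL", "MSFT", "GOOG"]  # stand-in for your list of data series
batch_size = 5

# batch_size indexes into data, drawn with replacement
rand_stocks = np.random.randint(0, len(data), size=batch_size)
batch = [data[idx] for idx in rand_stocks]

print(rand_stocks)  # indexes may repeat: sampling is with replacement
print(batch)
```

Because batch_size can exceed len(data), repeats are expected rather than an error, which is exactly what makes the while loop unnecessary.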
Step 2: Get a random date range for each stock you randomly selected.
start_idx = np.random.randint(offset, len(stock) - timesteps)
d = stock[start_idx:start_idx + timesteps]
I do not have your data, but here is how I would put the pieces together:
def random_sample(data=None, timesteps=100, batch_size=100, subset='train'):
    if subset == 'train':
        offset = 0
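Combining both steps, a complete version of the function might look like the sketch below. It assumes data is a list of equal-length pandas Series, and it only handles the 'train' subset — the offset values for other subsets are yours to fill in:

```python
import numpy as np
import pandas as pd

def random_sample(data=None, timesteps=100, batch_size=100, subset='train'):
    if subset == 'train':
        offset = 0
    # Step 1: batch_size random indexes into the list of series (with replacement)
    rand_stocks = np.random.randint(0, len(data), size=batch_size)
    samples = []
    for idx in rand_stocks:
        stock = data[idx]
        # Step 2: a random window of `timesteps` consecutive observations
        start_idx = np.random.randint(offset, len(stock) - timesteps)
        samples.append(stock[start_idx:start_idx + timesteps])
    return samples

# Example usage with synthetic data:
rng = pd.date_range('1/1/2012', periods=72, freq='B')
series = pd.Series(np.random.randn(len(rng)), index=rng)
data = [series] * 6
batch = random_sample(data, timesteps=2, batch_size=2)
```

Each element of the returned list is a Series of length timesteps, sliced positionally from one of the randomly chosen stocks.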
Creating a dataset:
In [22]: import numpy as np
In [23]: import pandas as pd
In [24]: rndrange = pd.date_range('1/1/2012', periods=72, freq='B')
In [25]: rndseries = pd.Series(np.random.randn(len(rndrange)), index=rndrange)
In [26]: rndseries.head()
Out[26]:
2012-01-02    2.025795
2012-01-03    1.731667
2012-01-04    0.092725
2012-01-05   -0.489804
2012-01-06   -0.090041
In [27]: data = [rndseries, rndseries, rndseries, rndseries, rndseries, rndseries]
Function Testing:
In [42]: random_sample(data, timesteps=2, batch_size=2)
Out[42]:
[2012-01-23    1.464576
 2012-01-24   -1.052048,
 2012-01-23    1.464576
 2012-01-24   -1.052048]