I have a DataFrame with 1,500,000 rows of one-minute stock market data (Open, High, Low, Close, Volume) that I bought from QuantQuote.com. I am trying to run some home-made backtests of stock trading strategies. The straightforward Python approach to transaction processing is too slow, and I wanted to try numba to speed things up. The trouble is that numba does not work with pandas functions.
Google searches turn up surprisingly little information on using numba with pandas, which makes me wonder whether I am misguided in even considering this.
My setup is Numba 0.13.0-1 and pandas 0.13.1-1, on Windows 7 with MS VS2013 + PTVS, Python 2.7, and Enthought Canopy.
My existing Python + pandas inner loop has the following general structure:
- Compute the indicator columns (using pd.ewma, pd.rolling_max, pd.rolling_min, etc.).
- Compute “event” columns for predefined events, such as moving-average crosses, new highs, etc.
Then I use DataFrame.iterrows to step through the DataFrame row by row.
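For concreteness, the pipeline above can be sketched roughly like this. The column names, indicator parameters, and entry/exit rules are made up for illustration, not the asker's actual strategy; also note that `pd.ewma` and `pd.rolling_max` were removed in later pandas, so the sketch uses the modern `.ewm()` / `.rolling()` equivalents:

```python
import numpy as np
import pandas as pd

# Hypothetical minute-bar data standing in for the QuantQuote file.
n = 500
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 100 + np.cumsum(rng.normal(0, 0.1, n))})

# Step 1: indicator columns (modern equivalents of pd.ewma / pd.rolling_max).
df["ema_fast"] = df["close"].ewm(span=12, adjust=False).mean()
df["ema_slow"] = df["close"].ewm(span=26, adjust=False).mean()
df["roll_max"] = df["close"].rolling(window=60, min_periods=1).max()

# Step 2: "event" columns, e.g. a moving-average cross and a new rolling high.
fast_above = df["ema_fast"] > df["ema_slow"]
df["cross_up"] = fast_above & ~fast_above.shift(1, fill_value=False)
df["new_high"] = df["close"] >= df["roll_max"]

# Step 3: the row-by-row transaction loop via iterrows -- the slow part.
position = 0
trades = 0
for _, row in df.iterrows():
    if row["cross_up"] and position == 0:
        position = 1          # enter on a cross up
        trades += 1
    elif not row["new_high"] and position == 1:
        position = 0          # exit when the new-high condition fails
```

The vectorized steps 1 and 2 are fast; it is the interpreted step-3 loop, where each `iterrows` call builds a fresh Series, that dominates the runtime.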
I have tried various optimizations, but it is still not as fast as I would like, and the optimizations tend to introduce errors.
I want to use numba to speed up the row-by-row processing. Are there any preferred approaches for doing this?
Rather than iterating over the DataFrame itself, extract the underlying NumPy arrays with DataFrame.values (or the per-column Series .values) and pass those to a numba-compiled function. Numba compiles plain loops over NumPy arrays down to fast machine code, but it does not understand pandas objects, so pulling out the arrays via DataFrame.values is the key step.