How can loops and nested if statements be replaced to speed up Python code?

Question


How can I avoid for loops and nested if statements and be more Pythonic?

At first glance this may look like a “do all my work for me” question. I can assure you that it is not. I am trying to learn some real Python, and would like to find ways to speed up code, based on a reproducible example and a predefined function.

I am calculating returns from certain signals in financial markets, using loads of for loops and nested if statements. I have made several attempts, but I am just not getting anywhere with vectorization, comprehensions, or other more Pythonic tools of the trade. I have been fine with that so far, but I am finally starting to feel the pain of using functions that are too slow at scale.

I have a data frame with two indicators and one specific event. The first two code snippets below show the procedure step by step. The whole thing, with some predefined settings, is wrapped in a function at the very end.

In [1]

# Settings
import numpy as np
import pandas as pd
import datetime
np.random.seed(12345678)

Observations = 10

# Data frame values:
# Two indicators with integer values from 0 to 9
# and one Event which does or does not occur with values 0 or 1
df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                  columns=['IndicatorA', 'IndicatorB'] )
df['Event'] = np.random.randint(0,2,size=(Observations, 1))

# Data frame index:
datelist = pd.date_range(pd.Timestamp.today().normalize(),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])    

# Placeholder for signals based on the existing values
# in the data frame
df['Signal'] = 0

print(df)

Out [1]

The data frame is indexed by date. The signal I'm looking for is determined by the interaction of the two indicators and the event. It is calculated as follows (continuing from the snippet above):

In [2]

i = 0
for signals in df['Signal']:
    if i == 0:
        # First signal is always zero
        df.loc[df.index[i], 'Signal'] = 0
    else:
        # Signal is 1 if Indicator A is above a certain level
        if df.loc[df.index[i], 'IndicatorA'] > 5:
            df.loc[df.index[i], 'Signal'] = 1
        else:
            # Signal is 1 if the previous Indicator B is above a certain level
            # AND a certain event occurs
            # (each comparison is parenthesized because & binds tighter than >)
            prev_b = df.loc[df.index[i - 1], 'IndicatorB']
            if (prev_b > 5) & (df.loc[df.index[i], 'Event'] > 1):
                df.loc[df.index[i], 'Signal'] = 1
            else:
                df.loc[df.index[i], 'Signal'] = 0
    i = i + 1

print(df['Signal'])

Out [2]
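A condition that combines two comparisons with `&` needs parentheses around each comparison, because in Python `&` binds more tightly than `>`. The following is only a minimal sketch with made-up scalar values. The variable names are hypothetical, and the event threshold of 0 is an assumption chosen so that the difference is visible:

```python
# Hypothetical scalar values standing in for one row of the data frame
indicator_b_prev = 3   # previous row's IndicatorB
event = 1              # current row's Event (0 or 1)

# Without parentheses Python parses this as the chained comparison
# indicator_b_prev > (5 & event) > 0
unparenthesized = indicator_b_prev > 5 & event > 0

# With parentheses each comparison is evaluated first, then combined
parenthesized = (indicator_b_prev > 5) & (event > 0)

# For plain Python scalars, `and` expresses the same intent
with_and = indicator_b_prev > 5 and event > 0

print(unparenthesized, parenthesized, with_and)
```

The first expression differs from the other two for these values, which is why the parenthesized form is the safer habit for both scalars and pandas columns.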

What matters here is the Signal column. Below, the whole thing is wrapped in a function so that it can be timed with %time in IPython.

# Settings
import numpy as np
import pandas as pd
import datetime

# The whole thing defined as a function

def fxSlow(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=(Observations, 1))

    datelist = pd.date_range(pd.Timestamp.today().normalize(),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

    i = 0
    for signals in df['Signal']:
        if i == 0:
            # First signal is always zero
            df.loc[df.index[i], 'Signal'] = 0
        else:
            # Signal is 1 if Indicator A is above a certain level
            if df.loc[df.index[i], 'IndicatorA'] > 5:
                df.loc[df.index[i], 'Signal'] = 1
            else:
                # Signal is 1 if the previous Indicator B is above a certain
                # level AND a certain event occurs
                # (each comparison is parenthesized because & binds tighter than >)
                prev_b = df.loc[df.index[i - 1], 'IndicatorB']
                if (prev_b > 5) & (df.loc[df.index[i], 'Event'] > 1):
                    df.loc[df.index[i], 'Signal'] = 1
                else:
                    df.loc[df.index[i], 'Signal'] = 0
        i = i + 1


    return np.mean(df['Signal'])
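
Outside IPython, a comparable wall-clock measurement can be sketched with the standard library's `time.perf_counter`. The helper `time_call` and the two toy counting functions below are hypothetical names made up for this illustration. They are not the function above, and the sketch only shows the measuring pattern:

```python
import time

import numpy as np


def time_call(func, *args, repeats=3):
    """Return (result, best wall-clock seconds) over a few repeated calls."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


def count_above_loop(values, level):
    # Element-by-element Python loop
    count = 0
    for v in values:
        if v > level:
            count += 1
    return count


def count_above_vectorized(values, level):
    # Single NumPy comparison over the whole array
    return int((values > level).sum())


rng = np.random.default_rng(12345678)
values = rng.integers(0, 10, size=100_000)

loop_count, loop_seconds = time_call(count_above_loop, values, 5)
vec_count, vec_seconds = time_call(count_above_vectorized, values, 5)
print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
```

Taking the best of a few repeats reduces noise from other processes. The absolute numbers depend on the machine.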

So, how can I speed this up and be more Pythonic?

And what about something like 100000 observations?


python vectorization pandas list-comprehension ipython

vestland 07 . '17 12:25


Scott Boston · Accepted Answer · 2017-06-07T13:14:17+0000

Is this the kind of thing you are looking for?

def fxSlow2(Observations):

    np.random.seed(12345678)

    df = pd.DataFrame(np.random.randint(0,10,size=(Observations, 2)),
                        columns=['IndicatorA', 'IndicatorB'] )
    df['Event'] = np.random.randint(0,2,size=(Observations, 1))

    datelist = pd.date_range(pd.Timestamp.today().normalize(),
                periods=Observations).tolist()
    df['Signal'] = 0

    df['Dates'] = datelist
    df = df.set_index(['Dates'])

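    # Vectorized form of the nested ifs: Indicator A above the level wins,
    # otherwise the previous row's Indicator B (shift(1), matching i - 1 in
    # the loop) and the current Event are checked
    df['Signal'] = np.where(
        df.IndicatorA > 5,
        1,
        np.where((df.IndicatorB.shift(1) > 5) & (df.Event > 1), 1, 0),
    )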

    df.loc[df.index[0],'Signal'] = 0

    return np.mean(df['Signal'])

%time fxSlow2(100)

: 10

Out [208]: 0.43

%time fxSlow2(1000)

: 15

Out [209]: 0.414

%time fxSlow2(10000)

: 61

Out [210]: 0.4058
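To check that a vectorized expression reproduces a row-by-row loop, the two can be compared on the same data. The following is a sketch under stated assumptions rather than the code above. The helper names `make_frame`, `signal_loop` and `signal_vectorized` are hypothetical, a plain integer index replaces the dates, and the event threshold defaults to 0 (an assumption, so that the second branch can actually fire for an Event that is 0 or 1):

```python
import numpy as np
import pandas as pd


def make_frame(observations, seed=12345678):
    # Same column layout as in the question; integer index for brevity
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "IndicatorA": rng.integers(0, 10, size=observations),
        "IndicatorB": rng.integers(0, 10, size=observations),
        "Event": rng.integers(0, 2, size=observations),
    })


def signal_loop(df, level=5, event_level=0):
    # Reference implementation: explicit loop over row positions
    a = df["IndicatorA"].to_numpy()
    b = df["IndicatorB"].to_numpy()
    e = df["Event"].to_numpy()
    out = np.zeros(len(df), dtype=int)
    for i in range(1, len(df)):  # the first signal stays zero
        if a[i] > level:
            out[i] = 1
        elif b[i - 1] > level and e[i] > event_level:
            out[i] = 1
    return out


def signal_vectorized(df, level=5, event_level=0):
    # shift(1) aligns each row with the previous row's IndicatorB;
    # the NaN in the first row compares as False
    prev_b = df["IndicatorB"].shift(1)
    hit = (df["IndicatorA"] > level) | (
        (prev_b > level) & (df["Event"] > event_level)
    )
    out = hit.to_numpy().astype(int)  # astype returns a writable copy
    out[0] = 0  # the first signal is always zero
    return out


df = make_frame(1000)
same = np.array_equal(signal_loop(df), signal_vectorized(df))
print(same)
```

Because the two branches both yield 1, the nested `np.where` calls can be collapsed into a single boolean expression with `|` and `&`. Passing `event_level=1` reproduces the threshold used in the question, in which case only Indicator A can set the signal.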
