I am looking for the fastest way to do the following:
We have pd.DataFrame:
df = pd.DataFrame({ 'High': [1.3,1.2,1.1], 'Low': [1.3,1.2,1.1], 'High1': [1.1, 1.1, 1.1], 'High2': [1.2, 1.2, 1.2], 'High3': [1.3, 1.3, 1.3], 'Low1': [1.3, 1.3, 1.3], 'Low2': [1.2, 1.2, 1.2], 'Low3': [1.1, 1.1, 1.1]})
It looks like this:
In [4]: df Out[4]: High High1 High2 High3 Low Low1 Low2 Low3 0 1.3 1.1 1.2 1.3 1.3 1.3 1.2 1.1 1 1.2 1.1 1.2 1.3 1.2 1.3 1.2 1.1 2 1.1 1.1 1.2 1.3 1.1 1.3 1.2 1.1
What I want to know is that one of the values ββfloat High1, High2, High3 is the first that is greater than or equal to High. If not, it should be np.nan
And the same for Low1, Low2, Low3 value, but in this case one of them is the first one that is less than or equal to High. If not, it should be np.nan
In the end, I need to know which one, Low or High, was the first.
One way to solve this problem is a strange and not very effective way:
df['LowIs'] = np.nan df['HighIs'] = np.nan for i in range(1,4): df['LowIs'] = np.where((np.isnan(df['LowIs'])) & ( df['Low'] >= df['Low'+str(i)]), i, df['LowIs']) df['HighIs'] = np.where((np.isnan(df['HighIs'])) & ( df['High'] <= df['High'+str(i)]), i, df['HighIs']) df['IsFirst'] = np.where( df.LowIs < df.HighIs, 'Low', np.where(df.LowIs > df.HighIs, 'High', 'None') )
What gives me:
In [8]: df Out[8]: High High1 High2 High3 Low Low1 Low2 Low3 LowIs HighIs IsFirst 0 1.3 1.1 1.2 1.3 1.3 1.3 1.2 1.1 1.0 3.0 Low 1 1.2 1.1 1.2 1.3 1.2 1.3 1.2 1.1 2.0 2.0 None 2 1.1 1.1 1.2 1.3 1.1 1.3 1.2 1.1 3.0 1.0 High
How should I do this over and over again in many iterations where High / Low will be different, performance is key.
So, I would not mind if High1, High2, High3 and Low1, Low2, Low3 will be in a separate DataFrame, which will be transposed, or if it will be in a dict or something else. Thus, the process of preparing data in all cases providing the best performance can be slow and inconvenient.
One of the solutions that I was working on, but just couldn't finish the job in vector form, and that also seems pretty slow:
df.loc[(df.index == 0), 'HighIs'] = np.where( df.loc[(df.index == 0), ['High1', 'High2', 'High3']] >= 1.3 )[1][0] + 1
So, check which of the columns it is true in this first row, and then look at the index number np.where ().
We are looking forward to offers and hope to learn something new! :)