How to write code in vector form instead of using loops?

I would like to write the following code in vector form, since the current code is rather slow (and I would like to learn Python best practices). Basically, the code says that if today's value is within 10% of yesterday's value, then today's value in the new column equals yesterday's value in the new column. Otherwise, today's value is kept unchanged:

    def test(df):
        df['OldCol'] = (100, 115, 101, 100, 99, 70, 72, 75, 78, 80, 110)
        df['NewCol'] = df['OldCol']
        for i in range(1, len(df) - 1):
            if df['OldCol'][i] / df['OldCol'][i-1] > 0.9 and df['OldCol'][i] / df['OldCol'][i-1] < 1.1:
                df['NewCol'][i] = df['NewCol'][i-1]
            else:
                df['NewCol'][i] = df['OldCol'][i]
        return df['NewCol']

The output should be as follows:

        OldCol  NewCol
    0      100     100
    1      115     115
    2      101     101
    3      100     101
    4       99     101
    5       70      70
    6       72      70
    7       75      70
    8       78      70
    9       80      70
    10     110     110

Can you help?

I would like to use something like this, but I was not able to solve my problem:

    def test(df):
        df['NewCol'] = df['OldCol']
        cond = np.where((df['OldCol'].shift(1) / df['OldCol'] > 0.9) & (df['OldCol'].shift(1) / df['OldCol'] < 1.1))
        df['NewCol'][cond[0]] = df['NewCol'][cond[0] - 1]
        return df
+5
3 answers

Three-step solution:

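    # Ratio of each value to the previous one (NaN for the first row)
    df['variation'] = df.OldCol / df.OldCol.shift()
    # True where the value moved by more than 10%, i.e. a new level starts
    df['gap'] = ~df.variation.between(0.9, 1.1)
    # Keep OldCol only at the gaps, then carry it forward over the other rows
    df['NewCol'] = df.OldCol.where(df.gap).ffill()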

Result:

        OldCol  variation    gap  NewCol
    0      100        nan   True     100
    1      115       1.15   True     115
    2      101       0.88   True     101
    3      100       0.99  False     101
    4       99       0.99  False     101
    5       70       0.71   True      70
    6       72       1.03  False      70
    7       75       1.04  False      70
    8       78       1.04  False      70
    9       80       1.03  False      70
    10     110       1.38   True     110

It seems to be 30 times faster than the loops in this example.

In one line:

    df['NewCol'] = df.OldCol.where(~(df.OldCol / df.OldCol.shift()).between(0.9, 1.1)).ffill()
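
To check the vectorized version against the loop on the question's data, a minimal self-contained sketch could look like the following. The helper names `loop_version` and `vectorized_version` are made up for illustration. Note that `between` includes its endpoints by default, while the loop uses strict inequalities, so the two only differ when a ratio is exactly 0.9 or 1.1.

```python
import pandas as pd


def loop_version(old):
    """Reference implementation: plain Python loop over a list of values."""
    new = list(old)
    for i in range(1, len(old)):
        ratio = old[i] / old[i - 1]
        if 0.9 < ratio < 1.1:
            # Within 10% of yesterday: carry the previous new value forward
            new[i] = new[i - 1]
        # Otherwise new[i] already holds old[i]
    return new


def vectorized_version(df):
    """Hypothetical helper wrapping the vectorized approach from the answer."""
    x = df['OldCol']
    # True where the value jumps; the first row is NaN, so it also counts as a jump
    gap = ~(x / x.shift()).between(0.9, 1.1)
    # Keep values only at the jumps, then forward-fill the rest
    return x.where(gap).ffill()


df = pd.DataFrame({'OldCol': [100, 115, 101, 100, 99, 70, 72, 75, 78, 80, 110]})
# where() introduces NaN, which turns the column into floats; cast back to int
df['NewCol'] = vectorized_version(df).astype(int)
expected = loop_version(df['OldCol'].tolist())
print(df)
print(df['NewCol'].tolist() == expected)
```

The loop here runs over a plain list rather than over the DataFrame, which keeps the reference simple and avoids chained assignment such as `df['NewCol'][i] = ...`.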
+2

You need to apply a boolean mask to your original DataFrame:

    df[(0.9 <= df['NewCol']/df['OldCol']) & (df['NewCol']/df['OldCol'] <= 1.1)]

This shows you all the rows where NewCol is within 10% of OldCol.

So, to set the NewCol field in these rows:

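    # Boolean mask of the rows where NewCol is within 10% of OldCol
    within_10 = (0.9 <= df['NewCol']/df['OldCol']) & (df['NewCol']/df['OldCol'] <= 1.1)
    # Assign through .loc so the original DataFrame is modified, not a filtered copy
    df.loc[within_10, 'NewCol'] = df.loc[within_10, 'OldCol']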
0

Since you already seem able to find the "jump" days, I will only show the harder part. Suppose you have a NumPy array old of length N and a boolean NumPy array jumps of the same size. By convention, element zero of jumps is set to True. Then you can first compute the number of repeats between jumps:

    jump_indices = np.where(jumps)[0]
    repeats = np.diff(np.r_[jump_indices, [N]])

After that you can use np.repeat:

    new = np.repeat(old[jump_indices], repeats)
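
Put together with the question's data, the approach might be sketched as follows. How `jumps` is built is an assumption here (the answer leaves it to the reader): a day counts as a jump when its ratio to the previous day falls outside the open interval (0.9, 1.1), matching the strict inequalities in the question's loop.

```python
import numpy as np

old = np.array([100, 115, 101, 100, 99, 70, 72, 75, 78, 80, 110])
N = len(old)

# Assumed construction of the jump mask: ratio to the previous day is outside (0.9, 1.1).
ratio = old[1:] / old[:-1]
# Element 0 has no previous day, so it is always treated as a jump.
jumps = np.concatenate(([True], (ratio <= 0.9) | (ratio >= 1.1)))

# Positions where a new level starts
jump_indices = np.where(jumps)[0]
# Length of each run: distance to the next jump (or to the end of the array)
repeats = np.diff(np.r_[jump_indices, [N]])
# Repeat each level's starting value over its run
new = np.repeat(old[jump_indices], repeats)
print(new)
```

Because every run starts at a jump and ends just before the next one, the run lengths sum to N, so `new` has the same length as `old`.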
0

Source: https://habr.com/ru/post/1263844/

