Index Search for Value in Pandas Dataframe

I have a problem that should not be so difficult, but it has me stumped. There should be an easy way to do this. I have a series from a data frame that looks like this:

             value
 2001-01-04  0.134
 2001-01-05    NaN
 2001-01-06    NaN
 2001-01-07  0.032
 2001-01-08    NaN
 2001-01-09  0.113
 2001-01-10    NaN
 2001-01-11    NaN
 2001-01-12  0.112
 2001-01-13    NaN
 2001-01-14    NaN
 2001-01-15  0.136
 2001-01-16    NaN
 2001-01-17    NaN

Iterating from bottom to top, I need to find the index of the earliest value greater than 0.100 whose next earlier value is less than 0.100.

So, in the above series, I want to find the index of the value 0.113, which is 2001-01-09. The next earlier value is below 0.100 (0.032 on 2001-01-07). Two later values are greater than 0.100, but I want the index of the earliest value > 0.100 that follows a value below the threshold, iterating from bottom to top.

The only way I can see to do this is to reverse the series, iterate to the first (last) value, check whether it is > 0.100, and then step to the next earlier value and check whether it is less than 0.100. If it is, I'm done. If it is > 0.100, I need to step back again and check the earlier number.

Surely there is a cleaner way to do this that I'm not seeing, one that avoids this stepwise iteration.

Thanks in advance for your help.

2 answers

Basically, you are looking for two conditions. For the first condition, you want the value to be greater than 0.1:

 df['value'].gt(0.1) 

For the second condition, you want the previous non-null value to be less than 0.1:

 df['value'].ffill().shift().lt(0.1) 

Now combine the two conditions with the & operator, reverse the resulting Boolean series, and use idxmax to find the first (that is, the last in the original order) instance where your condition is satisfied:

 (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax() 

Which gives the expected index value.
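For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the series from the question and applies both conditions. The variable names (above, prev_below, result) are only illustrative, and the frame is assumed to have a DatetimeIndex with real NaN values:

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the days without a value.
idx = pd.date_range("2001-01-04", "2001-01-17", freq="D")
values = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
          np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": values}, index=idx)

# Condition 1: the current value is above the threshold.
above = df["value"].gt(0.1)

# Condition 2: the previous non-null value is below the threshold.
# ffill carries the last known value forward; shift moves it down one row.
prev_below = df["value"].ffill().shift().lt(0.1)

# Reverse so that idxmax returns the latest match instead of the earliest.
result = (above & prev_below)[::-1].idxmax()
print(result)  # 2001-01-09 00:00:00
```

Comparisons against NaN evaluate to False, which is why the missing days never satisfy either condition on their own.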

The above method assumes that at least one value satisfies the conditions you described. If no row satisfies them, idxmax on an all-False series still returns a label (the first one), which would be misleading. If that can happen with your data, you can use any to make sure that a solution exists:

 # Build the condition.
 cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

 # Check if the condition is met anywhere.
 if cond.any():
     idx = cond.idxmax()
 else:
     idx = ???

In your question, you indicated that both inequalities are strict. What happens with a value of 0.1? You can change one of gt / lt to ge / le to account for this.
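To illustrate the boundary case, here is a small sketch on a made-up series (not the data from the question) in which the value before the crossing is exactly 0.1. The names strict and loose are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical series: the value preceding the crossing is exactly 0.1.
s = pd.Series(
    [0.05, 0.1, np.nan, 0.2],
    index=pd.date_range("2001-01-01", periods=4, freq="D"),
)

prev = s.ffill().shift()  # previous non-null value for each row

strict = s.gt(0.1) & prev.lt(0.1)  # 0.1 is neither "above" nor "below"
loose = s.gt(0.1) & prev.le(0.1)   # 0.1 counts as "not above"

print(strict.any())          # False
print(loose[::-1].idxmax())  # 2001-01-04 00:00:00
```

With both inequalities strict, the crossing on 2001-01-04 is missed; relaxing one side picks it up. Which side to relax depends on how you want to classify a value sitting exactly on the threshold.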


Bookkeeping

 # making sure `nan` are actually `nan`
 df.value = pd.to_numeric(df.value, errors='coerce')

 # making sure strings are actually dates
 df.index = pd.to_datetime(df.index)
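As a quick check of what this cleanup does, here is a sketch on a tiny made-up frame where both the values and the index start out as strings. The input data is an assumption for illustration only:

```python
import pandas as pd

# Hypothetical raw input: everything arrives as text, including the "Nan" marker.
raw = pd.DataFrame(
    {"value": ["0.134", "Nan", "0.032"]},
    index=["2001-01-04", "2001-01-05", "2001-01-07"],
)

# Anything that cannot be parsed as a number becomes NaN.
raw["value"] = pd.to_numeric(raw["value"], errors="coerce")

# Date strings become a DatetimeIndex.
raw.index = pd.to_datetime(raw.index)

print(raw["value"].dtype)       # float64
print(raw["value"].isna().sum())  # 1
```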

Plan

  • dropna
  • sort_index
  • logical series less than 0.1
  • convert to integers for use in diff
  • diff - your scenario happens when we go from < .1 to > .1. In this case, diff will be -1
  • idxmax - find the first -1

 df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1).idxmax()

 2001-01-09 00:00:00

This correction accounts for the flaw noted by @root (the case where no value satisfies the condition):

 diffs = df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1)
 diffs.idxmax() if diffs.any() else pd.NaT
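The diff-based approach can be wrapped up as a small function. This is only a sketch: the name find_crossing and the latest flag are hypothetical additions, not part of the answer above, and the sample series are made up to show the edge cases:

```python
import pandas as pd

def find_crossing(values: pd.Series, threshold: float = 0.1, latest: bool = False):
    """Hypothetical helper: label where the series moves from below the
    threshold to not below it, or pd.NaT if that never happens."""
    below = values.dropna().sort_index().lt(threshold).astype(int)
    crossed = below.diff().eq(-1)  # 1 -> 0 means "below" became "not below"
    if not crossed.any():
        return pd.NaT
    # idxmax returns the first True; reverse first to get the last one.
    return crossed[::-1].idxmax() if latest else crossed.idxmax()

dates = pd.date_range("2001-01-01", periods=4, freq="D")
two_crossings = pd.Series([0.05, 0.2, 0.05, 0.3], index=dates)
no_crossing = pd.Series([0.2, 0.3, 0.4, 0.5], index=dates)

print(find_crossing(two_crossings))               # 2001-01-02 00:00:00
print(find_crossing(two_crossings, latest=True))  # 2001-01-04 00:00:00
print(find_crossing(no_crossing))                 # NaT
```

On the data in the question there is only one crossing, so both variants agree. When several crossings exist, plain idxmax on the sorted series returns the earliest one; if the latest is wanted, as the bottom-to-top wording of the question suggests, reversing before idxmax is one option.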

Editorial

This question highlights an important dynamic of SO. Those of us who answer questions often do so by editing our answers until they are in a satisfactory state. I have noticed that those of us who answer pandas questions are generally very helpful to each other, as well as to those who ask questions.

In this post, I was informed by @root and subsequently changed my post to reflect the added information. That alone makes @root's post very useful, in addition to the other great information they provided.

Please recognize both posts and upvote as many as you can.

Thanks


Source: https://habr.com/ru/post/1265885/

