Pandas syntax for dataframe.query method

Question:

I would like to better understand the Pandas DataFrame.query method and what the following expression represents:

match = dfDays.query('index > @x.name & price >= @x.target') 

What does @x.name mean?

I understand what the final result of this code is (a new column with pandas.tslib.Timestamp data), but I do not have a clear understanding of the expression used to get it.

Data:

From here:

A vectorized way to request dates and prices

    import numpy as np
    import pandas as pd

    np.random.seed(seed=1)
    rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
    weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
    ts2 = pd.Series(weeks, index=rng)
    dfDays = pd.DataFrame({'price': ts2})
    dfWeeks = dfDays.resample('1W-Mon').first()
    dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

    def find_match(x):
        match = dfDays.query('index > @x.name & price >= @x.target')
        if not match.empty:
            return match.index[0]

    dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))
2 answers

Everything @MaxU said is great!

I wanted to add some context to the specific problem to which it was applied.

find_match

This is a helper function that is passed to dfWeeks.apply. Two things to note:

  • find_match takes a single argument x. This will be one row of dfWeeks.
    • Each row is a pd.Series object, and each row will be passed through this function. This is the nature of using apply.
    • When apply passes a row to the helper function, the row has a name attribute, which is equal to the index value for that row in the dataframe. In this case, I know the index value is a pd.Timestamp, and I use it for the comparison I need to make.
  • find_match references dfDays, which is defined outside the scope of find_match.
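The points above can be checked with a small sketch. The frame `df` and the helper `inspect_row` are hypothetical names made up for this illustration; they are not part of the original code. The sketch only records what apply hands to the helper for each row.

```python
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex, standing in for dfWeeks.
df = pd.DataFrame(
    {"price": [1.5, 2.0, 2.5]},
    index=pd.to_datetime(["2000-01-03", "2000-01-10", "2000-01-17"]),
)

seen = []  # records what apply passes to the helper


def inspect_row(row):
    # With axis=1, each call receives one row as a pd.Series.
    # row.name is the index label of that row (a pd.Timestamp here).
    seen.append((type(row).__name__, row.name, row.price))
    return row.name


result = df.apply(inspect_row, axis=1)
```

Every entry in `seen` should report a `Series` whose `name` is the row's timestamp, which is the value that `@x.name` picks up inside the query string.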

I did not need to use query; I just like to use it. In my opinion, it makes the code prettier. The function provided by the OP could have been written differently.

    def find_match(x):
        """Original"""
        match = dfDays.query('index > @x.name & price >= @x.target')
        if not match.empty:
            return match.index[0]

    dfWeeks.assign(target_hit=dfWeeks.apply(find_match, 1))

find_match_alt

Or we could do this, which might help explain what the query string does above.

    def find_match_alt(x):
        """Alternative to OP's"""
        date_is_afterwards = dfDays.index > x.name
        price_target_is_met = dfDays.price >= x.target
        both_are_true = price_target_is_met & date_is_afterwards
        if (both_are_true).any():
            return dfDays[both_are_true].index[0]

    dfWeeks.assign(target_hit=dfWeeks.apply(find_match_alt, 1))

A comparison of these two functions should give a good perspective.
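One way to make that comparison concrete is to run both versions on the data from the question and check that they agree. The following is a minimal sketch, not part of the original answer: the names `via_query`, `via_mask`, and `same` are introduced here for illustration, the mask version is a condensed rewrite of find_match_alt, and the resample alias is spelled "W-MON", which is assumed to be equivalent to the question's '1W-Mon'.

```python
import numpy as np
import pandas as pd

# Rebuild the data from the question.
np.random.seed(seed=1)
rng = pd.date_range("2000-01-01", "2000-07-31", freq="D")
prices = np.random.uniform(low=1.03, high=3, size=len(rng))
dfDays = pd.DataFrame({"price": prices}, index=rng)
dfWeeks = dfDays.resample("W-MON").first()
dfWeeks["target"] = (dfWeeks["price"] + 0.5).round(2)


def find_match(x):
    # Query-string version: @x.name and @x.target are read from the row x.
    match = dfDays.query("index > @x.name & price >= @x.target")
    if not match.empty:
        return match.index[0]


def find_match_alt(x):
    # Boolean-mask version of the same two conditions.
    date_is_afterwards = dfDays.index > x.name
    price_target_is_met = (dfDays["price"] >= x.target).to_numpy()
    both_are_true = date_is_afterwards & price_target_is_met
    if both_are_true.any():
        return dfDays.index[both_are_true][0]


via_query = dfWeeks.apply(find_match, axis=1)
via_mask = dfWeeks.apply(find_match_alt, axis=1)

# Normalize missing matches to NaT before comparing the two results.
same = pd.to_datetime(via_query).equals(pd.to_datetime(via_mask))
```

If `same` is True, the query string and the explicit masks select the same first matching date for every week. Inside a query string, `&` has the precedence of `and`, which is why the two comparisons need no parentheses there, while the mask version builds each comparison separately before combining them.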


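@x.name: the @ prefix tells .query() that x is an external object, that is, a variable from the calling scope rather than a column of the DataFrame on which query() was called. In the question, x is a row of dfWeeks (a Series), so @x.name is that row's index label. The external object can also be a DataFrame, as in the demo below, or a scalar value.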

I hope this little demo helps you understand this:

    In [79]: d1
    Out[79]:
       a  b  c
    0  1  2  3
    1  4  5  6
    2  7  8  9

    In [80]: d2
    Out[80]:
       a   x
    0  1  10
    1  7  11

    In [81]: d1.query("a in @d2.a")
    Out[81]:
       a  b  c
    0  1  2  3
    2  7  8  9

    In [82]: d1.query("c < @d2.a")
    Out[82]:
       a  b  c
    1  4  5  6

Scalar x :

    In [83]: x = 9

    In [84]: d1.query("c == @x")
    Out[84]:
       a  b  c
    2  7  8  9
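The session above does not show how d1 and d2 were created. A self-contained sketch follows, assuming they were built from plain dictionaries matching the printed values; the result names `in_d2` and `equals_x` are hypothetical and introduced only for this example.

```python
import pandas as pd

# Assumed construction of the frames shown in the session above.
d1 = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
d2 = pd.DataFrame({"a": [1, 7], "x": [10, 11]})
x = 9

# @d2.a refers to column a of the external frame d2, not to a column of d1.
in_d2 = d1.query("a in @d2.a")

# @x refers to the external scalar x.
equals_x = d1.query("c == @x")
```

Without the @ prefix, query would look for columns named d2 or x in d1 itself and fail, since d1 only has the columns a, b, and c.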

Source: https://habr.com/ru/post/1263493/

