Best way to subset pandas dataframe

Hey, I'm new to Pandas, and I just stumbled upon df.query().

Why would people use df.query() when you can filter a DataFrame directly with bracket notation? The official pandas tutorial also seems to prefer the latter approach.

With bracket notation:

df[df['age'] <= 21]

With the pandas query method:

df.query('age <= 21')

Besides the stylistic and flexibility differences that have been mentioned, is one of them canonically preferred, particularly for performance on large DataFrames?
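One way to approach the performance part of the question is to time both forms on your own data. The following is only a minimal sketch: the synthetic frame, its size, the seed and the name `big` are illustrative assumptions, not part of the original post.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; size, seed and column contents are illustrative assumptions.
rng = np.random.default_rng(0)
big = pd.DataFrame({"age": rng.integers(0, 100, size=200_000)})

# The two filtering styles from the question.
by_mask = big[big["age"] <= 21]
by_query = big.query("age <= 21")

# Time each style a few times. The numbers depend on hardware, data size,
# and whether the optional numexpr package is installed.
t_mask = timeit.timeit(lambda: big[big["age"] <= 21], number=5)
t_query = timeit.timeit(lambda: big.query("age <= 21"), number=5)

print(f"bracket notation: {t_mask:.4f} s")
print(f"query():          {t_query:.4f} s")
```

Both styles select the same rows, so any difference is in speed and readability only.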

+4
2 answers

Consider the following sample DataFrame:

In [307]: df
Out[307]:
  sex  age     name
0   M   40      Max
1   F   35     Anna
2   M   29      Joe
3   F   18    Maria
4   F   23  Natalie

There are some good reasons to prefer the .query() method.

  • It can be much shorter and cleaner than boolean indexing:

    In [308]: df.query("20 <= age <= 30 and sex=='F'")
    Out[308]:
      sex  age     name
    4   F   23  Natalie
    
    In [309]: df[(df['age']>=20) & (df['age']<=30) & (df['sex']=='F')]
    Out[309]:
      sex  age     name
    4   F   23  Natalie
    
  • You can build conditions (queries) programmatically:

    In [315]: conditions = {'name':'Joe', 'sex':'M'}
    
    In [316]: q = ' and '.join(['{}=="{}"'.format(k,v) for k,v in conditions.items()])
    
    In [317]: q
    Out[317]: 'name=="Joe" and sex=="M"'
    
    In [318]: df.query(q)
    Out[318]:
      sex  age name
    2   M   29  Joe
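The join above wraps every value in double quotes, so it only suits string columns. A possible variant, sketched under the assumption that values are plain strings or numbers, lets `repr()` decide the quoting. The helper name `build_query` is hypothetical and not part of pandas.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})


def build_query(conditions):
    """Hypothetical helper: turn {column: value} into an 'and'-joined query string.

    `!r` applies repr(), so strings are quoted and numbers are left bare.
    """
    return " and ".join(f"{col} == {val!r}" for col, val in conditions.items())


q = build_query({"sex": "F", "age": 23})
result = df.query(q)

print(q)
print(result)
```

This sketch does not escape column names or untrusted values, so it is only meant for conditions you control.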
    

PS. There are also some limitations:

  • .query() , ,
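  • not every expression is supported; in some cases you have to use engine='python' instead of engine='numexpr' (the default when numexpr is installed, and usually the faster one)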
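The engine keyword mentioned above can be passed straight to .query(). A minimal sketch using the sample data from this answer follows. Whether the default engine accepts the same expression depends on the pandas version and on whether numexpr is installed, so the engine is set explicitly here.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

# String accessor methods are evaluated by Python itself, so the
# 'python' engine is requested explicitly instead of relying on the default.
starts_with_m = df.query("name.str.startswith('M')", engine="python")

print(starts_with_m)
```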

. ( Pandas Pandas) :

, .query - , , , , .

+4

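A use case for query() is when you have a collection of DataFrame objects that share a subset of column names (or index levels/names). You can pass the same query to each frame without having to specify which frame you are querying.

Example: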

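import pandas as pd

# Two frames that share the same column names
dfA = pd.DataFrame([[1,2,3], [4,5,6]], columns=["X", "Y", "Z"])
dfB = pd.DataFrame([[1,3,3], [4,1,6]], columns=["X", "Y", "Z"])
# One query string, reused for both frames
q = "(X > 3) & (Y < 10)"

print(dfA.query(q))
print(dfB.query(q))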

   X  Y  Z
1  4  5  6
   X  Y  Z
1  4  1  6

The syntax is also more flexible:

df.query('a < b and b < c')  # query() understands slightly more English-like syntax, such as `and`
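For comparison, here is a small sketch of the same filter written three ways. The frame and its values are made up for illustration; only the column names a, b and c come from the line above.

```python
import pandas as pd

# Illustrative data; the values are assumptions chosen so one row matches.
df = pd.DataFrame({"a": [1, 5, 2], "b": [2, 4, 3], "c": [3, 6, 1]})

# English-like keyword form
with_and = df.query("a < b and b < c")

# Chained comparison form, as in plain Python
chained = df.query("a < b < c")

# Equivalent boolean indexing needs parentheses and the & operator
with_mask = df[(df["a"] < df["b"]) & (df["b"] < df["c"])]

print(with_and)
```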

Use the in and not in operators (an alternative to isin):

df.query('a in [3, 4, 5]') # select rows whose value of column a is in [3, 4, 5]

Special use of == and != with a list (works like in / not in):

df.query('a == [1, 3, 5]') # select rows whose value of column a is in [1, 3, 5]
# equivalent to df.query('a in [1, 3, 5]')
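A short sketch checking these equivalences against isin. The frame and the name `wanted` are illustrative assumptions; only the column name a and the list [1, 3, 5] come from the lines above.

```python
import pandas as pd

# Illustrative data for column `a`.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6]})
wanted = [1, 3, 5]

# Membership: three spellings of the same filter
via_in = df.query("a in [1, 3, 5]")
via_eq = df.query("a == [1, 3, 5]")
via_isin = df[df["a"].isin(wanted)]

# Negated membership: three spellings of the opposite filter
via_not_in = df.query("a not in [1, 3, 5]")
via_ne = df.query("a != [1, 3, 5]")
via_not_isin = df[~df["a"].isin(wanted)]

print(via_in)
print(via_not_in)
```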
+1

Source: https://habr.com/ru/post/1692503/

