How do you filter pandas dataframes by multiple columns?

To filter a dataframe (df) by a single column, for example in data covering males and females, we can do:

 males = df[df['Gender'] == 'Male']

Question 1 - But what if the data spans several years, and I wanted to see only men for 2014?

In other languages, I can do something like:

 if A = "Male" and if B = "2014" then 

(except that I want to do this and get a subset of the original data frame in the new dataframe)

Question 2 - How do I do this in a loop and create a dataframe for each unique combination of year and gender (for example, a df for each of 2013-Male, 2013-Female, 2014-Male and 2014-Female)?

 for y in year:
     for g in gender:
         df = .....
+78
python filter pandas
Feb 28 '14 at 4:21
4 answers

Use the & operator, and remember to wrap each sub-condition in parentheses:

 males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
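
For reference, here is a minimal self-contained sketch of the mask approach above. The sample data, the Score column, and the exact values are assumptions made up for illustration; only the Gender and Year column names come from the question.

```python
import pandas as pd

# Hypothetical sample data, assumed only for this illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [10, 20, 30, 40, 50],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds more tightly than ==.
mask = (df["Gender"] == "Male") & (df["Year"] == 2014)
males_2014 = df[mask]

print(males_2014)
```

The result keeps the original index labels, so call reset_index(drop=True) on it if you want a fresh 0-based index.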

To save your data in a dict using a for loop:

 from collections import defaultdict

 dic = {}
 for g in ['male', 'female']:
     dic[g] = defaultdict(dict)
     for y in [2013, 2014]:
         # store the DataFrames in a dict of dicts
         dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]

EDIT:

Demo for your getDF:

 def getDF(dic, gender, year):
     return dic[gender][year]

 print(getDF(dic, 'male', 2014))
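
As an alternative to the nested loop (this is a different technique, not what the answer above uses), groupby can split the frame on both columns in one pass. A rough sketch, assuming made-up sample data and a hypothetical Score column:

```python
import pandas as pd

# Hypothetical sample data, assumed only for this illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [10, 20, 30, 40, 50],
})

# Grouping by two columns yields ((year, gender), sub_frame) pairs,
# so a dict comprehension gives one DataFrame per combination.
frames = {key: group for key, group in df.groupby(["Year", "Gender"])}

males_2014 = frames[(2014, "Male")]
print(males_2014)
```

Only combinations that actually occur in the data get a key, so a missing year/gender pair raises KeyError instead of returning an empty frame.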
+128
Feb 28 '14 at 4:40

For a more general boolean function that you want to use as a filter and that depends on more than one column, you can use:

 df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)] 

where f is a function that is applied to each pair of elements (x1, x2) from col_1 and col_2 and returns True or False depending on whatever condition you want (x1, x2) to satisfy.
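
A small sketch of what this could look like in practice. The predicate f and the sample values here are hypothetical, chosen only to make the example runnable:

```python
import pandas as pd

# Hypothetical data using the column names from the snippet above.
df = pd.DataFrame({"col_1": [1, 5, 3, 8], "col_2": [4, 2, 3, 1]})

def f(x1, x2):
    # Hypothetical predicate: keep rows where col_1 exceeds col_2.
    return x1 > x2

# axis=1 passes each row to the lambda; *x unpacks it into (x1, x2).
filtered = df[df[["col_1", "col_2"]].apply(lambda x: f(*x), axis=1)]

# When the condition can be written with column arithmetic,
# a vectorized mask gives the same rows without a Python-level loop.
vectorized = df[df["col_1"] > df["col_2"]]

print(filtered)
```

Row-wise apply calls f once per row in Python, so it tends to be slow on large frames; it is mainly useful when the condition cannot be expressed with vectorized operations.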

+20
Oct 02 '16 at 18:37

Starting with pandas 0.13, this is the most efficient way:

 df.query('Gender == "Male" & Year == 2014')
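
A short sketch of query on sample data, assuming Year is stored as integers (if your Year column holds strings, compare against "2014" instead). The data and the Score column are made up for illustration; the @ prefix is how query refers to a local Python variable.

```python
import pandas as pd

# Hypothetical sample data, assumed only for this illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Score": [10, 20, 30, 40, 50],
})

year = 2014

# Column names are used bare inside the expression;
# @year pulls in the local variable defined above.
result = df.query('Gender == "Male" and Year == @year')

print(result)
```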
+2
Mar 22 '19 at 6:03

You can filter by multiple columns (more than two) using np.logical_and.reduce in place of chained & operators (or np.logical_or.reduce in place of |).

Here is an example of a function that does this when you provide target values for multiple fields. You can adapt it for other kinds of filtering:

 def filter_df(df, filter_values):
     """Filter df by matching targets for multiple columns.

     Args:
         df (pd.DataFrame): dataframe
         filter_values (None or dict): Dictionary of the form
             '{<field>: <target_values_list>}' used to filter columns data.
     """
     import numpy as np
     if filter_values is None or not filter_values:
         return df
     return df[
         np.logical_and.reduce([
             df[column].isin(target_values)
             for column, target_values in filter_values.items()
         ])
     ]

Usage:

 import pandas as pd

 df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4]})
 filter_df(df, {
     'a': [1, 2, 3],
     'b': [1, 2, 4]
 })
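
To make the mechanics concrete, here is a compact sketch of the same idea written out step by step on the sample data above, with the OR variant alongside. The variable names masks, both and either are mine, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]})
filter_values = {"a": [1, 2, 3], "b": [1, 2, 4]}

# One boolean Series per column: True where the value is among the targets.
masks = [df[col].isin(vals) for col, vals in filter_values.items()]

# reduce folds the list element-wise: AND keeps rows matching every column,
# OR keeps rows matching at least one column.
both = df[np.logical_and.reduce(masks)]
either = df[np.logical_or.reduce(masks)]

print(both)
print(either)
```

Here both contains the rows where a and b are each within their target lists (a=1, b=1 and a=2, b=2), while either contains all four rows, since every row matches at least one of the two lists.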
-1
Sep 06 '19 at 13:26