Pandas declarative style data processing

Question


I have a pandas DataFrame of vehicle coordinates (from several vehicles over a few days). For each vehicle and each day I do one of two things: either apply an algorithm to it, or filter it out of the dataset entirely if it does not meet certain criteria.

To do this, I use df.groupby(['vehicle_id', 'day']), and then .apply(algorithm) or .filter(condition), where algorithm and condition are functions that take a DataFrame.

I would like the full processing of my dataset (which involves several .apply and .filter steps) to be written in a declarative style, as opposed to an imperative loop over the groups, with the goal of it looking something like:

df.groupby(['vehicle_id', 'day']).apply(algorithm1).filter(condition1).apply(algorithm2).filter(condition2)

Of course, the code above is incorrect, because .apply() and .filter() return new DataFrames, and this is exactly my problem. They combine all the data back into a single DataFrame, and I find myself calling .groupby(['vehicle_id', 'day']) over and over.

Is there a good way that I can write this without having to group the same columns over and over?


pandas dataframe declarative

mchristos Jun 28 '17 at 12:46


1 answer

Shovalt · Answer 1 · 2018-01-14T12:48:29+0000

Since apply loops over the groups internally anyway (which means there are no clever optimizations behind the scenes), I suggest using an explicit for loop:

    import pandas as pd

    arr = []
    for key, dfg in df.groupby(['vehicle_id', 'day']):
        dfg = dfg.do_stuff1()  # Perform all needed operations
        dfg = do_stuff2(dfg)
        arr.append(dfg)
    result = pd.concat(arr)
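To make the loop concrete, here is a minimal sketch with one filter and one apply step. The sample data, the column x, and the helpers has_enough_points and add_step_distance are all hypothetical stand-ins for the question's condition and algorithm, not anything from the original post.

```python
import pandas as pd

# Hypothetical sample data: two vehicles, one coordinate column.
df = pd.DataFrame({
    "vehicle_id": [1, 1, 1, 2, 2, 2],
    "day": ["mon", "mon", "tue", "mon", "mon", "mon"],
    "x": [0.0, 1.0, 5.0, 2.0, 3.0, 4.0],
})


def has_enough_points(dfg):
    # Hypothetical condition: keep only groups with at least two rows.
    return len(dfg) >= 2


def add_step_distance(dfg):
    # Hypothetical algorithm: change in x since the previous row of the group.
    return dfg.assign(step=dfg["x"].diff().fillna(0.0))


kept = []
for key, dfg in df.groupby(["vehicle_id", "day"]):
    if not has_enough_points(dfg):
        continue  # the "filter" step: drop the whole group
    kept.append(add_step_distance(dfg))  # the "apply" step

# pd.concat raises ValueError on an empty list, so fall back to an
# empty frame if every group was filtered out.
result = pd.concat(kept) if kept else df.iloc[0:0]
```

Here the single-row group (vehicle 1, "tue") is skipped, and the remaining groups are grouped only once.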

An alternative is to write a single function that runs all of the apply and filter steps sequentially on one group's DataFrame, and then do a single groupby/apply with it:

    def all_operations(dfg):
        # Do stuff
        return result_df

    result = df.groupby(['vehicle_id', 'day']).apply(all_operations)

In both cases, you will have to deal with cases where a filter leaves a group with an empty DataFrame, if such cases exist.
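One way to get closer to the declarative look the question asks for is to describe the steps as a list and let a small helper do the looping. This is only a sketch: run_pipeline, the step tuples, and the sample helpers are hypothetical names introduced for illustration, not pandas API. It uses an explicit loop rather than GroupBy.apply, and it drops groups that are filtered out or end up empty.

```python
import pandas as pd


def run_pipeline(df, keys, steps):
    """Group once, then run ("apply", func) / ("filter", predicate) steps per group."""
    results = []
    for _, dfg in df.groupby(keys):
        keep = True
        for kind, func in steps:
            if kind == "apply":
                dfg = func(dfg)
            elif kind == "filter":
                keep = bool(func(dfg))
            else:
                raise ValueError(f"unknown step kind: {kind!r}")
            if not keep or dfg.empty:
                keep = False
                break  # group rejected or emptied: skip remaining steps
        if keep:
            results.append(dfg)
    # Note: the empty fallback has only the input's columns, not any
    # columns that the apply steps would have added.
    return pd.concat(results) if results else df.iloc[0:0]


# Hypothetical data and steps, for illustration only.
df = pd.DataFrame({
    "vehicle_id": [1, 1, 1, 2, 2, 2],
    "day": ["mon", "mon", "tue", "mon", "mon", "mon"],
    "x": [0.0, 1.0, 5.0, 2.0, 3.0, 4.0],
})

steps = [
    ("filter", lambda g: len(g) >= 2),
    ("apply", lambda g: g.assign(step=g["x"].diff().fillna(0.0))),
    ("filter", lambda g: g["step"].sum() > 1.0),
]

result = run_pipeline(df, ["vehicle_id", "day"], steps)
```

The step list reads top to bottom like the chained call in the question, while the grouping columns are named only once.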
