Best way to split DataFrame by edge

Question

Suppose I have the following DataFrame:

       a         b
    0  A  1.516733
    1  A  0.035646
    2  A -0.942834
    3  B -0.157334
    4  A  2.226809
    5  A  0.768516
    6  B -0.015162
    7  A  0.710356
    8  A  0.151429

I need to group it by "edge B", which means the groups should be:

       a         b
    0  A  1.516733
    1  A  0.035646
    2  A -0.942834
    3  B -0.157334

    4  A  2.226809
    5  A  0.768516
    6  B -0.015162

    7  A  0.710356
    8  A  0.151429

I.e. every time I find “B” in column “a”, I want to split my DataFrame.

My current solution:

    import numpy as np
    import pandas as pd

    #create the dataframe
    s = pd.Series(['A','A','A','B','A','A','B','A','A'])
    ss = pd.Series(np.random.randn(9))
    dff = pd.DataFrame({"a":s,"b":ss})

    #my solution: label each row with the number of "B" rows seen before it
    count = 0
    ls = []
    for i in s:
        if i=="A":
            ls.append(count)
        else:
            ls.append(count)
            count+=1
    dff['grpb']=ls

and I got a dataframe:

       a         b  grpb
    0  A  1.516733     0
    1  A  0.035646     0
    2  A -0.942834     0
    3  B -0.157334     0
    4  A  2.226809     1
    5  A  0.768516     1
    6  B -0.015162     1
    7  A  0.710356     2
    8  A  0.151429     2

I can then split this with dff.groupby('grpb').

Is there a more efficient way to do this using pandas functions?

python pandas

javier Nov 12 '12 at 23:19

4 answers

user2177091 · Answer 1 · 2013-03-16T13:43:42+0000

Here is a one-liner:

    zip(*dff.groupby(pd.rolling_median((1*(dff['a']=='B')).cumsum(),3,True)))[-1]

    [   1         2
    0  A  1.516733
    1  A  0.035646
    2  A -0.942834
    3  B -0.157334,
       1         2
    4  A  2.226809
    5  A  0.768516
    6  B -0.015162,
       1         2
    7  A  0.710356
    8  A  0.151429]
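Note for newer setups: pd.rolling_median was removed from later pandas releases, and on Python 3 the result of zip(...) cannot be indexed. The sketch below shows what I believe is the same idea using the .rolling() method, assuming the third positional argument in the one-liner above was min_periods. The sample frame and the names edge_count, key and groups are made up for illustration.

```python
import pandas as pd

# Illustrative data with the same 'a' column as in the question
dff = pd.DataFrame({"a": list("AAABAABAA"), "b": range(9)})

# Running count of 'B' rows seen so far: 0 0 0 1 1 1 2 2 2
edge_count = (dff["a"] == "B").cumsum()

# A trailing 3-row median delays each increment by one row,
# so every 'B' row stays with the rows that precede it
key = edge_count.rolling(3, min_periods=1).median()

# Keep only the sub-frames, dropping the group keys
groups = [g for _, g in dff.groupby(key)]
```

This trick appears to rely on 'B' rows never being adjacent; with two consecutive edges the median produces an intermediate key and the grouping changes.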

DSM · Answer 2 · 2013-03-16T14:28:43+0000

What about:

 df.groupby((df.a == "B").shift(1).fillna(0).cumsum())

For instance:

    >>> df
       a         b
    0  A -1.957118
    1  A -0.906079
    2  A -0.496355
    3  B  0.552072
    4  A -1.903361
    5  A  1.436268
    6  B  0.391087
    7  A -0.907679
    8  A  1.672897
    >>> gg = list(df.groupby((df.a == "B").shift(1).fillna(0).cumsum()))
    >>> pprint.pprint(gg)
    [(0,
       a         b
    0  A -1.957118
    1  A -0.906079
    2  A -0.496355
    3  B  0.552072),
     (1,
       a         b
    4  A -1.903361
    5  A  1.436268
    6  B  0.391087),
     (2,
       a         b
    7  A -0.907679
    8  A  1.672897)]

(I did not strip the group keys; you could use [g for k, g in df.groupby(...)] if you only want the sub-frames.)
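On recent pandas versions, shifting a boolean column introduces a NaN and turns it into object dtype, so a variant that passes fill_value to shift may be cleaner. This is only a sketch of the same shift-then-cumsum idea; the sample data and the names starts, key and pieces are illustrative.

```python
import pandas as pd

# Illustrative data with the same 'a' column as in the question
df = pd.DataFrame({"a": list("AAABAABAA"), "b": range(9)})

# True on the row *after* each 'B'; fill_value keeps the column boolean
starts = df["a"].eq("B").shift(fill_value=False)

# Cumulative sum of the flags gives one integer label per group
key = starts.cumsum()

# Sub-frames only, without the group keys
pieces = [g for _, g in df.groupby(key)]
```

Because the flag is shifted down by one row, each "B" row closes its group instead of opening the next one.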

Wouter overmeire · Answer 3 · 2012-11-13T09:21:03+0000

An alternative is:

    In [36]: dff
    Out[36]:
       a         b
    0  A  0.689785
    1  A -0.374623
    2  A  0.517337
    3  B  1.549259
    4  A  0.576892
    5  A -0.833309
    6  B -0.209827
    7  A -0.150917
    8  A -1.296696

    In [37]: dff['grpb'] = np.NaN

    In [38]: breaks = dff[dff.a == 'B'].index

    In [39]: dff['grpb'][breaks] = range(len(breaks))

    In [40]: dff.fillna(method='bfill').fillna(len(breaks))
    Out[40]:
       a         b  grpb
    0  A  0.689785     0
    1  A -0.374623     0
    2  A  0.517337     0
    3  B  1.549259     0
    4  A  0.576892     1
    5  A -0.833309     1
    6  B -0.209827     1
    7  A -0.150917     2
    8  A -1.296696     2
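The session above uses chained assignment and fillna(method='bfill'), both of which newer pandas versions warn about or reject. A rough equivalent using .loc and .bfill() might look like this; it is a sketch under the assumption that the same back-fill logic is wanted, and the intermediate Series grpb is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same 'a' column as in the question
dff = pd.DataFrame({"a": list("AAABAABAA"), "b": np.random.randn(9)})

# Index labels of the edge rows
breaks = dff.index[dff["a"] == "B"]

# Number the edge rows 0, 1, ... and leave every other row as NaN
grpb = pd.Series(np.nan, index=dff.index)
grpb.loc[breaks] = np.arange(len(breaks))

# Back-fill so each row takes the label of the next edge;
# rows after the last edge get one extra label
dff["grpb"] = grpb.bfill().fillna(len(breaks)).astype(int)
```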

Or using itertools to create 'grpb' is also an option.
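The answer does not say which itertools function it has in mind. One possible reading, shown here purely as an assumption, is a running sum with itertools.accumulate over the edge flags, offset by one row so that each "B" stays in the group it closes.

```python
from itertools import accumulate, chain

import pandas as pd

# Illustrative data with the same 'a' column as in the question
dff = pd.DataFrame({"a": list("AAABAABAA"), "b": range(9)})

# One boolean flag per row: is this row an edge?
is_edge = [value == "B" for value in dff["a"]]

# Prepend 0 and drop the last flag so each increment lands on the row
# after the edge, then take the running sum
dff["grpb"] = list(accumulate(chain([0], is_edge[:-1])))
```

This gives the same labels as the loop in the question, but it still iterates in Python, so it is unlikely to be faster than a cumsum-based approach on large frames.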

Jerry T · Answer 4 · 2013-04-28T01:23:29+0000

    def vGroup(dataFrame, edgeCondition, groupName='autoGroup'):
        groupNum = 0
        dataFrame[groupName] = ''
        #loop over each row
        for inx, row in dataFrame.iterrows():
            if edgeCondition[inx]:
                #mark the edge row itself, then start a new group
                #(.loc replaces .ix, which was removed in pandas 1.0)
                dataFrame.loc[inx, groupName] = 'edge'
                groupNum += 1
            else:
                dataFrame.loc[inx, groupName] = groupNum
        return dataFrame[groupName]

    vGroup(df, df[0] == ' ')
