Iterating a data frame per unit of time instead of line by line

I have a pandas.DataFrame that looks something like this:

Time(minutes)    column2       column1
420              1             5
420              2             10
420              3             8
421              1             4
421              2             9
421              3             7

I know how to iterate line by line using iterrows(), but is there an efficient way to iterate over each unit of time in the Time(minutes) column, so that at each iteration I can work with all the data for that time? Something like:

time = 420
while(time <= max_time):
   temp <- fetch the sub-dataframe for given time
   process(temp)
   update original df with temp #guaranteed it won't affect any other rows other than the current set of rows
   time += 1
2 answers

You can use .groupby() to iterate per time value instead of row by row:

The code:

for grp in df.groupby('Time(minutes)'):
    ...

Test code:

from io import StringIO

import pandas as pd

df = pd.read_fwf(StringIO(u"""
    Time(minutes)    column2       column1
    420              1             5
    420              2             10
    420              3             8
    421              1             4
    421              2             9
    421              3             7"""), header=1)

print(df)
for grp in df.groupby('Time(minutes)'):
    print(grp)

Results:

   Time(minutes)  column2  column1
0            420        1        5
1            420        2       10
2            420        3        8
3            421        1        4
4            421        2        9
5            421        3        7

(420,    Time(minutes)  column2  column1
0            420        1        5
1            420        2       10
2            420        3        8)
(421,    Time(minutes)  column2  column1
3            421        1        4
4            421        2        9
5            421        3        7)
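
Each item yielded by .groupby() is a (key, sub-DataFrame) tuple, so it can be unpacked directly in the loop header. The following is a minimal sketch of the loop from the question, not a definitive implementation: process() is a hypothetical stand-in for the real processing step, and the write-back assumes that process() keeps the group's original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Time(minutes)": [420, 420, 420, 421, 421, 421],
    "column2": [1, 2, 3, 1, 2, 3],
    "column1": [5, 10, 8, 4, 9, 7],
})


def process(temp):
    # Hypothetical processing step (an assumption for illustration):
    # subtract the group's minimum from column1.
    out = temp.copy()
    out["column1"] = out["column1"] - out["column1"].min()
    return out


# Unpack the (key, sub-DataFrame) tuple that groupby yields.
for time, temp in df.groupby("Time(minutes)"):
    result = process(temp)
    # Write back by index label, so only this group's rows are touched.
    df.loc[result.index, "column1"] = result["column1"]

print(df)
```

Because the assignment goes through df.loc with the group's own index labels, rows belonging to other time values are left unchanged.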

There are two ways. The first, which basically preserves your iteration format, is to subset the DataFrame manually:

for time in df['time_minutes'].unique():
    temp = df.loc[df['time_minutes'] == time] 
    process(temp)
    # or alternatively, make your changes directly on temp (depending what they are),
    # for example, something like this:
    # df.loc[df['time_minutes'] == time, 'some_column_name'] = assign_something_here
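
As a rough, self-contained sketch of the same idea using the column names from the question (Time(minutes) rather than time_minutes): the boolean mask is built once per time value and reused both to read the subset and to write the result back. Doubling column1 is only a hypothetical placeholder for the real update.

```python
import pandas as pd

df = pd.DataFrame({
    "Time(minutes)": [420, 420, 420, 421, 421, 421],
    "column2": [1, 2, 3, 1, 2, 3],
    "column1": [5, 10, 8, 4, 9, 7],
})

for time in df["Time(minutes)"].unique():
    # Boolean mask selecting the rows for the current time value.
    mask = df["Time(minutes)"] == time
    temp = df.loc[mask]
    # Hypothetical update (an assumption for illustration): double column1.
    # Assigning through df.loc with the same mask changes only these rows.
    df.loc[mask, "column1"] = temp["column1"] * 2

print(df)
```

Note that each iteration compares the whole Time(minutes) column against one value, so the work grows with the number of distinct times multiplied by the number of rows.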

The second way is to use groupby, as shown in Stephen Rauch's answer.


Source: https://habr.com/ru/post/1694110/

