Python Iterate through dataframe columns

Question

While working on a problem, I have the following data frame in Python:

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0
3   201616  109     201616109   505         18
4   201616  110     201616110   505         0
5   201616  106     201616108   910         0
6   201616  107     201616106   910         0
7   201616  108     201616107   910         2
8   201616  109     201616108   910         3
9   201616  110     201616109   910         10

Here, the “hour” variable is a concatenation of the weekday and the store hour: for example, if the weekday is Monday (= 1) and the store hour is 6 am, then hour = 106. Similarly, week_hr is a concatenation of week and hour. I want to get the rows where I see a trend of no baskets, i.e. 0 baskets for 3 consecutive hours. In the above case, I should only get the first 3 rows, i.e. for store 505 there is a continuous run of 0 baskets from 106 to 108. But I do not want rows 4, 5, 6: even though there are 0 baskets for 3 rows in a row, the hours are not actually continuous (110 → 106 → 107). For the hours to count as continuous, they must be consecutive within the range 106-110. Essentially, I want all stores and their respective rows that have 0 baskets for 3 uninterrupted hours on any given day.

Dummy output:

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0

Can I do this in Python using pandas and loops? The dataset needs to be sorted by store and hour. I am completely new to Python :(

+4

python loops python-3.x pandas dataframe

Mukul Jul 22 '16 at 18:17


2 answers

You can:

Sort by store_code, week_hr
Filter rows where baskets is 0
Group by store_code

Code:

t1 = df.sort_values(['store_code', 'week_hr'])

t2 = t1[t1['baskets'] == 0]

grouped = t2.groupby('store_code')['week_hr'].apply(lambda x: x.tolist())    

for store_code, week_hrs in grouped.items():  # Series.iteritems() was removed in pandas 2.0
    print(store_code, week_hrs)
    # do something
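
The loop body above is left open. A minimal sketch of one way to fill it in, assuming each `week_hrs` list is sorted in ascending order and that "continuous" means neighbouring values differ by exactly 1. The helper name `consecutive_runs` and its `min_len` parameter are hypothetical and not part of the answer:

```python
def consecutive_runs(values, min_len=3):
    """Return runs of consecutive integers (step 1) with at least min_len items.

    Assumes `values` is sorted in ascending order.
    """
    runs, current = [], []
    for v in values:
        if current and v - current[-1] == 1:
            # v directly follows the previous value: extend the current run
            current.append(v)
        else:
            # continuity broke: keep the finished run if it is long enough
            if len(current) >= min_len:
                runs.append(current)
            current = [v]
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Zero-basket week_hr values per store, taken from the sample data in the question
zero_hours = {
    505: [201616106, 201616107, 201616108, 201616110],
    910: [201616106, 201616108],
}
for store_code, week_hrs in zero_hours.items():
    print(store_code, consecutive_runs(week_hrs))
```

For store 505 this keeps the single run 106 to 108; store 910 has no run of three, so its list is empty.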

0

Cuong Tran 22 . '16 19:14

caiohamamura · Accepted Answer · 2016-07-22T20:14:26+0000

Follow these steps:

1. Sort by store_code, week_hr
2. Filter the rows where baskets is 0
3. Take the difference df['week_hr'][1:].values - df['week_hr'][:-1].values so you know whether consecutive rows are continuous (a difference of 1 means they are).

Now you can assign a group label to each continuous run and filter as you wish.

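import numpy as np
import pandas as pd

# 1: sort by store, then by week_hr
t1 = df.sort_values(['store_code', 'week_hr'])

# 2: keep only rows with 0 baskets (copy so a column can be added safely)
t2 = t1[t1['baskets'] == 0].copy()

# 3: True where a row's week_hr is exactly 1 more than the previous row's
continuous = t2['week_hr'][1:].values - t2['week_hr'][:-1].values == 1
# start a new group at every break in continuity
groups = np.cumsum(np.hstack([False, ~continuous]))
t2['groups'] = groups

# 4: keep the (store_code, groups) pairs that contain at least 3 rows
t3 = t2.groupby(['store_code', 'groups'], as_index=False)['week_hr'].count()
t4 = t3[t3.week_hr > 2]
print(pd.merge(t2, t4[['store_code', 'groups']]))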

No need for loops!
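
For anyone who wants to try this on the sample data, here is a self-contained sketch of the same idea. It is a variant rather than the exact code above: it assumes the difference can be taken per store with `groupby(...).diff()`, and the column name `run_id` is made up for illustration. It also assumes that consecutive week_hr values always fall on the same day, as they do in the question's data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "week": [201616] * 10,
    "hour": [106, 107, 108, 109, 110, 106, 107, 108, 109, 110],
    "week_hr": [201616106, 201616107, 201616108, 201616109, 201616110,
                201616108, 201616106, 201616107, 201616108, 201616109],
    "store_code": [505] * 5 + [910] * 5,
    "baskets": [0, 0, 0, 18, 0, 0, 0, 2, 3, 10],
})

# Keep zero-basket rows, ordered by store and week_hr
zero = df[df["baskets"] == 0].sort_values(["store_code", "week_hr"]).copy()

# A new run starts wherever week_hr does not follow the previous row by exactly 1
# within the same store (the first row of each store has a NaN diff, so it starts a run too)
new_run = zero.groupby("store_code")["week_hr"].diff() != 1
zero["run_id"] = new_run.cumsum()  # hypothetical helper column

# Keep only rows that belong to a run of at least 3 consecutive zero-basket hours
run_len = zero.groupby(["store_code", "run_id"])["week_hr"].transform("count")
result = zero[run_len >= 3].drop(columns="run_id")
print(result)
```

On the sample data this leaves rows 0, 1 and 2 (store 505, hours 106 to 108), which matches the dummy output in the question.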
