Python Iterate through dataframe columns

While working on a problem, I ended up with the following data frame in Python:

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0
3   201616  109     201616109   505         18
4   201616  110     201616110   505         0
5   201616  106     201616108   910         0
6   201616  107     201616106   910         0
7   201616  108     201616107   910         2
8   201616  109     201616108   910         3
9   201616  110     201616109   910         10

Here, the "hour" variable is a concatenation of the weekday and the store hour. For example, if the day is Monday (weekday = 1) and the store hour is 6am, then hour = 106. Similarly, week_hr is a concatenation of week and hour.

I want to get the rows where there is a run with no baskets, i.e. 0 baskets for 3 consecutive hours. In the above case, I should only get the first 3 rows, because store 505 has a continuous run of 0 baskets from 106 to 108. I do not want rows (4, 5, 6): even though they show 0 baskets 3 times in a row, the hours are not actually continuous (110 → 106 → 107). To count as continuous, the hours must be consecutive within the range 106-110. Essentially, I want all stores and their respective rows that have 0 baskets for 3 uninterrupted hours on any given day.

Dummy output:

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0

Can I do this in Python using pandas and loops? The dataset needs to be sorted by store and hour. I am completely new to Python.
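To make the encoding described above concrete, here is a minimal sketch in plain Python. The helper names `split_hour` and `make_week_hr` are hypothetical and only illustrate the scheme; they assume the hour is always a three-digit value of the form weekday * 100 + store hour.

```python
def split_hour(hour):
    """Split an encoded hour such as 106 into (weekday, store_hour)."""
    return divmod(hour, 100)


def make_week_hr(week, hour):
    """Concatenate the week and the three-digit hour into one integer."""
    return week * 1000 + hour


print(split_hour(106))            # (1, 6)
print(make_week_hr(201616, 106))  # 201616106
```

Splitting the hour this way is what makes a "same day" check possible: two rows belong to the same day only if their weekday parts match.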

2 answers

Follow these steps:

  • Sort by store_code, week_hr
  • Filter to the rows where baskets is 0
  • Subtract each week_hr from the next one, df['week_hr'][1:].values - df['week_hr'][:-1].values, so you know whether they are continuous.
  • Now you can assign a group to each continuous run and filter as you wish.

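    import numpy as np
    import pandas as pd

    # 1: sort by store, then by week_hr
    t1 = df.sort_values(['store_code', 'week_hr'])

    # 2: keep only rows with zero baskets (copy so a column can be added safely)
    t2 = t1[t1['baskets'] == 0].copy()

    # 3: a row continues the previous run when week_hr increases by exactly 1;
    #    every break starts a new group id
    continuous = t2['week_hr'][1:].values - t2['week_hr'][:-1].values == 1
    groups = np.cumsum(np.hstack([False, ~continuous]))
    t2['groups'] = groups

    # 4: keep groups with at least 3 rows and pull the matching rows back out
    t3 = t2.groupby(['store_code', 'groups'], as_index=False)['week_hr'].count()
    t4 = t3[t3.week_hr > 2]
    print(pd.merge(t2, t4[['store_code', 'groups']]))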
    

No loops needed!
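The same idea can also be written so that continuity is measured inside each store and day, which matches the "on any given day" requirement in the question directly. The following is only a sketch under stated assumptions: the helper name `zero_runs`, its `min_len` parameter, and the temporary `weekday` and `run_id` columns are made up for illustration, the hour is assumed to encode weekday * 100 + store hour, and the frame is sorted on `hour` rather than `week_hr`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "week": [201616] * 10,
    "hour": [106, 107, 108, 109, 110] * 2,
    "week_hr": [201616106, 201616107, 201616108, 201616109, 201616110,
                201616108, 201616106, 201616107, 201616108, 201616109],
    "store_code": [505] * 5 + [910] * 5,
    "baskets": [0, 0, 0, 18, 0, 0, 0, 2, 3, 10],
})


def zero_runs(frame, min_len=3):
    """Return zero-basket rows that sit in a run of at least min_len
    consecutive hours within one store, week and weekday."""
    zeros = frame[frame["baskets"] == 0].sort_values(
        ["store_code", "week", "hour"]).copy()
    zeros["weekday"] = zeros["hour"] // 100

    # Hour-to-hour step inside each store/week/day; NaN on the first row of a group.
    step = zeros.groupby(["store_code", "week", "weekday"])["hour"].diff()

    # A new run starts wherever the step is not exactly 1 (NaN counts as a break).
    zeros["run_id"] = step.ne(1).cumsum()

    # Length of the run each row belongs to.
    run_len = zeros.groupby("run_id")["hour"].transform("size")
    return zeros[run_len >= min_len].drop(columns=["weekday", "run_id"])


result = zero_runs(df)
print(result)
```

On the sample data this keeps the three rows of store 505 for hours 106 to 108. Store 910 only has two zero-basket hours in a row, so it is dropped.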


Steps:

  • Sort by store_code, week_hr
  • Filter by 0
  • Group by store_code

Code:

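    # sort by store, then by week_hr
    t1 = df.sort_values(['store_code', 'week_hr'])

    # keep only rows with zero baskets
    t2 = t1[t1['baskets'] == 0]

    # one list of week_hr values per store
    grouped = t2.groupby('store_code')['week_hr'].apply(list)

    # Series.iteritems() was removed in pandas 2.0; items() does the same job
    for store_code, week_hrs in grouped.items():
        print(store_code, week_hrs)
        # do something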
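One way to fill in the "do something" step is a plain loop that scans each sorted list for runs of consecutive values. This is a minimal sketch, not part of the original answer: the function name `consecutive_runs` and its `min_len` parameter are hypothetical, and the input list is assumed to be sorted in ascending order.

```python
def consecutive_runs(values, min_len=3):
    """Return the runs of consecutive integers (step of 1) that contain
    at least min_len items. Assumes values is sorted ascending."""
    runs, current = [], []
    for v in values:
        if current and v - current[-1] == 1:
            # v continues the current run
            current.append(v)
        else:
            # v breaks the run: keep the old run if it is long enough
            if len(current) >= min_len:
                runs.append(current)
            current = [v]
    # do not forget the run that is still open at the end
    if len(current) >= min_len:
        runs.append(current)
    return runs


# week_hr values with zero baskets for store 505 in the sample data
print(consecutive_runs([201616106, 201616107, 201616108, 201616110]))
# [[201616106, 201616107, 201616108]]
```

Inside the loop over stores, calling `consecutive_runs(week_hrs)` would give the week_hr values to keep for that store.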

Source: https://habr.com/ru/post/1648825/

