Calculate the equipment needed per time interval from shift start and end times

I would like to visualize on a graph the number of machines required in the workshop at any given time: the x-axis is a continuous time axis and the y-axis is the number of shifts in progress.

In the data table below you will find an example of my data. It shows the Shift_ID (which is unique) and the start and end times of that shift. Over the course of the day I would like to see how many machines are required in each time interval; the interval could be 5 minutes, a quarter of an hour, half an hour or an hour.

    df:
       Shift_ID     Shift_Time_Start       Shift_Time_End
    0         1   2016-03-22 9:00:00   2016-03-22 9:35:00
    1         2   2016-03-22 9:20:00  2016-03-22 10:20:00
    2         3   2016-03-22 9:40:00  2016-03-22 10:14:00
    3         4  2016-03-22 10:00:00  2016-03-22 10:31:00

In this example, for the quarter-hour 9:30-9:45 I would need 3 machines in order to cover every shift running at that time. The desired result would look something like this:

    df2:
                                        Interval  Count
    0    2016-03-22 9:00:00 - 2016-03-22 9:15:00      1
    1    2016-03-22 9:15:00 - 2016-03-22 9:30:00      2
    2    2016-03-22 9:30:00 - 2016-03-22 9:45:00      3
    3   2016-03-22 9:45:00 - 2016-03-22 10:00:00      2
    4  2016-03-22 10:00:00 - 2016-03-22 10:15:00      2
    5  2016-03-22 10:15:00 - 2016-03-22 10:30:00      2
    6  2016-03-22 10:30:00 - 2016-03-22 10:45:00      1

With a data frame like this I could round each interval down to its start time and then plot it as a graph.
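To make the counting rule concrete, here is a minimal sketch of the check I have in mind for a single interval (it assumes the df shown above as a pandas DataFrame; the interval boundaries are just the 9:30-9:45 example):

    import pandas as pd

    interval_start = pd.Timestamp('2016-03-22 09:30:00')
    interval_end = pd.Timestamp('2016-03-22 09:45:00')

    # a shift needs a machine in this interval if its time range overlaps it,
    # i.e. it starts before the interval ends and ends after the interval starts
    overlaps = (df['Shift_Time_Start'] < interval_end) & (df['Shift_Time_End'] > interval_start)
    print(overlaps.sum())   # -> 3 for the 9:30-9:45 quarter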

I keep going in circles on how to "see" whether a shift falls within several intervals. Do you have any ideas how to handle this?

NB: all date and time values, of course, are of type datetime
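(If they were not, converting would be a one-liner with pd.to_datetime; the column names are the ones from the table above:)

    df['Shift_Time_Start'] = pd.to_datetime(df['Shift_Time_Start'])
    df['Shift_Time_End'] = pd.to_datetime(df['Shift_Time_End'])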

EDIT after the solutions from MaxU and knightofni

I used MaxU's plotting code to chart both of your solutions. They both seem to do well with 15-minute intervals, but please take a look at the results with 5-minute intervals:

MaxU:

[plot of the 5-minute results from MaxU's solution]

knightofni:

[plot of the 5-minute results from knightofni's solution]

EDIT 2 (April 4, 2015)

2 answers

It's not so easy. I cannot think of a way to do this in a fully vectorized manner, but the following approach will work.

Reorganize your data so that you have a single datetime column: the goal is one row per Shift_ID for every minimum interval covered by that shift. Then you can group with a TimeGrouper.

Working example:

Recreating your DataFrame

    import pandas as pd
    import arrow

    data = {
        'Shift_ID': [1, 2, 3, 4],
        'Shift_Time_Start': [arrow.get('2016-03-22 09:00:00').datetime,
                             arrow.get('2016-03-22 09:20:00').datetime,
                             arrow.get('2016-03-22 09:40:00').datetime,
                             arrow.get('2016-03-22 10:00:00').datetime],
        'Shift_Time_End': [arrow.get('2016-03-22 09:35:00').datetime,
                           arrow.get('2016-03-22 10:20:00').datetime,
                           arrow.get('2016-03-22 10:14:00').datetime,
                           arrow.get('2016-03-22 10:31:00').datetime],
    }
    df = pd.DataFrame(data)
    min_int = '5T'

    df
       Shift_ID            Shift_Time_End          Shift_Time_Start
    0         1 2016-03-22 09:35:00+00:00 2016-03-22 09:00:00+00:00
    1         2 2016-03-22 10:20:00+00:00 2016-03-22 09:20:00+00:00
    2         3 2016-03-22 10:14:00+00:00 2016-03-22 09:40:00+00:00
    3         4 2016-03-22 10:31:00+00:00 2016-03-22 10:00:00+00:00

Creating a New Df

    new_data = {'time': [], 'Shift_ID': []}  # dict to hold the data

    for row in df.iterrows():
        # creates a list of all dates of this shift, from start to end
        dates = pd.date_range(row[1].Shift_Time_Start, row[1].Shift_Time_End, freq=min_int)
        for date in dates:
            new_data['time'].append(date)
            new_data['Shift_ID'].append(row[1].Shift_ID)

    # creating the new df
    newdf = pd.DataFrame(new_data).set_index('time')

    newdf.head()
                               Shift_ID
    time
    2016-03-22 09:00:00+00:00         1
    2016-03-22 09:05:00+00:00         1
    2016-03-22 09:10:00+00:00         1
    2016-03-22 09:15:00+00:00         1
    2016-03-22 09:20:00+00:00         1

Grouping with TimeGrouper

    # We group by the time index, resampling every min_int
    # (in our case 5 minutes, represented by '5T'),
    # then we check how many unique Shift_IDs fall into each bin.
    newdf.groupby(pd.TimeGrouper(freq=min_int)).agg({'Shift_ID': lambda x: len(set(x))})

                               Shift_ID
    time
    2016-03-22 09:00:00+00:00         1
    2016-03-22 09:05:00+00:00         1
    2016-03-22 09:10:00+00:00         1
    2016-03-22 09:15:00+00:00         1
    2016-03-22 09:20:00+00:00         2
    2016-03-22 09:25:00+00:00         2
    2016-03-22 09:30:00+00:00         2
    2016-03-22 09:35:00+00:00         2
    2016-03-22 09:40:00+00:00         2

As you can see, at 9:15 there was one shift, while at 9:20 there were 2.
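A side note: pd.TimeGrouper has since been deprecated and removed from newer pandas versions in favour of pd.Grouper, so on a recent install the equivalent grouping would be:

    # equivalent aggregation with the current API (pd.Grouper instead of pd.TimeGrouper)
    newdf.groupby(pd.Grouper(freq=min_int)).agg({'Shift_ID': 'nunique'})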

This is not exactly the format you asked for, but I would say it is much easier to work with. If you want to build your desired output from it, that should be pretty simple (just use .shift to create a copy of the date column moved by one row), as sketched below.
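A sketch of that last step (the intermediate name counts and the final Interval / Count columns are just illustrative; the last bin's end is filled in from the bin width):

    # sketch: reshape the grouped output into the Interval / Count layout from the question
    counts = (newdf
              .groupby(pd.TimeGrouper(freq=min_int))
              .agg({'Shift_ID': lambda x: len(set(x))})
              .reset_index()
              .rename(columns={'Shift_ID': 'Count'}))

    # each interval ends where the next one begins (the date column shifted by one row);
    # the very last end has to be filled in from the bin width
    interval_end = counts['time'].shift(-1)
    interval_end.iloc[-1] = counts['time'].iloc[-1] + pd.Timedelta('5min')  # bin width, same as min_int

    counts['Interval'] = (counts['time'].dt.strftime('%Y-%m-%d %H:%M:%S') + ' - ' +
                          interval_end.dt.strftime('%Y-%m-%d %H:%M:%S'))
    df2 = counts[['Interval', 'Count']]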

** EDIT

Link to a notebook with the code


You can do it like this:

code:

    import io
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt

    # load sample data into DF (data frame)
    data = """\
    idx;Shift_ID;Shift_Time_Start;Shift_Time_End
    0;1;2016-03-22 09:00:00;2016-03-22 09:35:00
    1;2;2016-03-22 09:20:00;2016-03-22 10:20:00
    2;3;2016-03-22 09:40:00;2016-03-22 10:14:00
    3;4;2016-03-22 10:00:00;2016-03-22 10:31:00
    4;5;2016-03-22 08:11:00;2016-03-22 09:35:00
    4;6;2016-03-23 14:11:00;2016-03-23 14:23:00
    """
    df = pd.read_csv(io.StringIO(data), sep=';', index_col=0,
                     parse_dates=['Shift_Time_Start', 'Shift_Time_End'])

    # time interval
    freq = '10min'

    # prepare resulting DF with desired intervals
    a = pd.DataFrame({
        'begin': pd.date_range(df.Shift_Time_Start.min(), df.Shift_Time_End.max(), freq=freq)
    })

    # resample
    a = a.set_index('begin').resample(rule='5min').first().reset_index()
    a['end'] = a.begin + pd.Timedelta(freq)

    # count number of unique Shift_ID in `df` for each interval in `a`
    def f(x):
        return df[((x.begin >= df.Shift_Time_Start) & (x.begin <= df.Shift_Time_End))
                  |
                  ((x.end >= df.Shift_Time_Start) & (x.end <= df.Shift_Time_End))
                 ].Shift_ID.nunique()

    a['count'] = a.apply(f, axis=1)

    # remove rows without any shifts
    a = a[a['count'] > 0].reset_index(drop=True)

    a['interval'] = a.begin.dt.strftime('%d.%m %H:%M').astype(str) + \
                    ' - ' + \
                    a.end.dt.strftime('%d.%m %H:%M').astype(str)

    a = a.set_index('interval')[['count']]

    print(a)

    matplotlib.style.use('ggplot')

    a.plot(kind='bar', alpha=0.75)
    fig = plt.gcf()
    fig.subplots_adjust(bottom=0.2)
    plt.show()

[bar chart of the interval counts produced by the code above]

Source data set and the resulting counts:

    In [135]: df
    Out[135]:
         Shift_ID    Shift_Time_Start      Shift_Time_End
    idx
    0           1 2016-03-22 09:00:00 2016-03-22 09:35:00
    1           2 2016-03-22 09:20:00 2016-03-22 10:20:00
    2           3 2016-03-22 09:40:00 2016-03-22 10:14:00
    3           4 2016-03-22 10:00:00 2016-03-22 10:31:00
    4           5 2016-03-22 08:11:00 2016-03-22 09:35:00
    4           6 2016-03-23 14:11:00 2016-03-23 14:23:00

    In [136]: a
    Out[136]:
                               count
    interval
    22.03 08:10 - 22.03 08:20      1
    22.03 08:15 - 22.03 08:25      1
    22.03 08:20 - 22.03 08:30      1
    22.03 08:25 - 22.03 08:35      1
    22.03 08:30 - 22.03 08:40      1
    22.03 08:35 - 22.03 08:45      1
    22.03 08:40 - 22.03 08:50      1
    22.03 08:45 - 22.03 08:55      1
    22.03 08:50 - 22.03 09:00      2
    22.03 08:55 - 22.03 09:05      2
    22.03 09:00 - 22.03 09:10      2
    22.03 09:05 - 22.03 09:15      2
    22.03 09:10 - 22.03 09:20      3
    22.03 09:15 - 22.03 09:25      3
    22.03 09:20 - 22.03 09:30      3
    22.03 09:25 - 22.03 09:35      3
    22.03 09:30 - 22.03 09:40      4
    22.03 09:35 - 22.03 09:45      4
    22.03 09:40 - 22.03 09:50      2
    22.03 09:45 - 22.03 09:55      2
    22.03 09:50 - 22.03 10:00      3
    22.03 09:55 - 22.03 10:05      3
    22.03 10:00 - 22.03 10:10      3
    22.03 10:05 - 22.03 10:15      3
    22.03 10:10 - 22.03 10:20      3
    22.03 10:15 - 22.03 10:25      2
    22.03 10:20 - 22.03 10:30      2
    22.03 10:25 - 22.03 10:35      1
    22.03 10:30 - 22.03 10:40      1
    23.03 14:05 - 23.03 14:15      1
    23.03 14:10 - 23.03 14:20      1
    23.03 14:15 - 23.03 14:25      1
    23.03 14:20 - 23.03 14:30      1
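For plain non-overlapping buckets (quarter-hour ones, say) instead of the overlapping 10-minute windows, it should be enough to set the window width and the resample step to the same value, something along these lines:

    # non-overlapping 15-minute buckets: make the window width and the step equal
    freq = '15min'
    a = pd.DataFrame({
        'begin': pd.date_range(df.Shift_Time_Start.min(), df.Shift_Time_End.max(), freq=freq)
    })
    a = a.set_index('begin').resample(rule=freq).first().reset_index()
    a['end'] = a.begin + pd.Timedelta(freq)
    # ...then apply f, filter and plot exactly as above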
