My data is the absence records from the factory. Some days there are no passes, so there is no data or date on this day. However, and where it gets hairy with the other examples shown, on any day there may be several absences for various reasons. Data does not always have a 1 to 1 date to record ratio.
As a result, I hope this is something like this:
(index) Shift Description Instances (SUM)
01-01-14 2nd Baker Discipline 0
01-01-14 2nd Baker Vacation 0
01-01-14 1st Cooks Discipline 0
01-01-14 1st Cooks Vacation 0
01-02-14 2nd Baker Discipline 4
01-02-14 2nd Baker Vacation 3
01-02-14 1st Cooks Discipline 3
01-02-14 1st Cooks Vacation 3
Etc. The idea is that all shifts and descriptions will have values for all days for the period (in this example, 1/1/2014 - 12/31/2014)
I read a few examples and I came closest to this work here .
ts = pd.read_csv('Absentee_Data_2.csv'
, encoding = 'utf-8'
,parse_dates=[3]
,index_col=3
,dayfirst=True
)
idx = pd.date_range('01.01.2009', '12.31.2017')
ts.index = pd.DatetimeIndex(ts.index)
df = pd.DataFrame(index = idx)
df1 = df.join(ts, how='left')
, ts = ts.reindex(idx, fill_value='NaN'), . 10 , , 100%, , , , .
:
Description Unexcused Instances Date Shift
Discipline FALSE 1 Jan 2 2014 2nd Baker
Vacation TRUE 2 Jan 2 2014 1st Cooks
Discipline FALSE 3 Jan 2 2014 2nd Baker
Vacation TRUE 1 Jan 2 2014 1st Cooks
Discipline FALSE 2 Apr 8 2014 2nd Baker
Vacation TRUE 3 Apr 8 2014 1st Cooks
Discipline FALSE 1 Jun 1 2014 2nd Baker
Vacation TRUE 2 Jun 1 2014 1st Cooks
Discipline FALSE 3 Jun 1 2014 2nd Baker
Vacation TRUE 1 Jun 1 2014 1st Cooks
Vacation TRUE 2 Jul 5 2014 1st Cooks
Discipline FALSE 3 Jul 5 2014 2nd Baker
Vacation TRUE 2 Dec 3 2014 1st Cooks
, 2 . , , , , , . , , .