Pandas Reindex to fill in missing dates or a better method to fill in?

Question

My data is absence records from a factory. Some days there are no absences, so no data or date is recorded for that day. However, and this is where it gets hairy compared with the other examples I have seen, on any given day there may be several absences for various reasons. The data does not always have a 1-to-1 ratio of dates to records.

The result I am hoping for is something like this:

(index)    Shift        Description     Instances (SUM)
01-01-14   2nd Baker    Discipline      0
01-01-14   2nd Baker    Vacation        0
01-01-14   1st Cooks    Discipline      0
01-01-14   1st Cooks    Vacation        0
01-02-14   2nd Baker    Discipline      4
01-02-14   2nd Baker    Vacation        3
01-02-14   1st Cooks    Discipline      3
01-02-14   1st Cooks    Vacation        3

Etc. The idea is that every shift and description has a value for every day in the period (in this example, 1/1/2014 - 12/31/2014).

I have read several examples, and the closest I have come to getting this working is the code below.

ts = pd.read_csv('Absentee_Data_2.csv'
                , encoding = 'utf-8'
                ,parse_dates=[3]
                ,index_col=3
                ,dayfirst=True
                )

idx =  pd.date_range('01.01.2009', '12.31.2017')

ts.index = pd.DatetimeIndex(ts.index)
# ts = ts.reindex(idx, fill_value='NaN')
df = pd.DataFrame(index = idx)
df1 = df.join(ts, how='left')

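With the commented-out line, ts = ts.reindex(idx, fill_value='NaN'), enabled, the code fails. I have tried more than 10 other approaches, so I am not 100% sure this is the right path.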
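One likely reason the reindex call is a problem: the sample data has several rows per date, and pandas refuses to reindex an axis that contains duplicate labels. The sketch below uses a few made-up rows (not the real CSV) to show the failure, and then one possible workaround that sums per date first so the index becomes unique.

```python
import pandas as pd

# Made-up rows: two records share the same date, as in the absence data.
ts = pd.DataFrame(
    {"Instances": [1, 2, 3]},
    index=pd.to_datetime(["2014-01-02", "2014-01-02", "2014-04-08"]),
)
idx = pd.date_range("2014-01-01", "2014-01-05")

# Reindexing an axis with duplicate labels raises ValueError.
try:
    ts.reindex(idx)
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(exc)

# Summing per date makes the index unique, so reindex works.
# fill_value=0 (a number) is used here rather than the string 'NaN'.
daily = ts.groupby(level=0)["Instances"].sum().reindex(idx, fill_value=0)
print(daily)
```

This only covers the per-date total; it does not yet split the counts by shift and description.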

Sample data:

Description Unexcused   Instances   Date        Shift
Discipline  FALSE              1    Jan 2 2014  2nd Baker
Vacation    TRUE               2    Jan 2 2014  1st Cooks
Discipline  FALSE              3    Jan 2 2014  2nd Baker
Vacation    TRUE               1    Jan 2 2014  1st Cooks
Discipline  FALSE              2    Apr 8 2014  2nd Baker
Vacation    TRUE               3    Apr 8 2014  1st Cooks
Discipline  FALSE              1    Jun 1 2014  2nd Baker
Vacation    TRUE               2    Jun 1 2014  1st Cooks
Discipline  FALSE              3    Jun 1 2014  2nd Baker
Vacation    TRUE               1    Jun 1 2014  1st Cooks
Vacation    TRUE               2    Jul 5 2014  1st Cooks
Discipline  FALSE              3    Jul 5 2014  2nd Baker
Vacation    TRUE               2    Dec 3 2014  1st Cooks

python python-3.x pandas pandas-groupby

SDS · Aug 4 '17 at 11:47

Here is one way to do it:

import pandas as pd

ts = pd.read_csv('Absentee_Data_2.csv', encoding = 'utf-8',parse_dates=[3],index_col=3,dayfirst=True, sep=",")

idx =  pd.date_range('01.01.2009', '12.31.2017')

ts.index = pd.DatetimeIndex(ts.index)
#ts = ts.reindex(idx, fill_value='NaN')
df = pd.DataFrame(index = idx)
df1 = df.join(ts, how='left')
df2 = df1.copy()
df3 = df1.copy()
df4 = df1.copy()
dict1 = {'Description': 'Discipline', 'Instances': 0, 'Shift': '1st Cooks'}
df1 = df1.fillna(dict1)
dict1["Description"] = "Vacation"
df2 = df2.fillna(dict1)
dict1["Shift"] = "2nd Baker"
df3 = df3.fillna(dict1)
dict1["Description"] = "Discipline"
df4 = df4.fillna(dict1)
df_with_duplicates = pd.concat([df1,df2,df3,df4])
final_res = df_with_duplicates.reset_index().drop_duplicates(subset=["index"] + list(dict1.keys())).set_index("index").drop("Unexcused", axis=1)

To explain the steps:

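Create 4 copies of the df joined with ts (df1 to df4)
Use fillna(dict1) to fill the NaN values, with a different shift/description combination for each copy
Concatenate the 4 dfs, so that each date missing from the csv appears 4 times
Drop the duplicated rows; the index has to be a column for that, hence reset_index, followed by set_index("index")
Finally, drop the Unexcused column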

The result looks like this:

In [5]: final_res.loc["2013-01-02"]
Out[5]: 
           Description  Instances      Shift
index                                       
2013-01-02  Discipline        0.0  1st Cooks
2013-01-02    Vacation        0.0  1st Cooks
2013-01-02    Vacation        0.0  2nd Baker
2013-01-02  Discipline        0.0  2nd Baker

In [6]: final_res.loc["2014-01-02"]
Out[6]: 
           Description  Instances       Shift
index                                        
2014-01-02  Discipline        1.0   2nd Baker
2014-01-02    Vacation        2.0   1st Cooks
2014-01-02  Discipline        3.0   2nd Baker
2014-01-02    Vacation        1.0   1st Cooks
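In current pandas, selecting all rows for one date from a DatetimeIndex goes through .loc with a date string; indexing the frame directly with a string selects columns, not rows. A minimal sketch on made-up rows (the name frame is only for illustration):

```python
import pandas as pd

# Made-up rows with a repeated date in the DatetimeIndex.
frame = pd.DataFrame(
    {"Instances": [1, 2, 3]},
    index=pd.to_datetime(["2014-01-02", "2014-01-02", "2014-04-08"]),
)

# .loc with a date string returns every row that falls on that date.
one_day = frame.loc["2014-01-02"]
print(one_day)
```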
Adonis · Aug 4 '17 at 18:17

DJK · Accepted Answer · 2017-08-04T15:36:39+0000

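Set the Date column as the index, convert it to datetime, and then join against a frame indexed by the full date range: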

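# Assumes ts was read without index_col, so 'Date' is still a regular column
ts.set_index(['Date'], inplace=True)
ts.index = pd.to_datetime(ts.index, format='%b %d %Y')
d2 = pd.DataFrame(index=pd.date_range('2014-01-01', '2014-12-31'))

# Right join keeps every date in d2; dates with no records get NaN
print(ts.join(d2, how='right'))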
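The join adds a row for each missing date but leaves Shift and Description as NaN on those rows. To get the layout from the question, with a summed count for every date, shift and description, one possible sketch is to sum with groupby and then reindex against the full product of the three keys. This is a different technique from the join above, not part of the original answer. The rows are a small stand-in for the sample data, the date range is shortened to three days, and the names summed, full_index and result are illustrative.

```python
import pandas as pd

# Small stand-in for the question's sample rows (not the real CSV).
ts = pd.DataFrame({
    "Description": ["Discipline", "Vacation", "Discipline", "Vacation"],
    "Instances": [1, 2, 3, 1],
    "Date": ["Jan 2 2014", "Jan 2 2014", "Jan 2 2014", "Jan 2 2014"],
    "Shift": ["2nd Baker", "1st Cooks", "2nd Baker", "1st Cooks"],
})
ts["Date"] = pd.to_datetime(ts["Date"], format="%b %d %Y")

# Sum instances per (date, shift, description) so each key is unique.
summed = ts.groupby(["Date", "Shift", "Description"])["Instances"].sum()

# Every date x shift x description combination in the (shortened) period.
full_index = pd.MultiIndex.from_product(
    [
        pd.date_range("2014-01-01", "2014-01-03"),
        ts["Shift"].unique(),
        ts["Description"].unique(),
    ],
    names=["Date", "Shift", "Description"],
)

# Combinations with no records get 0 instead of NaN.
result = summed.reindex(full_index, fill_value=0).reset_index()
print(result)
```

Because the sum makes each key unique before reindexing, the duplicate-date problem does not arise. For the real data the date range would be the full period, and the shift and description lists could be given explicitly if some never appear in the file.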
