Pandas - resample data using the specified start date, end date, and granularity

Question

I want to resample a date-indexed DataFrame using a start date, an end date and a granularity.

Let's say I have this DataFrame:

                   value
00:00, 01/05/2017    2
12:00, 01/05/2017    4
00:00, 02/05/2017    6
12:00, 02/05/2017    8
00:00, 03/05/2017   10
12:00, 03/05/2017   12

And I want to resample it to go from 06:00, 01/05/2017 to 18:00, 02/05/2017 with a "granularity" of 12 hours (the same as the original here for simplicity, but that is not necessarily the case). The result I want is:

                   value
06:00, 01/05/2017    3
18:00, 01/05/2017    5
06:00, 02/05/2017    7
18:00, 02/05/2017    9

Note that each value is the average of the original values it overlaps (e.g. 3 = mean(2, 4)).

I am not sure how to do this.

My first attempt:

def resample(df: DataFrame, start: datetime, end: datetime, granularity: timedelta) -> DataFrame:
    result = df.resample(granularity).mean()
    result = result[result.index <= end]
    result = result[result.index >= start]
    return result

This correctly resamples the DataFrame and gives the correct granularity, but it does not align the results with the start date, so the result is:

                   value
12:00, 01/05/2017    4
00:00, 02/05/2017    6
12:00, 02/05/2017    8

My second attempt used the base argument:

def resample(df: DataFrame, start: datetime, end: datetime, desired_granularity: timedelta) -> DataFrame:
    data_before_start = df[df.index <= start]
    # Get the last index value before our start date
    last_date_before_start = data_before_start.last_valid_index()
    current_granularity_secs = seconds_between_measurements(df)
    rule = str(int(desired_granularity.total_seconds())) + 'S'
    base = current_granularity_secs - (start - last_date_before_start).total_seconds()
    result = df.resample(rule, base=base).mean()
    result = result[result.index < end]
    result = result[result.index >= start]
    return result

This gives:

                   value
06:00, 01/05/2017    4
18:00, 01/05/2017    6
06:00, 02/05/2017    8
18:00, 02/05/2017   10

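This has the right granularity and is aligned with the start date, but the values are not the averages I want.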

Is there a pandas function I am missing here, or do I have to do this some other way?

Any help appreciated :)

EDIT: - , , , pad(). "" , ()

python pandas

duck 05 '17 13:38

Mathia Haure-Touzé · Answer 1 · 2017-10-20T14:33:08+0000

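This approach works on a frame that has start_date and end_date columns. It applies .resample to each of:

df.start_date
df.end_date

Conditions:

for every row, start_date < end_date
start_date and end_date are datetimes

Code: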

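import pandas as pd

# freq is the target frequency (e.g. a pd.Timedelta); it is not defined in the original snippet
df[["start_date", "end_date"]] = df[["start_date", "end_date"]].apply(pd.to_datetime)
# Upsample on each column; ffill() replaces the older pad() alias
df1 = df.set_index("start_date").resample(freq).ffill().reset_index()
df2 = df.set_index("end_date").resample(freq).bfill().reset_index()
df3 = pd.concat([df1, df2], ignore_index=True)

def function(x, df1):
    # Rows that came from df1 keep their start_date; rows from df2 keep their end_date
    if x.name < df1.shape[0]:
        x.end_date = x.start_date + pd.Timedelta(freq)
    else:
        x.start_date = x.end_date - pd.Timedelta(freq)
    return x

# Keep only valid intervals and give each one a length of freq
result = df3[df3.start_date < df3.end_date].apply(lambda x: function(x, df1), axis=1)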

In pandas you can also skip set_index and pass the column name to resample directly:
df.resample(freq, on='start_date')
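
For the 12-hour example in the question, here is a minimal sketch of one way to get the overlap averages. It is not the method from the answer above: it forward-fills onto a finer grid and then averages in bins anchored at the start date. The function name resample_overlap and the fine_step parameter are hypothetical, fine_step is assumed to divide both the original spacing and the offset of the start date evenly, and the origin argument of resample needs pandas 1.1 or later.

```python
import pandas as pd

def resample_overlap(df, start, end, granularity, fine_step):
    # Upsample to a fine grid, carrying each value forward over the period it covers
    fine = df.resample(fine_step).ffill()
    # Average the fine grid in bins whose edges are anchored at `start`
    binned = fine.resample(granularity, origin=start).mean()
    # Label-based slicing on a DatetimeIndex includes both endpoints
    return binned.loc[start:end]

# The frame from the question: six values, 12 hours apart, starting 1 May 2017
index = pd.date_range("2017-05-01 00:00", periods=6, freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"value": [2, 4, 6, 8, 10, 12]}, index=index)

out = resample_overlap(
    df,
    start=pd.Timestamp("2017-05-01 06:00"),
    end=pd.Timestamp("2017-05-02 18:00"),
    granularity=pd.Timedelta(hours=12),
    fine_step=pd.Timedelta(hours=6),
)
print(out)
```

With a 6-hour fine grid, each 12-hour bin holds two forward-filled samples, so the bin starting at 06:00 on 01/05 averages 2 and 4. If the bins do not line up this neatly, a smaller fine_step gives a closer approximation of the time-weighted average.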
