Calculate daily sums using python pandas

Question

I am trying to calculate the daily sums of values using pandas. Here's the test file - http://pastebin.com/uSDfVkTS

This is the code I came up with so far:

    import numpy as np
    import datetime as dt
    import pandas as pd

    f = np.genfromtxt('test', dtype=[('datetime', '|S16'), ('data', '<i4')], delimiter=',')
    dates = [dt.datetime.strptime(i, '%Y-%m-%d %H:%M') for i in f['datetime']]
    s = pd.Series(f['data'], index=dates)
    d = s.resample('D', how='sum')
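For readers on current versions: the snippet above was written for 2012-era pandas on Python 2. Under Python 3, np.genfromtxt with an |S16 field yields bytes, which datetime.strptime does not accept, and newer pandas replaces the how= argument of resample with a method call on the resampler. Below is a minimal sketch of the same daily sum, assuming the file holds header-less "YYYY-MM-DD HH:MM,value" rows (inferred from the dtype and delimiter above). The sample rows and the load_series helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the 'test' file: header-less "timestamp,value" rows.
SAMPLE = """2012-01-01 00:30,10
2012-01-01 01:00,20
2012-01-01 23:30,30
2012-01-02 00:00,40
"""


def load_series(source):
    """Read the two-column CSV into a Series indexed by parsed timestamps."""
    df = pd.read_csv(
        source,
        header=None,
        names=["datetime", "data"],
        index_col="datetime",
        parse_dates=True,
    )
    return df["data"]


s = load_series(io.StringIO(SAMPLE))
# Current pandas: choose the frequency in resample(), then call the reducer.
d = s.resample("D").sum()
print(d)
```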

Running this on the test file gives:

    2012-01-02    1128
    Freq: D

The first problem is that the calculated sum is labeled with the next day. I was able to fix this with the loffset='-1d' parameter.
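On the next-day label: in current pandas, resample takes closed and label arguments that decide which edge of each daily bin is inclusive and which edge names the bin. The loffset argument was deprecated in later pandas releases, so this is an alternative rather than a copy of the asker's fix. The sketch assumes, as the question implies, that each day's data runs from 00:30 through the following midnight; the 48 ones are made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly data for one day: first sample at 00:30,
# last sample at 00:00 of the next day (48 samples in total).
idx = pd.date_range("2012-01-01 00:30", periods=48, freq="30min")
s = pd.Series(np.ones(48), index=idx)

# Default daily bins are left-closed, [day 00:00, next day 00:00),
# so the final midnight sample spills into a second day.
default = s.resample("D").sum()

# closed="right" makes bins (day 00:00, next day 00:00], putting all 48
# samples in one bin; label="left" names that bin after the day it covers.
shifted = s.resample("D", closed="right", label="left").sum()

print(default)
print(shifted)
```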

The actual problem, though, is that the data may not start at 00:30 but at any time of day. The data also has gaps filled with NaN values.

That said, is it possible to set a lower threshold on the number of values needed to calculate a daily sum? (For example, if there are fewer than 40 values in a single day, put NaN instead of the sum.)
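Current pandas versions (0.22 and later, as far as I know) cover this threshold directly: the min_count argument of the resampler's sum returns NaN for any bin with fewer than min_count non-NaN values, so both a partial first day and NaN gaps are handled. A sketch with made-up half-hourly data that starts at noon:

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly series that starts mid-day and contains NaN gaps.
idx = pd.date_range("2012-01-01 12:00", "2012-01-03 23:30", freq="30min")
s = pd.Series(1.0, index=idx)                          # day 1 has only 24 values
s.loc["2012-01-02 00:00":"2012-01-02 02:00"] = np.nan  # 5 gaps -> 43 valid
s.loc["2012-01-03 00:00":"2012-01-03 04:30"] = np.nan  # 10 gaps -> 38 valid

# Days with fewer than 40 non-NaN values become NaN instead of a sum.
daily = s.resample("D").sum(min_count=40)
print(daily)
```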

I believe I could define a custom function and pass it to the "how" parameter, but I don't know how to write that function.
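The user-defined-function route also works in current pandas, where the function is passed to resample(...).apply() instead of how=. A sketch under that assumption; sum_if_enough and MIN_VALUES are hypothetical names, and count() is used so that NaN gaps do not count toward the threshold.

```python
import numpy as np
import pandas as pd

MIN_VALUES = 40  # threshold taken from the question


def sum_if_enough(values):
    """Sum one day's values, or return NaN if too few of them are valid.

    `values` is the Series of observations in one daily bin; count()
    ignores NaN, so gaps do not count toward MIN_VALUES.
    """
    if values.count() < MIN_VALUES:
        return np.nan
    return values.sum()


# Hypothetical data: one full day of 48 half-hourly values, then 12 more.
idx = pd.date_range("2012-01-01 00:00", periods=60, freq="30min")
s = pd.Series(2.0, index=idx)

daily = s.resample("D").apply(sum_if_enough)
print(daily)
```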

python pandas

iodinegalaxy Nov 20 '12 at 14:54

1 answer

eumiro · Accepted Answer · 2012-11-20T14:59:23+0000

You can do this directly in Pandas:

    import numpy as np
    import pandas as pd

    s = pd.read_csv('test', header=None, index_col=0, parse_dates=True)
    d = s.groupby(lambda x: x.date()).aggregate(lambda x: sum(x) if len(x) >= 40 else np.nan)

which gives:

                 X.2
    2012-01-01  1128
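Two details of the lambda above matter when the data has NaN gaps: len(x) counts NaN rows toward the 40-value threshold, and the built-in sum returns NaN as soon as one value in the day is NaN. A variant of the same group-by-date approach that uses x.count() and x.sum() instead (both skip NaN) could look like this. The sample rows, the daily_sums helper, and the lowered threshold of 3 are assumptions for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'test' file; empty fields are read as NaN.
SAMPLE = """2012-01-01 00:30,1
2012-01-01 01:00,2
2012-01-01 01:30,
2012-01-01 02:00,4
2012-01-02 00:30,5
2012-01-02 01:00,
"""


def daily_sums(frame, min_values=40):
    """Group rows by calendar date; sum a day only if it has enough valid values."""
    def agg(x):
        # count() and Series.sum() skip NaN, unlike len() and the built-in sum()
        return x.sum() if x.count() >= min_values else np.nan
    return frame.groupby(frame.index.date).aggregate(agg)


s = pd.read_csv(io.StringIO(SAMPLE), header=None, index_col=0, parse_dates=True)
d = daily_sums(s, min_values=3)  # small threshold so the toy data shows both cases
print(d)
```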
