I am trying to calculate daily sums of values using pandas. Here's the test file: http://pastebin.com/uSDfVkTS
This is the code I came up with so far:
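```python
import datetime as dt

import numpy as np
import pandas as pd

# Read "YYYY-MM-DD HH:MM,value" lines. 'U16' (instead of '|S16') makes the
# timestamps str rather than bytes, so strptime accepts them in Python 3.
f = np.genfromtxt('test', dtype=[('datetime', 'U16'), ('data', '<i4')], delimiter=',')
dates = [dt.datetime.strptime(i, '%Y-%m-%d %H:%M') for i in f['datetime']]
s = pd.Series(f['data'], index=dates)

# Daily sums. Older pandas wrote this as s.resample('D', how='sum');
# the `how` argument has since been removed.
d = s.resample('D').sum()
```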
With this test file, the output is:
```
2012-01-02    1128
Freq: D
```
The first problem is that the sum is labelled with the following day. I was able to fix this with the loffset='-1d' parameter.
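A possible alternative to `loffset`, sketched under an assumption (the test file is not reproduced here): each timestamp marks the end of a half-hour interval, so one day's data runs from 00:30 to 00:00 of the next day. `closed='right'` keeps the midnight value in the day it ends, and `label='left'` names each bin after the day it covers. The series below is made up for illustration. Note that `loffset` was deprecated in pandas 1.1 and removed in 2.0.

```python
import numpy as np
import pandas as pd

# Made-up data: 48 half-hourly values stamped at the END of each interval,
# i.e. 2012-01-01 00:30 through 2012-01-02 00:00 (one full day).
idx = pd.date_range("2012-01-01 00:30", periods=48, freq="30min")
s = pd.Series(np.arange(1, 49), index=idx)

# Default for 'D' is closed='left', label='left': the 00:00 value on
# 2012-01-02 falls into a separate 2012-01-02 bin.
default = s.resample("D").sum()

# closed='right' keeps that midnight value in the day it ends, and
# label='left' names each bin after the day it covers (no loffset needed).
daily = s.resample("D", closed="right", label="left").sum()
print(daily)
```

If the timestamps instead mark the start of each interval, the default `closed='left', label='left'` already gives one correctly labelled sum per day.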
The real problem is that the data does not necessarily start at 00:30; it can start at any time of the day. In addition, the data has gaps filled with NaN values.
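In current pandas, daily bins are aligned to midnight by default (`origin='start_day'`) rather than to the first timestamp, and `sum()` skips NaN. So a series that starts mid-afternoon and contains gaps should still give one sum per calendar day. A minimal sketch on made-up data; the per-day count of non-NaN values is also the quantity a minimum-values threshold would be based on.

```python
import numpy as np
import pandas as pd

# Made-up series starting mid-afternoon, with a simulated gap of NaN values.
idx = pd.date_range("2012-01-01 15:00", "2012-01-03 12:00", freq="30min")
s = pd.Series(1.0, index=idx)
s.loc["2012-01-02 06:00":"2012-01-02 09:00"] = np.nan  # 7 missing values

# Bins start at midnight regardless of where the data starts;
# sum() ignores NaN, count() counts only non-NaN values.
daily = s.resample("D").sum()
counts = s.resample("D").count()
print(pd.DataFrame({"sum": daily, "n": counts}))
```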
Is it possible to set a minimum number of values required to calculate a daily sum? For example, if a day has fewer than 40 values, NaN should be returned instead of the sum.
I believe a user-defined function could be written for this and passed to the "how" parameter, but I do not know how to write such a function.
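Two ways this might be done in current pandas, sketched on made-up data with the threshold of 40 from the question. `min_count` on `Resampler.sum` (pandas 0.22 and later) returns NaN when a bin has fewer non-NaN values than requested; passing a custom function to `.apply()` is the current counterpart of the old `how=` callable. `sum_if_enough` and `MIN_VALUES` are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

MIN_VALUES = 40  # threshold from the question

# Made-up data: one complete day (48 values) and one day with only 20 valid values.
idx = pd.date_range("2012-01-01 00:00", periods=96, freq="30min")
values = np.ones(96)
values[48 + 20:] = np.nan  # second day: only its first 20 values are valid
s = pd.Series(values, index=idx)

# Option 1: built-in min_count -> NaN when a day has fewer than MIN_VALUES non-NaN values.
daily = s.resample("D").sum(min_count=MIN_VALUES)

# Option 2 (hypothetical helper): a custom per-day function, like the old how= callable.
def sum_if_enough(x, min_values=MIN_VALUES):
    """Return the day's sum, or NaN if it has fewer than min_values non-NaN entries."""
    return x.sum() if x.count() >= min_values else np.nan

daily_custom = s.resample("D").apply(sum_if_enough)
print(daily)
```

`min_count` is the simpler choice; the custom function is only needed if the rule is more complex than a plain count.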