Calculate daily sums using Python pandas

I am trying to calculate the daily sums of values using pandas. Here's the test file - http://pastebin.com/uSDfVkTS

This is the code I came up with so far:

    import numpy as np
    import datetime as dt
    import pandas as pd

    f = np.genfromtxt('test', dtype=[('datetime', '|S16'), ('data', '<i4')], delimiter=',')
    dates = [dt.datetime.strptime(i, '%Y-%m-%d %H:%M') for i in f['datetime']]
    s = pd.Series(f['data'], index=dates)
    d = s.resample('D', how='sum')

With this test file, the result is:

    2012-01-02    1128
    Freq: D
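The code above targets 2012-era pandas: the `how` keyword of `resample` was removed in pandas 1.0, and under Python 3 the `|S16` dtype yields bytes, which `datetime.strptime` rejects. A minimal sketch of the same steps with current pandas, assuming the file has one `YYYY-MM-DD HH:MM,value` line per half hour (the data is generated here, since the pastebin contents are not reproduced):

```python
import io

import pandas as pd

# Hypothetical stand-in for the 'test' file: 48 half-hourly rows from
# 2012-01-01 00:30 to 2012-01-02 00:00, in 'YYYY-MM-DD HH:MM,value' layout.
stamps = pd.date_range('2012-01-01 00:30', periods=48, freq='30min')
csv_text = '\n'.join(f"{t.strftime('%Y-%m-%d %H:%M')},{i}" for i, t in enumerate(stamps))

# read_csv parses the timestamp column into the index, replacing the
# genfromtxt + strptime steps.
s = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['datetime', 'data'],
    index_col='datetime',
    parse_dates=True,
)['data']

# Current spelling of resample('D', how='sum').
d = s.resample('D').sum()
print(d)
```

With current defaults, daily bins include their starting midnight and are labeled by that date, so in this sample the final 00:00 reading ends up in its own 2012-01-02 bin.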

The first problem is that the calculated sum is labeled with the next day. I was able to solve this with the loffset='-1d' parameter.
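In current pandas `loffset` no longer exists (it was deprecated in 1.1 and removed in 2.0). The `closed` and `label` arguments of `resample` decide which edge of each daily bin is inclusive and which edge names the bin. A sketch, assuming hypothetical half-hourly readings from 00:30 through 00:00 of the next day, where the midnight reading should count toward the day that just ended:

```python
import pandas as pd

# Hypothetical readings: 48 half-hourly values of 1, from
# 2012-01-01 00:30 up to and including 2012-01-02 00:00.
idx = pd.date_range('2012-01-01 00:30', periods=48, freq='30min')
s = pd.Series(1, index=idx)

# Default daily bins are [00:00, 24:00) labeled by their left edge,
# so the final midnight reading falls into a separate 2012-01-02 bin.
default_bins = s.resample('D').sum()

# closed='right' makes each bin (00:00, 24:00], so the midnight reading is
# counted with the preceding day; label='left' names the bin after that day.
shifted_bins = s.resample('D', closed='right', label='left').sum()

print(default_bins)
print(shifted_bins)
```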

Now the real problem is that the data may start not at 00:30 but at any time of the day. In addition, the data has gaps filled with NaN values.

That said, is it possible to set a lower threshold on the number of values needed to calculate a daily sum? (For example, if there are fewer than 40 values in a day, then NaN should be put in place of the sum.)
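One way to get such a threshold without a custom function, assuming pandas 0.22 or later: `Resampler.sum` accepts `min_count` and returns NaN for any bin with fewer than that many non-NaN values. Because only non-NaN values are counted, this also copes with NaN-filled gaps and with data that starts partway through a day. The sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 half-hourly readings on 2012-01-01, then readings
# on 2012-01-02 that only start at 18:00 (12 values).
day1 = pd.date_range('2012-01-01 00:00', periods=48, freq='30min')
day2 = pd.date_range('2012-01-02 18:00', periods=12, freq='30min')
s = pd.Series(1.0, index=day1.append(day2))

# Simulate a gap filled with NaN: blank out 5 readings on the first day.
s.iloc[5:10] = np.nan

# A day's sum is NaN unless it has at least 40 non-NaN values:
# 2012-01-01 keeps 43 readings, 2012-01-02 has only 12.
daily = s.resample('D').sum(min_count=40)
print(daily)
```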

I believe it is possible to define a custom function for this and pass it to the "how" parameter, but I don't know how to write such a function.
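In current pandas the `how` argument is gone, but a user-defined function can be passed to `Resampler.apply`, which calls it once per daily bin with that bin's values as a Series. A sketch of such a function, assuming the 40-value threshold from the question; `sum_if_enough` is a hypothetical name and the data is made up:

```python
import numpy as np
import pandas as pd


def sum_if_enough(values, threshold=40):
    """Return the sum of a bin's values, or NaN if it has too few non-NaN values."""
    # Series.count() ignores NaN, unlike len(), so gaps are not counted as data.
    if values.count() >= threshold:
        return values.sum()
    return np.nan


# Hypothetical data: half-hourly readings of 2.0 starting 2012-01-01 00:00;
# 58 periods give 48 readings on the first day and 10 on the second.
idx = pd.date_range('2012-01-01', periods=58, freq='30min')
s = pd.Series(2.0, index=idx)

daily = s.resample('D').apply(sum_if_enough)
print(daily)
```

A different threshold can be passed through a lambda, e.g. `s.resample('D').apply(lambda x: sum_if_enough(x, threshold=30))`.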

1 answer

You can do this directly in Pandas:

    s = pd.read_csv('test', header=None, index_col=0, parse_dates=True)
    d = s.groupby(lambda x: x.date()).aggregate(lambda x: sum(x) if len(x) >= 40 else np.nan)

                 X.2
    2012-01-01  1128
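One caveat for NaN-filled gaps: `len(x)` counts NaN entries too, and the builtin `sum` returns NaN as soon as one value is NaN, so a day containing even a single gap would come out NaN. A sketch of the same groupby approach that counts only real readings, run on hypothetical file contents:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: 60 half-hourly readings from 2012-01-01 00:00
# (48 on the first day, 12 on the second), with one value left empty.
stamps = pd.date_range('2012-01-01 00:00', periods=60, freq='30min')
lines = [f"{t.strftime('%Y-%m-%d %H:%M')},{'' if i == 3 else 1}" for i, t in enumerate(stamps)]

# header=None, index_col=0: the first column becomes the DatetimeIndex and
# the empty field is read as NaN; column 1 holds the values.
s = pd.read_csv(io.StringIO('\n'.join(lines)), header=None, index_col=0, parse_dates=True)[1]

# Group by calendar date; count() counts only non-NaN readings and
# Series.sum() skips NaN, so the gap neither counts toward nor spoils the total.
d = s.groupby(s.index.date).agg(lambda x: x.sum() if x.count() >= 40 else np.nan)
print(d)
```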

Source: https://habr.com/ru/post/1447244/

