Round pandas datetime index?

I am reading several time series spreadsheets into pandas DataFrames and combining them on a common pandas datetime index. The data logger that recorded the time series is not 100% accurate, which makes resampling very annoying: depending on whether a timestamp is slightly above or below the selected interval, it creates NaNs and makes my series look like a broken line. Here is my code:

    import time
    import pandas as pd

    def loaddata(filepaths):
        t1 = time.perf_counter()
        for i in range(len(filepaths)):
            xl = pd.ExcelFile(filepaths[i])
            # First sheet only; column 2 holds the timestamps used as the index
            df = xl.parse(xl.sheet_names[0], header=0, index_col=2,
                          skiprows=[0, 2, 3, 4], parse_dates=True)
            df = df.dropna(axis=1, how='all')
            df = df.drop(['Decimal Year Day', 'Decimal Year Day.1', 'RECORD'], axis=1)
            if i == 0:
                dfs = df
            else:
                dfs = pd.concat([dfs, df], axis=1)
        t2 = time.perf_counter()
        print("Files loaded into dataframe in %s seconds" % (t2 - t1))
        return dfs  # hand the combined frame back to the caller

    files = ["London Lysimeters corrected 5min.xlsx", "London Water Balance 5min.xlsx"]
    data = loaddata(files)

Here's what the index looks like:

data.index

    <class 'pandas.tseries.index.DatetimeIndex'>
    [2012-08-27 12:05:00.000002, ..., 2013-07-12 15:10:00.000004]
    Length: 91910, Freq: None, Timezone: None

What would be the fastest and most general way to round the index to the nearest minute?

+4
3 answers

Here is a little trick. Datetimes are stored in nanoseconds (when viewed as np.int64), so do the rounding to minutes in nanoseconds.

    In [75]: index = pd.DatetimeIndex([pd.Timestamp('20120827 12:05:00.002'),
                                       pd.Timestamp('20130101 12:05:01'),
                                       pd.Timestamp('20130712 15:10:00'),
                                       pd.Timestamp('20130712 15:10:00.000004')])

    In [79]: index.values
    Out[79]:
    array(['2012-08-27T08:05:00.002000000-0400',
           '2013-01-01T07:05:01.000000000-0500',
           '2013-07-12T11:10:00.000000000-0400',
           '2013-07-12T11:10:00.000004000-0400'], dtype='datetime64[ns]')

    In [78]: pd.DatetimeIndex(((index.asi8/(1e9*60)).round()*1e9*60).astype(np.int64)).values
    Out[78]:
    array(['2012-08-27T08:05:00.000000000-0400',
           '2013-01-01T07:05:00.000000000-0500',
           '2013-07-12T11:10:00.000000000-0400',
           '2013-07-12T11:10:00.000000000-0400'], dtype='datetime64[ns]')
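The same trick can be sketched with integer arithmetic only, which avoids the round trip through floats. This is a minimal sketch, not part of the answer above: the helper name `round_index_int` and its `unit_ns` parameter are made up for illustration, and it rounds exact halves up rather than to the nearest even value as NumPy's `round()` does.

```python
import numpy as np
import pandas as pd

def round_index_int(index, unit_ns=60 * 10**9):
    """Round a DatetimeIndex to the nearest multiple of unit_ns nanoseconds.

    Hypothetical helper: the default unit is one minute, and ties round up.
    """
    # View the timestamps as int64 nanoseconds since the epoch
    ns = index.values.astype("datetime64[ns]").astype(np.int64)
    # Add half a unit, then floor to a whole number of units
    rounded = (ns + unit_ns // 2) // unit_ns * unit_ns
    return pd.to_datetime(rounded, unit="ns")

index = pd.DatetimeIndex([
    pd.Timestamp("2012-08-27 12:05:00.002"),
    pd.Timestamp("2013-01-01 12:05:01"),
    pd.Timestamp("2013-07-12 15:10:00"),
    pd.Timestamp("2013-07-12 15:10:00.000004"),
])
result = round_index_int(index)
print(result)
```

Passing a different `unit_ns` (for example `10**9` for seconds) rounds to that unit instead.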
+6

Issue 4314 mentioned by Jeff is now closed: as of pandas 0.18.0, DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta all have a round() method. Now we can do the following:

    In[109]: index = pd.DatetimeIndex([pd.Timestamp('20120827 12:05:00.002'),
                                       pd.Timestamp('20130101 12:05:01'),
                                       pd.Timestamp('20130712 15:10:30'),
                                       pd.Timestamp('20130712 15:10:31')])

    In[110]: index.values
    Out[110]:
    array(['2012-08-27T12:05:00.002000000', '2013-01-01T12:05:01.000000000',
           '2013-07-12T15:10:30.000000000', '2013-07-12T15:10:31.000000000'],
          dtype='datetime64[ns]')

    In[111]: index.round('min')
    Out[111]:
    DatetimeIndex(['2012-08-27 12:05:00', '2013-01-01 12:05:00',
                   '2013-07-12 15:10:00', '2013-07-12 15:11:00'],
                  dtype='datetime64[ns]', freq=None)

round() takes a frequency parameter. The string aliases it accepts are listed in the pandas documentation under offset aliases.

+4

For DataFrame columns. Usage: round_hour(df.Start_time)

    import numpy as np
    import pandas as pd

    def round_hour(x, tt=''):
        """Round a datetime Series: tt='M' to minutes, 'H' to hours, else seconds."""
        if tt == 'M':
            return pd.to_datetime(((x.astype('i8')/(1e9*60)).round()*1e9*60).astype(np.int64))
        elif tt == 'H':
            return pd.to_datetime(((x.astype('i8')/(1e9*60*60)).round()*1e9*60*60).astype(np.int64))
        else:
            return pd.to_datetime(((x.astype('i8')/(1e9)).round()*1e9).astype(np.int64))
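On pandas 0.18.0 or later, the `.dt` accessor on a datetime column offers the same `round()` method, so a helper like the one above may not be needed. A minimal sketch, where the frame and the `Start_minute` column are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with a datetime column named as in the usage above
df = pd.DataFrame({
    "Start_time": pd.to_datetime(["2013-07-12 15:10:29",
                                  "2013-07-12 15:10:31"]),
})

# Round each value in the column to the nearest minute
df["Start_minute"] = df["Start_time"].dt.round("min")
print(df)
```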
0

Source: https://habr.com/ru/post/1492641/

