Store and retrieve numpy dates in PyTables

I want to store numpy datetime64 data in a PyTables Table . I want to do this without using Pandas.

What I tried so far

Customization

 In [1]: import tables as tb In [2]: import numpy as np In [3]: from datetime import datetime 

create data

 In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))] In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')]) In [6]: rec # a numpy array with my data Out[6]: array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)), (2, datetime.datetime(2001, 2, 2, 2, 2, 2))], dtype=[('a', '<i4'), ('b', '<M8[us]')]) 

Open PyTables Dataset with Time64Col

 In [7]: f = tb.open_file('foo.h5', 'w') # New PyTables file In [8]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0), 'b': tb.Time64Col(pos=1)}) In [9]: d Out[9]: /bar (Table(0,)) '' description := { "a": Int32Col(shape=(), dflt=0, pos=0), "b": Time64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (5461,) 

Add NumPy Data to PyTables Dataset

 In [10]: d.append(rec) In [11]: d Out[11]: /bar (Table(2,)) '' description := { "a": Int32Col(shape=(), dflt=0, pos=0), "b": Time64Col(shape=(), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (5461,) 

What happened to my dates?

 In [12]: d[:] Out[12]: array([(1, 0.0), (2, 0.0)], dtype=[('a', '<i4'), ('b', '<f8')]) 

I understand that HDF5 does not support native data support. I would expect the additional metadata that PyTables overlay to handle this.

My question

How to save a numpy records array containing dates in PyTables? How can I efficiently retrieve this data from the PyTables table back to the NumPy array and save my data?

General answer

I usually get this answer:

Use pandas

I don’t want to use Pandas because I don’t have an index, I don’t want it to be stored in my dataset, and Pandas does not allow you not to store / store the index (see this question )

+6
source share
1 answer

First, when setting values ​​to Time64Col they must be float64 s. You can do this by calling astype , for example:

 new_rec = rec.astype([('a', 'i4'), ('b', 'f8')]) 

Then you need to convert column b in seconds from the era, which means you need to divide by 1,000,000, since we are in microseconds:

 new_rec['b'] = new_rec['b'] / 1e6 

Then call d.append(new_rec)

When you read the array back into memory, do the opposite and multiply by 1,000,000. You need to make sure everything happens in microseconds before inserting anything that is automatically processed by astype('datetime64[us]') in numpy> = 1.7.x

I used the solution from this question: How to get unix timestamp from numpy.datetime64

Here is the working version of your example:

 In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))] In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')]) In [6]: new_rec = rec.astype([('a', 'i4'), ('b', 'f8')]) In [7]: new_rec Out[7]: array([(1, 946688461000000.0), (2, 981079322000000.0)], dtype=[('a', '<i4'), ('b', '<f8')]) In [8]: new_rec['b'] /= 1e6 In [9]: new_rec Out[9]: array([(1, 946688461.0), (2, 981079322.0)], dtype=[('a', '<i4'), ('b', '<f8')]) In [10]: f = tb.open_file('foo.h5', 'w') # New PyTables file In [11]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0), ....: 'b': tb.Time64Col(pos=1)}) In [12]: d.append(new_rec) In [13]: d[:] Out[13]: array([(1, 946688461.0), (2, 981079322.0)], dtype=[('a', '<i4'), ('b', '<f8')]) In [14]: r = d[:] In [15]: r['b'] *= 1e6 In [16]: r.astype([('a', 'i4'), ('b', 'datetime64[us]')]) Out[16]: array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)), (2, datetime.datetime(2001, 2, 2, 2, 2, 2))], dtype=[('a', '<i4'), ('b', '<M8[us]')]) 
+3
source

Source: https://habr.com/ru/post/974929/


All Articles