Store and retrieve numpy dates in PyTables

Question

I want to store numpy datetime64 data in a PyTables Table, and I want to do this without using Pandas.

What I tried so far

Setup

    In [1]: import tables as tb
    In [2]: import numpy as np
    In [3]: from datetime import datetime

Create data

    In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)),
       ...:         (2, datetime(2001, 2, 2, 2, 2, 2))]
    In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
    In [6]: rec  # a numpy array with my data
    Out[6]:
    array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
           (2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
          dtype=[('a', '<i4'), ('b', '<M8[us]')])

Open PyTables Dataset with `Time64Col`

    In [7]: f = tb.open_file('foo.h5', 'w')  # New PyTables file
    In [8]: d = f.create_table('/', 'bar',
       ...:                    description={'a': tb.Int32Col(pos=0),
       ...:                                 'b': tb.Time64Col(pos=1)})
    In [9]: d
    Out[9]:
    /bar (Table(0,)) ''
      description := {
      "a": Int32Col(shape=(), dflt=0, pos=0),
      "b": Time64Col(shape=(), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (5461,)

Add NumPy Data to PyTables Dataset

    In [10]: d.append(rec)
    In [11]: d
    Out[11]:
    /bar (Table(2,)) ''
      description := {
      "a": Int32Col(shape=(), dflt=0, pos=0),
      "b": Time64Col(shape=(), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (5461,)

What happened to my dates?

    In [12]: d[:]
    Out[12]:
    array([(1, 0.0), (2, 0.0)],
          dtype=[('a', '<i4'), ('b', '<f8')])

I understand that HDF5 has no native datetime support. I would expect the extra metadata that PyTables layers on top to handle this.

My question

How can I store a numpy record array containing datetimes in PyTables? How can I efficiently read that data from the PyTables table back into a NumPy array and keep my datetimes?

Common answer

I usually get this answer:

Use pandas

I don’t want to use Pandas because I don’t have an index, I don’t want one stored in my dataset, and Pandas does not let you avoid storing the index (see this question).

python numpy datetime pytables

Mocklin Sep 7 '14 at 1:56

1 answer

Phillip cloud · Answer 1 · 2014-09-07T02:59:38+0000

First, values assigned to a Time64Col must be float64s. You can convert them by calling astype, for example:

 new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])

Then you need to convert column b to seconds since the epoch, which means dividing by 1,000,000, since the values are in microseconds:

 new_rec['b'] = new_rec['b'] / 1e6

Then call d.append(new_rec).
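The conversion steps above can be folded into a single helper. This is only a sketch using plain NumPy: the function name `to_time64_records` and its `time_field` parameter are made up for illustration, and it converts field by field rather than calling astype on the whole record array.

```python
import numpy as np
from datetime import datetime


def to_time64_records(rec, time_field='b'):
    """Copy `rec`, replacing its datetime64 field with float64 seconds since the epoch.

    Hypothetical helper: the name and the `time_field` argument are illustrative.
    """
    # Same fields as the input, except the time field becomes float64.
    new_dtype = [(name, 'f8' if name == time_field else rec.dtype[name])
                 for name in rec.dtype.names]
    out = np.empty(rec.shape, dtype=new_dtype)
    for name in rec.dtype.names:
        if name == time_field:
            # datetime64[us] -> integer microseconds -> float seconds
            micros = rec[name].astype('M8[us]').astype('i8')
            out[name] = micros / 1e6
        else:
            out[name] = rec[name]
    return out


data = [(1, datetime(2000, 1, 1, 1, 1, 1)),
        (2, datetime(2001, 2, 2, 2, 2, 2))]
rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
new_rec = to_time64_records(rec)
print(new_rec)
```

The resulting array has dtype [('a', '<i4'), ('b', '<f8')], which is the layout the Time64Col table in the question expects, so it can be passed to d.append.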

When you read the array back into memory, do the reverse and multiply by 1,000,000. You need to make sure everything is in microseconds before inserting anything; astype('datetime64[us]') handles this automatically in numpy >= 1.7.x.
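The reverse direction can be sketched the same way. The helper name `from_time64_records` is hypothetical, and the input array here is typed in by hand to stand in for what d[:] returns. One addition that is not part of the answer: the product is rounded with np.rint before the integer cast, on the assumption that timestamps with fractional seconds might otherwise come back one microsecond off after the float division and multiplication.

```python
import numpy as np


def from_time64_records(stored, time_field='b'):
    """Copy `stored`, turning float64 epoch seconds back into datetime64[us].

    Hypothetical helper: the name and the `time_field` argument are illustrative.
    """
    # Same fields as the input, except the time field becomes datetime64[us].
    new_dtype = [(name, 'M8[us]' if name == time_field else stored.dtype[name])
                 for name in stored.dtype.names]
    out = np.empty(stored.shape, dtype=new_dtype)
    for name in stored.dtype.names:
        if name == time_field:
            # float seconds -> rounded integer microseconds -> datetime64[us]
            micros = np.rint(stored[name] * 1e6).astype('i8')
            out[name] = micros.astype('M8[us]')
        else:
            out[name] = stored[name]
    return out


# Stand-in for the array read back from the table with d[:]
stored = np.array([(1, 946688461.0), (2, 981079322.0)],
                  dtype=[('a', 'i4'), ('b', 'f8')])
restored = from_time64_records(stored)
print(restored)
```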

I used the solution from this question: How to get unix timestamp from numpy.datetime64

Here is the working version of your example:

    In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)),
       ...:         (2, datetime(2001, 2, 2, 2, 2, 2))]
    In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
    In [6]: new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])
    In [7]: new_rec
    Out[7]:
    array([(1, 946688461000000.0), (2, 981079322000000.0)],
          dtype=[('a', '<i4'), ('b', '<f8')])
    In [8]: new_rec['b'] /= 1e6
    In [9]: new_rec
    Out[9]:
    array([(1, 946688461.0), (2, 981079322.0)],
          dtype=[('a', '<i4'), ('b', '<f8')])
    In [10]: f = tb.open_file('foo.h5', 'w')  # New PyTables file
    In [11]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
       ....:                                            'b': tb.Time64Col(pos=1)})
    In [12]: d.append(new_rec)
    In [13]: d[:]
    Out[13]:
    array([(1, 946688461.0), (2, 981079322.0)],
          dtype=[('a', '<i4'), ('b', '<f8')])
    In [14]: r = d[:]
    In [15]: r['b'] *= 1e6
    In [16]: r.astype([('a', 'i4'), ('b', 'datetime64[us]')])
    Out[16]:
    array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
           (2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
          dtype=[('a', '<i4'), ('b', '<M8[us]')])
