I want to store a DataFrame with columns of different types in an HDF5 file (see the excerpt with the data types below).
    In [1]: mydf.dtypes
    Out[1]:
    endTime          uint32
    distance        float16
    signature      category
    anchorName     category
    stationList      object
Before converting some columns (signature and anchorName) to category, as shown in the excerpt above, I used the following code to save the frame, which worked very well:
    path = 'tmp4.hdf5'
    key = 'journeys'
    mydf.to_hdf(path, key, mode='w', complevel=9, complib='bzip2')
But this does not work with category columns, so I tried the table format instead:
    path = 'tmp4.hdf5'
    key = 'journeys'
    mydf.to_hdf(path, key, mode='w', format='t', complevel=9, complib='bzip2')
It works fine if I delete the column stationList, where each entry is a list of strings. But with this column, I get the following exception:
Cannot serialize the column [stationList] because its data contents are [mixed] object dtype
How can I change the code so that the whole data frame, including stationList, gets saved?
pandas version: 0.17.1
python version: 2.7.6 (cannot change it due to convenience reasons)
edit1 (some sample code):
    import pandas as pd

    mydf = pd.DataFrame({
        'endTime': pd.Series([1443525810, 1443540836, 1443609470]),
        'distance': pd.Series([454.75, 477.25, 242.12]),
        'signature': pd.Series(['ab', 'cd', 'ab']),
        'anchorName': pd.Series(['tec', 'ing', 'pol']),
        'stationList': pd.Series([['t1', 't2', 't3'],
                                  ['4', 't2', 't3'],
                                  ['t3', 't2', 't4']]),
    })
edit2: Meanwhile, I tried different things to get rid of this problem. One of them was to convert the stationList entries to tuples (since they should not be modified anyway) and then convert the column to a category. But that didn’t change anything. Here are the lines I added after the conversion loop (for completeness only):
    mydf.stationList = [tuple(x) for x in mydf.stationList.values]
    mydf.stationList.astype('category')
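One possible way around the exception, sketched under assumptions rather than verified against pandas 0.17.1: the error message suggests the table format cannot serialize cells that hold Python lists or tuples, so each list could be flattened into a single delimited string before saving and split again after loading. The helper names (`encode_stations`, `decode_stations`, `save_journeys`, `load_journeys`) and the `|` separator are hypothetical, and the sketch assumes that no station name contains the separator.

```python
SEP = '|'  # assumed separator; must not occur inside any station name


def encode_stations(stations):
    """Flatten a list or tuple of station names into one delimited string."""
    return SEP.join(stations)


def decode_stations(encoded):
    """Invert encode_stations; an empty string maps back to an empty list."""
    return encoded.split(SEP) if encoded else []


def save_journeys(df, path, key, column='stationList'):
    """Write df in table format with the list column stored as plain strings."""
    out = df.copy()  # leave the caller's frame untouched
    out[column] = out[column].map(encode_stations)
    out.to_hdf(path, key=key, mode='w', format='table',
               complevel=9, complib='bzip2')


def load_journeys(path, key, column='stationList'):
    """Read the frame back and restore the list column."""
    import pandas as pd  # imported lazily; the encode/decode helpers need no pandas
    df = pd.read_hdf(path, key=key)
    df[column] = df[column].map(decode_stations)
    return df
```

With the sample frame from edit1, this would be called as `save_journeys(mydf, 'tmp4.hdf5', 'journeys')` and later `load_journeys('tmp4.hdf5', 'journeys')`. The trade-off is that stationList comes back as lists only after the explicit decode step; inside the file it is an ordinary string column.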