You should just give it a try.
    In [22]: df = pd.DataFrame(np.random.randn(5,2),columns=['A','B'])

    In [23]: store = pd.HDFStore('test.h5',mode='w')

    In [24]: store.append('df_only_indexables',df)

    In [25]: store.append('df_with_data_columns',df,data_columns=True)

    In [26]: store.append('df_no_index',df,data_columns=True,index=False)

    In [27]: store
    Out[27]:
    <class 'pandas.io.pytables.HDFStore'>
    File path: test.h5
    /df_no_index                     frame_table  (typ->appendable,nrows->5,ncols->2,indexers->[index],dc->[A,B])
    /df_only_indexables              frame_table  (typ->appendable,nrows->5,ncols->2,indexers->[index])
    /df_with_data_columns            frame_table  (typ->appendable,nrows->5,ncols->2,indexers->[index],dc->[A,B])

    In [28]: store.close()
When you append a frame in table format, you automatically get its index as a queryable column. By default, no other columns can be queried.
If you specify data_columns=True or data_columns=list_of_columns, those columns are stored separately and can then be queried as well.
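A minimal sketch of the difference, assuming a recent pandas with PyTables installed. The file path, the keys `only_index` and `with_A`, and the random data are hypothetical. It uses the list form `data_columns=["A"]`, so `A` becomes queryable while `B` stays in a value block; in the pandas versions I know, a `where` clause on a non-data column is rejected with a `ValueError`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Random demo data and a throwaway file path (both hypothetical).
df = pd.DataFrame(np.random.randn(5, 2), columns=["A", "B"])
path = os.path.join(tempfile.mkdtemp(), "demo_query.h5")

with pd.HDFStore(path, mode="w") as store:
    # Default: only the index is queryable.
    store.append("only_index", df)
    # data_columns=["A"]: "A" gets its own queryable column, "B" does not.
    store.append("with_A", df, data_columns=["A"])

    # The index can be queried on either table.
    by_index = store.select("only_index", where="index > 2")

    # A declared data column can be queried.
    by_A = store.select("with_A", where="A > 0")

    # A column that is not a data column is rejected.
    try:
        store.select("with_A", where="B > 0")
        b_rejected = False
    except ValueError:
        b_rejected = True

print(by_index.index.tolist(), b_rejected)
```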
If you specify index=False, the PyTables index is not created automatically for the queryable columns (that is, the index and/or the data_columns).
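A sketch for checking the effect of `index=False` from Python, assuming recent pandas and PyTables; the keys, path and data are hypothetical. `indexed_columns` is a hypothetical helper that takes the underlying PyTables table via `get_storer(key).table` and asks each column whether it `is_indexed`. The last query shows that the data columns stay queryable without an index; there is simply no index to speed the search up.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def indexed_columns(store, key):
    """Hypothetical helper: names of the table's columns that carry a PyTables index."""
    table = store.get_storer(key).table  # the underlying tables.Table node
    return {name for name in table.colnames if getattr(table.cols, name).is_indexed}


df = pd.DataFrame(np.random.randn(5, 2), columns=["A", "B"])
path = os.path.join(tempfile.mkdtemp(), "demo_noindex.h5")  # throwaway path

with pd.HDFStore(path, mode="w") as store:
    store.append("indexed", df, data_columns=True)
    store.append("not_indexed", df, data_columns=True, index=False)

    with_index = indexed_columns(store, "indexed")
    without_index = indexed_columns(store, "not_indexed")

    # Still queryable on a data column, just without an index behind it.
    hits = store.select("not_indexed", where="A > 0")

print(sorted(with_index), sorted(without_index))
```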
To see which PyTables indexes were actually created, look at the output below: colindexes shows which columns have a real PyTables index. (I trimmed the output a bit.)
    /df_no_index/table (Table(5,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "A": Float64Col(shape=(), dflt=0.0, pos=1),
      "B": Float64Col(shape=(), dflt=0.0, pos=2)}
      byteorder := 'little'
      chunkshape := (2730,)
    /df_no_index/table._v_attrs (AttributeSet), 15 attributes:
       [A_dtype := 'float64',
        A_kind := ['A'],
        B_dtype := 'float64',
        B_kind := ['B'],
        CLASS := 'TABLE',
        FIELD_0_FILL := 0,
        FIELD_0_NAME := 'index',
        FIELD_1_FILL := 0.0,
        FIELD_1_NAME := 'A',
        FIELD_2_FILL := 0.0,
        FIELD_2_NAME := 'B',
        NROWS := 5,
        TITLE := '',
        VERSION := '2.7',
        index_kind := 'integer']
    /df_only_indexables/table (Table(5,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (2730,)
      autoindex := True
      colindexes := {
        "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
    /df_only_indexables/table._v_attrs (AttributeSet), 11 attributes:
       [CLASS := 'TABLE',
        FIELD_0_FILL := 0,
        FIELD_0_NAME := 'index',
        FIELD_1_FILL := 0.0,
        FIELD_1_NAME := 'values_block_0',
        NROWS := 5,
        TITLE := '',
        VERSION := '2.7',
        index_kind := 'integer',
        values_block_0_dtype := 'float64',
        values_block_0_kind := ['A', 'B']]
    /df_with_data_columns/table (Table(5,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "A": Float64Col(shape=(), dflt=0.0, pos=1),
      "B": Float64Col(shape=(), dflt=0.0, pos=2)}
      byteorder := 'little'
      chunkshape := (2730,)
      autoindex := True
      colindexes := {
        "A": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
        "B": Index(6, medium, shuffle, zlib(1)).is_csi=False}
    /df_with_data_columns/table._v_attrs (AttributeSet), 15 attributes:
       [A_dtype := 'float64',
        A_kind := ['A'],
        B_dtype := 'float64',
        B_kind := ['B'],
        CLASS := 'TABLE',
        FIELD_0_FILL := 0,
        FIELD_0_NAME := 'index',
        FIELD_1_FILL := 0.0,
        FIELD_1_NAME := 'A',
        FIELD_2_FILL := 0.0,
        FIELD_2_NAME := 'B',
        NROWS := 5,
        TITLE := '',
        VERSION := '2.7',
        index_kind := 'integer']
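If you would rather check this from Python than read a dump, a sketch like the following rebuilds the three tables from the session above in a throwaway temp file and then opens it with PyTables directly. It assumes PyTables' `tables.open_file`, `File.walk_nodes` and `Table.colindexes`; the file location is hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables

df = pd.DataFrame(np.random.randn(5, 2), columns=["A", "B"])
path = os.path.join(tempfile.mkdtemp(), "test.h5")  # throwaway path

# Recreate the three tables from the IPython session.
with pd.HDFStore(path, mode="w") as store:
    store.append("df_only_indexables", df)
    store.append("df_with_data_columns", df, data_columns=True)
    store.append("df_no_index", df, data_columns=True, index=False)

# For every PyTables Table node, collect the names in its colindexes mapping,
# i.e. the columns that carry a real PyTables index.
with tables.open_file(path, mode="r") as h5:
    indexed = {
        node._v_pathname: sorted(node.colindexes)
        for node in h5.walk_nodes("/", classname="Table")
    }

for pathname, cols in sorted(indexed.items()):
    print(pathname, cols)
```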
So, if you want to query a column, make it a data_column. If you don't, the columns are stored in blocks by dtype (faster and uses less space).
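To see the two layouts side by side, this sketch compares the PyTables column names of the same mixed-dtype frame stored both ways; the keys, path and data are hypothetical. Without data columns, the float columns and the integer column each end up in a single `values_block_N` field; with `data_columns=True`, every column gets its own field. The exact block order may vary, so only the names are compared.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: two float64 columns and one int64 column.
df = pd.DataFrame({
    "A": np.random.randn(4),
    "B": np.random.randn(4),
    "C": np.arange(4, dtype="int64"),
})
path = os.path.join(tempfile.mkdtemp(), "demo_layout.h5")  # throwaway path

with pd.HDFStore(path, mode="w") as store:
    store.append("as_blocks", df)                      # values grouped by dtype
    store.append("as_columns", df, data_columns=True)  # one field per column
    block_layout = store.get_storer("as_blocks").table.colnames
    column_layout = store.get_storer("as_columns").table.colnames

print(block_layout)
print(column_layout)
```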
Usually you want the queryable columns indexed. BUT if you create a store and then append several files to it, you usually turn off index creation and build the index once at the end, since maintaining it as you go is quite expensive.
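A sketch of that bulk-load pattern, assuming three in-memory frames stand in for the files and a throwaway path for the store: append everything with `index=False`, then build the indexes once with `HDFStore.create_table_index`. The `optlevel=9, kind="full"` settings are one common choice rather than a requirement, and `indexed_columns` is a hypothetical helper that asks each PyTables column whether it `is_indexed`.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def indexed_columns(store, key):
    """Hypothetical helper: names of the table's columns that carry a PyTables index."""
    table = store.get_storer(key).table
    return {name for name in table.colnames if getattr(table.cols, name).is_indexed}


# Hypothetical stand-ins for "several files": three frames built in memory.
chunks = [pd.DataFrame(np.random.randn(100, 2), columns=["A", "B"]) for _ in range(3)]
path = os.path.join(tempfile.mkdtemp(), "demo_bulk.h5")  # throwaway path

with pd.HDFStore(path, mode="w") as store:
    for chunk in chunks:
        # index=False: no PyTables index is built while the data is loaded.
        store.append("df", chunk, data_columns=["A"], index=False)
    before = indexed_columns(store, "df")

    # Build the indexes once, after all rows are in. Leaving `columns` at its
    # default indexes the index column and every data column.
    store.create_table_index("df", optlevel=9, kind="full")
    after = indexed_columns(store, "df")
    nrows = store.get_storer("df").table.nrows

print(sorted(before), sorted(after), nrows)
```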
See the cookbook for more on this.