Consider the following HDFStore and the DataFrames df and df2:
    import pandas as pd

    store = pd.HDFStore('test.h5')

    midx = pd.MultiIndex.from_product([range(2), list('XYZ')], names=list('AB'))
    df = pd.DataFrame(dict(C=range(6)), midx)

    df

         C
    A B
    0 X  0
      Y  1
      Z  2
    1 X  3
      Y  4
      Z  5

    midx2 = pd.MultiIndex.from_product([range(2), list('VWX')], names=list('AB'))
    df2 = pd.DataFrame(dict(C=range(6)), midx2)

    df2

         C
    A B
    0 V  0
      W  1
      X  2
    1 V  3
      W  4
      X  5
I want to write df to the store first.
    store.append('df', df)

    store.get('df')

         C
    A B
    0 X  0
      Y  1
      Z  2
    1 X  3
      Y  4
      Z  5
At a later point in time, I will have another DataFrame with which I want to update the store. I want to overwrite the rows whose index values also appear in my new DataFrame, while keeping the old rows that do not.
When I do
    store.append('df', df2)

    store.get('df')

         C
    A B
    0 X  0
      Y  1
      Z  2
    1 X  3
      Y  4
      Z  5
    0 V  0
      W  1
      X  2
    1 V  3
      W  4
      X  5
This is not at all what I want. Note that (0, 'X') and (1, 'X') are repeated. I could manipulate the combined DataFrame in memory and overwrite the store, but I expect to be working with much more data, where this would not be feasible.
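For reference, a minimal sketch of the in-memory workaround mentioned above, assuming both frames fit in memory. The helper name `upsert_frames` is hypothetical and not part of pandas. It concatenates the two frames and keeps only the last occurrence of each index tuple, so rows from the newer frame win.

```python
import pandas as pd


def upsert_frames(old, new):
    """Hypothetical helper: rows of `new` replace rows of `old` that share an index."""
    combined = pd.concat([old, new])
    # For duplicated index tuples keep the last occurrence, i.e. the row from `new`.
    combined = combined[~combined.index.duplicated(keep='last')]
    # Sort so that the new and old labels are interleaved within each level of 'A'.
    return combined.sort_index()


midx = pd.MultiIndex.from_product([range(2), list('XYZ')], names=list('AB'))
df = pd.DataFrame(dict(C=range(6)), midx)

midx2 = pd.MultiIndex.from_product([range(2), list('VWX')], names=list('AB'))
df2 = pd.DataFrame(dict(C=range(6)), midx2)

combined = upsert_frames(df, df2)
```

Writing `combined` back, for example with `store.put('df', combined, format='table')`, rewrites the entire table, so this only works as long as everything fits in memory.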
How do I update the store to get the following?
         C
    A B
    0 V  0
      W  1
      X  2
      Y  1
      Z  2
    1 V  3
      W  4
      X  5
      Y  4
      Z  5
You will see that for each level of 'A', 'Y' and 'Z' are the same, 'V' and 'W' are new, and 'X' is updated.
What is the right way to do this?