Pandas align multiple data frames with a TimeStamp index

This has been the scourge of my life in the last couple of days. I have numerous Pandas Dataframes that contain time series data with irregular frequencies. I am trying to bring them into a single data block.

Below is the code with representative data frames df1 , df2 and df3 (I actually have n = 5, and I would appreciate a solution that would work for all n>2 ):

 # df1, df2, df3 are given at the bottom import pandas as pd import datetime # I can align df1 to df2 easily df1aligned, df2aligned = df1.align(df2) # And then concatenate into a single dataframe combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis =1 ) # Since I don't know any better, I then try to align df3 to combined_1_n_2 manually: combined_1_n_2.align(df3) error: Reindexing only valid with uniquely valued Index objects 

I have an idea why I get this error, so I get rid of duplicate indexes in combined_1_n_2 and try again:

 combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first() combined_1_n_2.align(df3) # But stll get the same error error: Reindexing only valid with uniquely valued Index objects 

Why am I getting this error? Even if it worked, it is completely tame and ugly. How can I align> 2 time series and combine them into a single framework?

Data:

 df1 = pd.DataFrame( {'price' : [62.1250,62.2500,62.2375,61.9250,61.9125 ]}, index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] for s in ['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000', '2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000','2008-06-01 06:17:01.654000' ] ]) df2 = pd.DataFrame({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750 ]}, index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] for s in ['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000', '2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000','2008-06-01 06:15:42.082000' ] ]) df3 = pd.DataFrame({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875 ]}, index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] for s in ['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000', '2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000','2008-06-01 06:15:05.321000' ] ]) 
+5
source share
1 answer

Your specific error is related to the column names combined_1_n_2 with duplicates (both columns will be called "price"). You can rename the columns and the second result will work.

One alternative way would be to chain a join statement that joins frames in an index, as shown below.

 In [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2') Out[23]: price price_1 price_2 2008-06-01 06:03:52.281000 NaN NaN 67.6560 2008-06-01 06:03:52.359000 NaN NaN 67.8750 2008-06-01 06:03:59.614000 62.1250 NaN NaN 2008-06-01 06:03:59.692000 62.2500 NaN NaN 2008-06-01 06:13:34.524000 NaN 241.0625 NaN 2008-06-01 06:13:34.602000 NaN 241.5000 NaN 2008-06-01 06:13:34.848000 NaN NaN 67.8125 2008-06-01 06:13:34.926000 NaN NaN 67.7500 2008-06-01 06:15:05.321000 NaN NaN 67.6875 2008-06-01 06:15:05.399000 NaN 241.3750 NaN 2008-06-01 06:15:05.399000 NaN 241.2500 NaN 2008-06-01 06:15:42.004000 62.2375 NaN NaN 2008-06-01 06:15:42.082000 NaN 241.3750 NaN 2008-06-01 06:15:42.083000 61.9250 NaN NaN 2008-06-01 06:17:01.654000 61.9125 NaN NaN 
+3
source

Source: https://habr.com/ru/post/1204664/


All Articles