"ValueError: cannot reindex from a duplicate axis"

I have the following df:

 Timestamp              A    B    C  ...
 2014-11-09 00:00:00  NaN    1  NaN  NaN
 2014-11-09 00:00:00    2  NaN  NaN  NaN
 2014-11-09 00:00:00  NaN  NaN    3  NaN
 2014-11-09 08:24:00  NaN  NaN    1  NaN
 2014-11-09 08:24:00  105  NaN  NaN  NaN
 2014-11-09 09:19:00  NaN  NaN   23  NaN

And I would like to do the following:

 Timestamp              A    B    C  ...
 2014-11-09 00:00:00    2    1    3  NaN
 2014-11-09 00:01:00  NaN  NaN  NaN  NaN
 2014-11-09 00:02:00  NaN  NaN  NaN  NaN
 ...                  NaN  NaN  NaN  NaN
 2014-11-09 08:23:00  NaN  NaN  NaN  NaN
 2014-11-09 08:24:00  105  NaN    1  NaN
 2014-11-09 08:25:00  NaN  NaN  NaN  NaN
 2014-11-09 08:26:00  NaN  NaN  NaN  NaN
 2014-11-09 08:27:00  NaN  NaN  NaN  NaN
 ...                  NaN  NaN  NaN  NaN
 2014-11-09 09:18:00  NaN  NaN  NaN  NaN
 2014-11-09 09:19:00  NaN  NaN   23  NaN

That is, I would like to combine the rows that share the same timestamp (I have 17 columns), resample at a 1-minute granularity, and have NaN wherever a column has no value.

I started as follows:

 df.groupby('Timestamp').sum() 

and

 df = df.resample('1Min', how='max') 

but I got the following error:

 ValueError: cannot reindex from a duplicate axis 

How can I solve this problem? I am just learning Python, so I have no experience at all.

Thanks!

1 answer

Assuming Timestamp is your index, you first need to resample and then call reset_index before doing the groupby. Here's a working example:

 import pandas as pd

 df
                        A    B    C  ...
 Timestamp
 2014-11-09 00:00:00  NaN    1  NaN  NaN
 2014-11-09 00:00:00    2  NaN  NaN  NaN
 2014-11-09 00:00:00  NaN  NaN    3  NaN
 2014-11-09 08:24:00  NaN  NaN    1  NaN
 2014-11-09 08:24:00  105  NaN  NaN  NaN
 2014-11-09 09:19:00  NaN  NaN   23  NaN

 # how='max' is no longer accepted by current pandas; call .max() on the resampler instead.
 # min_count=1 keeps all-NaN groups as NaN (current pandas would otherwise sum them to 0).
 df.resample('1Min').max().reset_index().groupby('Timestamp').sum(min_count=1)
                        A    B    C  ...
 Timestamp
 2014-11-09 00:00:00    2    1    3  NaN
 2014-11-09 00:01:00  NaN  NaN  NaN  NaN
 2014-11-09 00:02:00  NaN  NaN  NaN  NaN
 2014-11-09 00:03:00  NaN  NaN  NaN  NaN
 2014-11-09 00:04:00  NaN  NaN  NaN  NaN
 ...
 2014-11-09 09:17:00  NaN  NaN  NaN  NaN
 2014-11-09 09:18:00  NaN  NaN  NaN  NaN
 2014-11-09 09:19:00  NaN  NaN   23  NaN
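For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. It rebuilds only the A, B and C columns from the question (the remaining columns are elided there, so they are left out), and it assumes a reasonably recent pandas where the resampler is followed by a method call such as `.max()`. The variable names `idx` and `out` are just for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (columns A, B, C only).
idx = pd.to_datetime([
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 08:24:00",
    "2014-11-09 08:24:00",
    "2014-11-09 09:19:00",
])
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=pd.DatetimeIndex(idx, name="Timestamp"),
)

# One row per minute: max() skips NaN inside each minute,
# and minutes with no data at all come out as NaN.
out = df.resample("1min").max()

print(out.head())
print(len(out))
```

In this sketch the trailing `reset_index().groupby('Timestamp')` step is left out, because the resampled index already has exactly one row per minute; whether you still want the extra groupby is a judgment call.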

Hope this helps.

Updated:

As the comment said, your “Timestamp” is not a datetime but probably a string, so you cannot resample on it (resampling needs a DatetimeIndex). Just call reset_index and convert it, something like this:

 df = df.reset_index()
 df['ts'] = pd.to_datetime(df['Timestamp'])
 # 'ts' is now the datetime version of 'Timestamp'; you just need to set it as the index
 df = df.set_index('ts')
 ...

Now just run the previous code, but replace “Timestamp” with “ts” and you should be fine.
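Put together, the string case might look like the sketch below. The input frame `raw` is hypothetical (three made-up rows with a string `Timestamp` column), and the sketch assumes a recent pandas; it drops the original string column before resampling so that only numeric columns are aggregated.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'Timestamp' is a plain string column, not a DatetimeIndex.
raw = pd.DataFrame({
    "Timestamp": [
        "2014-11-09 00:00:00",
        "2014-11-09 00:00:00",
        "2014-11-09 00:02:00",
    ],
    "A": [np.nan, 2.0, 5.0],
    "B": [1.0, np.nan, np.nan],
})

# Parse the strings into real datetimes and use them as the index.
raw["ts"] = pd.to_datetime(raw["Timestamp"])
result = (
    raw.drop(columns="Timestamp")  # keep only the numeric columns
       .set_index("ts")
       .resample("1min")           # one bin per minute
       .max()                      # duplicates collapse; empty minutes stay NaN
)

print(result)
```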


Source: https://habr.com/ru/post/980325/

