How to find the difference between the same hour of the day on different days in a pandas data frame?

For a data frame without missing values, this is as simple as df.diff(periods=24, axis=0). But how can the calculation be tied to the index values rather than to row positions?


Reproducible dataframe - Code:

# Imports
import pandas as pd
import numpy as np

# A dataframe with two variables, random numbers and hourly time series
np.random.seed(123)
rows = 36
rng = pd.date_range('1/1/2017', periods=rows, freq='H')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['A', 'B']) 
df = df.set_index(rng)

Desired output - Code:

# Running difference step = 24
df = df.diff(periods=24, axis=0)
df = df.dropna(axis=0, how='all')

Real challenge

The problem is that my real-world data is full of missing values, so I have to tie the difference intervals to the index values rather than to row positions, and I have no idea how. I have tried filling in the missing hours in the index first and then taking the differences as before, but that is not very elegant.

Thanks for any suggestions!

Edit - as requested in the comments, here is my best attempt, over a slightly longer time period:

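# Note: this assumes df is the original frame from the first snippet, not the differenced one
df_missing = df.drop(df.index[[2,3]])
# Rebuild a complete hourly index so that 24 rows always span exactly 24 hours
newIndex = pd.date_range(start = '1/1/2017',  end = '1/3/2017', freq='H')
df_missing = df_missing.reindex(newIndex, fill_value = np.nan)
df_refilled = df_missing.diff(periods=24, axis=0)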

One option is to use groupby on the hour of the index:

df.groupby(df.index.hour).diff().dropna()
Out[784]: 
                        A     B
2017-01-02 00:00:00  -3.0   3.0
2017-01-02 01:00:00 -28.0 -23.0
2017-01-02 02:00:00  -4.0  -7.0
2017-01-02 03:00:00   3.0 -29.0
2017-01-02 04:00:00  -4.0   3.0
2017-01-02 05:00:00 -17.0  -6.0
2017-01-02 06:00:00 -20.0  35.0
2017-01-02 07:00:00  -2.0 -40.0
2017-01-02 08:00:00  13.0 -21.0
2017-01-02 09:00:00  -9.0 -13.0
2017-01-02 10:00:00   0.0   3.0
2017-01-02 11:00:00 -21.0  -9.0
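
One caveat that may matter for data with gaps: groupby followed by diff subtracts the previous available row for the same hour, which is not necessarily the row from exactly 24 hours earlier. A minimal sketch, assuming a hypothetical frame (demo, gappy and by_hour are illustrative names, not from the question) in which one whole day is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days of hourly values, then 2017-01-02 removed entirely
rng = pd.date_range('2017-01-01', periods=72, freq='h')
demo = pd.DataFrame({'A': np.arange(72)}, index=rng)
gappy = demo[demo.index.normalize() != pd.Timestamp('2017-01-02')]

# Within each hour-of-day group, diff() uses the previous row that exists
by_hour = gappy.groupby(gappy.index.hour).diff().dropna()

# Every row on 2017-01-03 is compared with 2017-01-01, i.e. 48 hours earlier
print(by_hour.head(3))
```

If a strict 24-hour interval is required, the gaps probably need to be made explicit first (for example by restoring the missing timestamps) before taking the difference.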

How about asfreq, followed by diff?

df.asfreq('1H').diff(periods=24, axis=0).dropna()
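
To illustrate why this works with gaps, here is a small sketch on a hypothetical frame (demo, gappy, regular and diffed are illustrative names): asfreq puts the missing hours back as NaN rows, so a step of 24 rows again corresponds to exactly 24 hours, and any difference that would touch a missing hour is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 36 hourly values with 02:00 and 03:00 of the first day removed
rng = pd.date_range('2017-01-01', periods=36, freq='h')
demo = pd.DataFrame({'A': np.arange(36)}, index=rng)
gappy = demo.drop(demo.index[[2, 3]])

# asfreq restores the regular hourly index; the missing hours become NaN rows
regular = gappy.asfreq('h')

# Now 24 rows always span 24 hours; rows whose counterpart is missing are dropped
diffed = regular.diff(periods=24).dropna()
print(diffed)
```

In this sketch the result skips 2017-01-02 02:00 and 03:00, because the values 24 hours earlier do not exist.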

Or use shift and subtract (instead of diff):

v = df.asfreq('1h') 
(v - v.shift(periods=24)).dropna()

                        A     B
2017-01-02 00:00:00  -3.0   3.0
2017-01-02 01:00:00 -28.0 -23.0
2017-01-02 02:00:00  -4.0  -7.0
2017-01-02 03:00:00   3.0 -29.0
2017-01-02 04:00:00  -4.0   3.0
2017-01-02 05:00:00 -17.0  -6.0
2017-01-02 06:00:00 -20.0  35.0
2017-01-02 07:00:00  -2.0 -40.0
2017-01-02 08:00:00  13.0 -21.0
2017-01-02 09:00:00  -9.0 -13.0
2017-01-02 10:00:00   0.0   3.0
2017-01-02 11:00:00 -21.0  -9.0
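
A possible variant that ties the difference directly to the index, without rebuilding a regular index first, is to pass freq to shift. This is a sketch under the assumption of a DatetimeIndex with unique timestamps; demo, gappy, previous_day and by_label are illustrative names, not part of the answers above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 36 hourly values with 02:00 and 03:00 of the first day removed
rng = pd.date_range('2017-01-01', periods=36, freq='h')
demo = pd.DataFrame({'A': np.arange(36)}, index=rng)
gappy = demo.drop(demo.index[[2, 3]])

# With freq given, shift moves the index labels forward by 24 hours
# instead of moving the rows by a number of positions
previous_day = gappy.shift(freq=pd.Timedelta(hours=24))

# Subtraction aligns on timestamps; labels without a partner become NaN and are dropped
by_label = (gappy - previous_day).dropna()
print(by_label)
```

Because the alignment is done on timestamps, each remaining row is the difference to the value exactly 24 hours earlier, however many rows are missing in between.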

Source: https://habr.com/ru/post/1692728/

