For a data frame without missing values, this is as simple as df.diff(periods=24, axis=0). But how can the calculation be tied to the index values instead of to row positions?
Reproducible data frame - Code:
import pandas as pd
import numpy as np
np.random.seed(123)
rows = 36
rng = pd.date_range('1/1/2017', periods=rows, freq='H')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['A', 'B'])
df = df.set_index(rng)
Reproducible data - Screenshot:
Desired output - Code:
df = df.diff(periods=24, axis=0)
df = df.dropna(axis=0, how='all')
Desired output - Screenshot:
Real challenge
The problem is that my real-world data is full of missing values, so the difference has to be tied to the index values (the timestamp 24 hours earlier) rather than to a fixed number of rows, and I have no idea how to do that. I have tried filling in the missing hours in the index first and then taking the differences as before, but that is not very elegant.
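To make the problem concrete, here is a minimal sketch of what goes wrong. It rebuilds the same kind of hourly frame so that it runs on its own. The names df_full, df_gappy, positional and gaps are made up for this illustration. With two hours removed, diff(periods=24) still subtracts the row 24 positions earlier, which is no longer always the row 24 hours earlier.

```python
import numpy as np
import pandas as pd

# Hourly frame like the one above, rebuilt so the snippet is self-contained.
np.random.seed(123)
rng = pd.date_range('2017-01-01', periods=36, freq=pd.Timedelta(hours=1))
df_full = pd.DataFrame(np.random.randint(100, 150, size=(36, 2)),
                       columns=['A', 'B'], index=rng)

# Remove two hours (02:00 and 03:00) to simulate missing observations.
df_gappy = df_full.drop(df_full.index[[2, 3]])

# Positional difference: each row minus the row 24 positions before it.
positional = df_gappy.diff(periods=24).dropna(how='all')

# Time span each of those differences actually covers.
gaps = df_gappy.index[24:] - df_gappy.index[:-24]
print(pd.Series(gaps, index=positional.index))
```

The first two differences span 26 hours instead of 24, because the two dropped rows shift everything after them by two positions.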
Thanks for any suggestions!
Edit - as requested in the comments, here is my best attempt, using a slightly longer time period:
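# Note: df here is the original frame from the first code block, before the diff was applied
# Drop two rows to simulate missing hours
df_missing = df.drop(df.index[[2,3]])
# Rebuild a complete hourly index and put the missing hours back as NaN rows
newIndex = pd.date_range(start = '1/1/2017', end = '1/3/2017', freq='H')
df_missing = df_missing.reindex(newIndex, fill_value = np.nan)
# With a gap-free index, 24 rows back is again 24 hours back
df_refilled = df_missing.diff(periods=24, axis=0)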
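For comparison, one possible way to avoid the reindexing step is to shift the index labels by 24 hours and let pandas align on timestamps. This is only a sketch, assuming a DatetimeIndex with unique timestamps. The names df_full, df_gappy, lag, shifted and time_diff are hypothetical, and the frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hourly frame with two hours removed, rebuilt so the snippet is self-contained.
np.random.seed(123)
rng = pd.date_range('2017-01-01', periods=36, freq=pd.Timedelta(hours=1))
df_full = pd.DataFrame(np.random.randint(100, 150, size=(36, 2)),
                       columns=['A', 'B'], index=rng)
df_gappy = df_full.drop(df_full.index[[2, 3]])

lag = pd.Timedelta(hours=24)

# shift(freq=...) moves the index labels forward by 24 hours and leaves the
# values untouched, so each value is now labelled with "its timestamp + 24h".
shifted = df_gappy.shift(freq=lag)

# Pair every existing timestamp with the value observed exactly 24 hours
# earlier (NaN if that hour is missing), subtract, and drop the empty rows.
time_diff = (df_gappy - shifted.reindex(df_gappy.index)).dropna(how='all')
print(time_diff)
```

Rows whose counterpart 24 hours earlier is missing come out as NaN and are dropped, so no difference is ever taken over the wrong interval.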