Slow performance of timedelta methods

Why does .dt.days take 100 times longer than .dt.total_seconds()?

import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2011-01-01 00:00:00', periods=1000000, freq='1H')})
# Subtract the start timestamp so the column holds timedeltas
df.a = df.a - pd.to_datetime('2011-01-01 00:00:00')
df.a.dt.days # 12 sec
df.a.dt.total_seconds() # 0.14 sec
1 answer

.dt.total_seconds is essentially a single multiplication over the underlying int64 nanosecond values, so it runs at NumPy speed:

def total_seconds(self):
    """
    Total duration of each element expressed in seconds.

    .. versionadded:: 0.17.0
    """
    return self._maybe_mask_results(1e-9 * self.asi8)
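To make the multiplication concrete, here is a minimal sketch that reproduces the result by hand. It assumes the values contain no NaT, and the names td, nanos and manual_seconds are illustrative only. It is not the pandas implementation, just the same arithmetic done on the raw integers.

```python
import numpy as np
import pandas as pd

# A small timedelta Series: 0 s, 90 s, and 2 days + 1 s
td = pd.Series(pd.to_timedelta([0, 90, 86400 * 2 + 1], unit="s"))

# View the timedeltas as integer nanoseconds (assumes no NaT values)
nanos = td.to_numpy().astype("timedelta64[ns]").astype("int64")

# One vectorized multiplication converts nanoseconds to seconds
manual_seconds = 1e-9 * nanos

# Should agree with the built-in accessor up to floating-point rounding
print(np.allclose(manual_seconds, td.dt.total_seconds().to_numpy()))
```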

If we interrupt the days operation, we see that it spends its time in a slow list comprehension that calls getattr and constructs a Timedelta object for every value:

    360         else:
    361             result = np.array([getattr(Timedelta(val), m)
--> 362                                for val in values], dtype='int64')
    363         return result
    364 

To me this says: “Look, let's get it correct first, and we'll cross the optimization bridge when we come to it.”
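Until that bridge is crossed, the per-element loop can be sidestepped with integer arithmetic. The sketch below is a possible workaround, not something from pandas itself: fast_days and NANOS_PER_DAY are hypothetical names, and it assumes the Series holds timedeltas with no NaT values (NaT would produce a meaningless integer). Floor division matches how Timedelta.days rounds negative durations.

```python
import numpy as np
import pandas as pd

NANOS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def fast_days(td_series):
    """Vectorized day component of a timedelta Series (hypothetical helper).

    Assumes there are no NaT values in the input.
    """
    # View the timedeltas as integer nanoseconds
    nanos = td_series.to_numpy().astype("timedelta64[ns]").astype("int64")
    # Floor division runs in NumPy instead of building Timedelta objects
    return pd.Series(nanos // NANOS_PER_DAY,
                     index=td_series.index, name=td_series.name)


# Hourly timedeltas: 0 h, 1 h, ..., 99 h
s = pd.Series(pd.to_timedelta(np.arange(100), unit="h"))
days = fast_days(s)
```

On the one-million-row frame from the question, this would be called as fast_days(df.a); the result should match df.a.dt.days, though the timing will depend on the pandas version.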


Source: https://habr.com/ru/post/1651135/

