Dividing two timelines / lists of tuples

I have a time series that consists of a list of 2-tuples, where the first element of each tuple is a timestamp and the second is a value. The tuples are sorted by timestamp.

Now I have two of these time series and need to divide one by the other. That is, if both timelines have a value for the same timestamp, I need to divide those values. If a timestamp has no value in one of the timelines, 0 should be assumed. If (and only if) a division by zero occurs, the result should be NaN. The timestamps have large gaps, so iterating from min(timestamp) to max(timestamp) is not a solution.

I built a solution that is both very inelegant and has a bad run time. With timelines of around a million records, performance is important to me. My solution does not take advantage of the fact that both lists are sorted.

Is there a better solution, and if so, what is it?

    #!/usr/bin/env python

    l1 = [(1, 100), (2, 1000), (4, 1500), (5, 5400), (7, 7800)]
    l2 = [(1, 20), (2, 400), (3, 240), (4, 500), (5, 100), (6, 27), ]
    ex = [(1, 5), (2, 2), (3, 0), (4, 3), (5, 54), (6, 0), (7, float('NaN'))]

    def f(l1, l2):
        # Turn to dicts:
        l1d = dict(l1)
        l2d = dict(l2)

        # Compute keyspace
        keys = set(l1d.keys()).union(set(l2d.keys()))

        result = []
        for key in keys:
            if key not in l2d:
                result.append((key, float('NaN')))
            elif key not in l1d:
                result.append((key, 0))
            else:
                result.append((key, l1d[key]/l2d[key]))
        return result

    r = f(l1, l2)
    print("L1: %s" % (l1))
    print("L2: %s" % (l2))
    print("")
    print("Expected: %s" % (ex))
    print("Result: %s" % (r))
1 answer

If performance matters, take a look at pandas:

    import pandas as pd

    l1 = [(1, 100), (2, 1000), (4, 1500), (5, 5400), (7, 7800)]
    l2 = [(1, 20), (2, 400), (3, 240), (4, 500), (5, 100), (6, 27), ]

    s1 = pd.Series(dict(l1))
    s2 = pd.Series(dict(l2))

Now the division is a very explicit mathematical operation:

 s1 / s2 

returns

    1     5.0
    2     2.5
    3     NaN
    4     3.0
    5    54.0
    6     NaN
    7     NaN

If you want 0 instead of NaN for timestamps that are present only in l2:

    s1.reindex(s1.index.union(s2.index)).fillna(0) / s2

    1     5.0
    2     2.5
    3     0.0
    4     3.0
    5    54.0
    6     0.0
    7     NaN
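The question also asks for NaN whenever a division by zero occurs and for a list of tuples as the result. A minimal sketch of how that could be wrapped up, assuming the values are plain numbers; `divide_timelines` is a hypothetical helper name, not part of pandas. pandas itself returns `inf` (not an error) for a nonzero value divided by an explicit 0, so the sketch maps `inf` to NaN afterwards.

```python
import numpy as np
import pandas as pd


def divide_timelines(l1, l2):
    """Divide l1 by l2 per timestamp; both are lists of (timestamp, value)."""
    s1 = pd.Series(dict(l1), dtype="float64")
    s2 = pd.Series(dict(l2), dtype="float64")
    # Sorted union of all timestamps; a missing numerator counts as 0.
    index = s1.index.union(s2.index)
    numerator = s1.reindex(index).fillna(0)
    # A missing denominator stays NaN after reindexing, so the quotient is NaN.
    quotient = numerator / s2.reindex(index)
    # An explicit 0 in l2 yields +/-inf (or NaN for 0/0); map inf to NaN too.
    quotient = quotient.replace([np.inf, -np.inf], np.nan)
    return list(zip(quotient.index.tolist(), quotient.tolist()))


l1 = [(1, 100), (2, 1000), (4, 1500), (5, 5400), (7, 7800)]
l2 = [(1, 20), (2, 400), (3, 240), (4, 500), (5, 100), (6, 27)]
result = divide_timelines(l1, l2)
```

Converting back to a list of tuples costs extra time on large inputs, so if the rest of the pipeline can work with a Series directly, it is probably better to skip the final `zip`.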

This works well for millions of records. You can also use datetimes as the index and work with them as datetimes.
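As an illustration of the datetime case, here is a small sketch with made-up sample data (the dates and the `to_series` helper are assumptions for this example only). It builds both series on a `DatetimeIndex`, divides them the same way, and then slices the result by date labels.

```python
import pandas as pd

# Hypothetical sample data: timestamps as ISO date strings instead of integers.
l1 = [("2024-01-01", 100), ("2024-01-02", 1000), ("2024-01-04", 1500)]
l2 = [("2024-01-01", 20), ("2024-01-02", 400), ("2024-01-03", 240)]


def to_series(pairs):
    """Build a float Series with a DatetimeIndex from (timestamp, value) pairs."""
    stamps, values = zip(*pairs)
    return pd.Series(list(values), index=pd.to_datetime(list(stamps)), dtype="float64")


s1 = to_series(l1)
s2 = to_series(l2)

# Same alignment as before: union of timestamps, missing numerator counts as 0.
index = s1.index.union(s2.index)
ratio = s1.reindex(index).fillna(0) / s2.reindex(index)

# Label-based slicing by date works because the index is sorted.
first_two_days = ratio.loc["2024-01-01":"2024-01-02"]
```

With a datetime index, operations such as date-range slicing or resampling become available without converting the timestamps by hand.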


Source: https://habr.com/ru/post/1487247/

