Summing / stacking values from time-series plots where the data points do not coincide in time

I have a plotting / analysis problem that I can't get my head around. I can brute-force it, but that is too slow; maybe someone has a better idea, or knows of a fast library for Python?

I have two or more time-series datasets of (x, y) points that I want to sum (and then plot). The problem is that the x values in the series do not line up, and I would really rather not resort to duplicating values into fixed time bins.

So, given these two series:

    S1: (1;100) (5;100) (10;100)
    S2: (4;150) (5;100) (18;150)

When added together, it should turn out:

    ST: (1;100) (4;250) (5;200) (10;200) (18;250)

Logic:

    x=1   s1=100, s2=None, sum=100
    x=4   s1=100, s2=150,  sum=250  (s1 carries forward its previous value)
    x=5   s1=100, s2=100,  sum=200
    x=10  s1=100, s2=100,  sum=200
    x=18  s1=100, s2=150,  sum=250

My current thinking is to iterate over the sorted union of the x keys, keep the last seen value for each series, and ask each series whether it has a new y at that x.
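
Roughly, I imagine a minimal sketch like this (untested; merge_sum is just a name I made up, it assumes each series is a sequence of (x, y) pairs sorted by ascending x, and a series that has not started yet contributes 0):

    def merge_sum(*series):
        # Each series is an iterable of (x, y) pairs, assumed sorted by ascending x.
        lookups = [dict(s) for s in series]      # per-series mapping x -> y
        xs = sorted(set().union(*lookups))       # union of all x values
        last = [0] * len(series)                 # last seen y for each series
        out = []
        for x in xs:
            for i, lookup in enumerate(lookups):
                if x in lookup:                  # this series has a new y at x
                    last[i] = lookup[x]
            out.append((x, sum(last)))
        return out

    print(merge_sum([(1, 100), (5, 100), (10, 100)],
                    [(4, 150), (5, 100), (18, 150)]))
    # [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]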

Any ideas would be appreciated!

3 answers

Here is another way to do this by adding more behavior to individual data streams:

    class DataStream(object):
        # Wraps one (x, y) series: exposes the upcoming x and the last seen y.
        def __init__(self, iterable):
            self.iterable = iter(iterable)
            self.next_item = (None, 0)
            self.next_x = None
            self.current_y = 0
            next(self)

        def __next__(self):
            # Advance one point, remembering the previous y so it can be
            # carried forward between this stream's x values.
            if self.next_item is None:
                raise StopIteration()
            self.current_y = self.next_item[1]
            try:
                self.next_item = next(self.iterable)
                self.next_x = self.next_item[0]
            except StopIteration:
                self.next_item = None
                self.next_x = None
            return self.next_item

        def __iter__(self):
            return self


    class MergedDataStream(object):
        # Merges several DataStreams, yielding one (x, summed y) per distinct x.
        def __init__(self, *iterables):
            self.streams = [DataStream(i) for i in iterables]
            self.outseq = []

        def __next__(self):
            xs = [stream.next_x for stream in self.streams
                  if stream.next_x is not None]
            if not xs:
                raise StopIteration()
            next_x = min(xs)                 # earliest upcoming x over all streams
            current_y = 0
            for stream in self.streams:
                if stream.next_x == next_x:
                    next(stream)             # this stream has a point at next_x
                current_y += stream.current_y
            self.outseq.append((next_x, current_y))
            return self.outseq[-1]

        def __iter__(self):
            return self


    if __name__ == '__main__':
        seqs = [
            [(1, 100), (5, 100), (10, 100)],
            [(4, 150), (5, 100), (18, 150)],
        ]
        sm = MergedDataStream(*seqs)
        for x, y in sm:
            print("%02s: %s" % (x, y))
        print(sm.outseq)

Something like this:

    def join_series(s1, s2):
        S1 = iter(s1)
        S2 = iter(s2)
        value1 = 0                      # current (last seen) y of each series
        value2 = 0
        time1, next1 = next(S1)         # next (x, y) waiting in each series
        time2, next2 = next(S2)
        end1 = False
        end2 = False
        while True:
            time = min(time1, time2)    # whichever series has the earlier next x
            if time == time1:
                value1 = next1
                try:
                    time1, next1 = next(S1)
                except StopIteration:
                    end1 = True
                    time1 = time2       # park the exhausted series on the other one
            if time == time2:
                value2 = next2
                try:
                    time2, next2 = next(S2)
                except StopIteration:
                    end2 = True
                    time2 = time1
            yield time, value1 + value2
            if end1 and end2:
                return                  # raising StopIteration inside a generator is an error in Python 3.7+

    S1 = ((1, 100), (5, 100), (10, 100))
    S2 = ((4, 150), (5, 100), (18, 150))
    for result in join_series(S1, S2):
        print(result)

Basically, it keeps the current value of S1 and S2 together with the next point of each, and steps through them based on whichever has the lowest upcoming time. It handles lists of different lengths, and it works on iterators so it can process massive data sets, etc.
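
If I have traced it correctly, running it on the two series above should print:

    (1, 100)
    (4, 250)
    (5, 200)
    (10, 200)
    (18, 250)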


One possible approach:

  • Put every element of each series into a tuple (x, y, series id), e.g. (4, 150, 1), append them all to a single list, and sort that list by ascending x.

  • Declare a list whose length equals the number of series, to hold the "last seen" value for each series.

  • Iterate over each element of the list from step 1 and:

    3.1 Update the "last seen" list at the position given by the series id in the tuple.

    3.2 If the x of the current tuple differs from the x of the previous tuple, sum all the entries of the "last seen" list and append (previous x, sum) to the final list.

Here is my quick-and-dirty test:

    >>> S1 = ((1, 100), (5, 100), (10, 100))
    >>> S2 = ((4, 150), (5, 100), (18, 150))
    >>> all = []
    >>> for s in S1:
    ...     all.append((s[0], s[1], 0))
    ...
    >>> for s in S2:
    ...     all.append((s[0], s[1], 1))
    ...
    >>> all
    [(1, 100, 0), (5, 100, 0), (10, 100, 0), (4, 150, 1), (5, 100, 1), (18, 150, 1)]
    >>> all.sort()
    >>> all
    [(1, 100, 0), (4, 150, 1), (5, 100, 0), (5, 100, 1), (10, 100, 0), (18, 150, 1)]
    >>> last_val = [0] * 2
    >>> last_x = all[0][0]
    >>> final = []
    >>> for e in all:
    ...     if e[0] != last_x:
    ...         final.append((last_x, sum(last_val)))
    ...     last_val[e[2]] = e[1]
    ...     last_x = e[0]
    ...
    >>> final.append((last_x, sum(last_val)))
    >>> final
    [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]

Source: https://habr.com/ru/post/1332864/

