Pattern in continuous sequence data

Suppose I have a list of events. For example A, D, T, H, U, A, B, F, H, ...

I need to find frequent patterns that occur in complete sequence. In this problem, we cannot use traditional algorithms, such as a priori or fp-growth, because they require separate sets of elements. And I can not break this stream into smaller sets.

Any idea which algorithm would work for me?


EDIT

For example, for the sequence A, D, T, H, U, A, D, T, H, T, H, U, A, H, T, H and min_support = 2 .

Frequent templates will be

 Of length 1 --> [A, D, T, H, U] Of length 2 --> [AD, DT, TH, HU, UA, HT] Of length 3 --> [ADT, DTH, THU, HUA] Of length 4 --> [ADTH, THUA] No sequences of length 5 and further 
+5
source share
4 answers

You can try the aho-corasick algorithm with wildcard and / or just with all substrings. Aho-corasick is basically a state machine that needs a dictionary, but it quickly finds several patterns in the search bar. You can build a state machine with trie and breadth-first search. Here is a good animation example: http://blog.ivank.net/aho-corasick-algorithm-in-as3.html . So, you basically need 2 steps: build a state machine and find the line.

+2
source

You can create all possible substrings, for example:

 A AD ADT ADTH ... D DT DTH ... 

Now the question is whether the order of the elements has smaller substrings.

If not, you can try and run the standard smart pooling algorithms.

If so, then order matters in the entire sequence and its subsequences, which makes this a problem for signal processing or time series. But even if order matters, we can continue to analyze this path with all the substrings. We can try to combine them, exact match or fuzzy match, etc.

0
source

This is a special kind of development of frequent component sets called sequential template development .

If you are looking for this topic, you will find literally dozens of algorithms.

There are GSP, SPADE, PrefixSpan and many others.

0
source

Here is a simple algorithm (in JavaScript) that will generate a counter of all substrings.

Store the number of substrings in the dictionary. Iterate over each possible substring in the stream, and if it is already in the dictionary, increase it, otherwise add it with a value of 1.

 var stream = 'FOOBARFOO'; var substrings = {}; var minimumSubstringLength = 2; for (var i = 1; i <= stream.length; i++) { for (var j = 0; j <= i - minimumSubstringLength; j++) { var substring = stream.substring(j, i); substrings[substring] ? substrings[substring]++ : substrings[substring] = 1; } } 

Then use the sorting algorithm to sort the dictionary by its values.

0
source

Source: https://habr.com/ru/post/1233954/


All Articles