Detecting a priori unknown pattern in a dataset

Are there any algorithms / tools for detecting an a priori unknown pattern in the input sequence of discrete characters?

For example, for the string "01001000100001" it is something like ("0" ^ i "1"), and for "01001100011100001111" it is like ("0" ^ i "1" ^ i)

I have found several approaches, but they are applied when the set of patterns for detection in a sequence is known a priori. I also found a sequitur algorithm to detect the hierarchical structure in the data, but it does not work when the sequence is like an "arithmetic progression", as in my examples.

Therefore, I will be very grateful for any information about the method / algorithm / tool / scientific article.

+4
source share
1 answer

I believe that, as someone remarked, the general case is not decidable. Douglas Hofstadder spent a lot of time studying this problem and describes some approaches (some automated, some manuals), see the first chapter:

http://www.amazon.com/Fluid-Concepts-And-Creative-Analogies/dp/0465024750

, , AI ( ). (, i/2 ) , , , ( s), , , . , ( ).

, .

0

Source: https://habr.com/ru/post/1529641/


All Articles