I coined the term "cycle", which rolls with the hope that it does not overlap with the existing term. Basically, I'm trying to come up with an algorithm for finding loops in printed text.
Some examples from simple to complex
Example 1
Given:
aaaaabcd
I want to say:
5x(a) bcd
or algorithmically:
for 1 .. 5 print a end print b print c print d
Example 2
Given:
ababababcd
I want to say:
4x(ab) cd
or algorithmically:
for 1 .. 4 print a print b end print c print d
Example 3
Given:
abcdbcdbcdbce
I want to say:
a 3x(bcd) bce
or algorithmically:
print a for 1 .. 3 print b print c print d end print b print c print d
This did not remind me of any algorithm that I know of. I feel that some of the problems may be ambiguous, but finding one of the solutions is enough for me now. Efficiency is always welcome, but not necessary. How can i do this?
EDIT
First of all, thanks for all the discussions. I adapted the LZW algorithm from rosetta and ran it on my input:
abcdbcdbcdbcdef
who gave me:
a b c d 8 => bc 10 => db 9 => cd 11 => bcd e f
where i have a dictionary:
aa cc bb ee dd ff 8 bc 9 cd 10 db 11 bcd 12 dbc 13 cdb 14 bcde 15 ef 7 ab
It works well for compression, but thatβs not quite what I wanted. What I need is more like compression in the algorithmic representation of my examples, which will have:
- subsequent sequences (if the sequence repeats, there would be no other sequence between them)
- no dictionary, but only loops
- irreducable
- with maximum sequence sizes (which minimizes the algorithmic representation)
- and let nested loops be valid (contrary to what I said earlier in the comment)
source share