I feel that there may be a way to do this by converting a regular expression to a grammar, putting the grammar in Chomsky’s normal form, combining common non-terminals, and looking for patterns using some heurstic comparison. You can even get more succinct answers if you don’t put it in the “real” CNF ... I would leave lambdas / epsilon inside.
ba*b|ba*c|ca*b|ca*c S -> bAb | bAc | cAb | cAc A -> aA | lambda S -> BAB | BAC | CAB | CAC A -> AA | a | lambda B -> b C -> c S -> DB | DC | EB | EC A -> AA | a | lambda B -> b C -> c D -> BA E -> CA
At this point, you may find a heuristic that recognizes
S -> (D+E)(B+C)
Backsubstituting,
S -> (BA|CA)(b|c) -> (ba*|ca*)(b|c)
Repeat this for subexpressions like
S' -> bA' | cA' A' -> aA' | lambda S' -> B'A' | C'A' A' -> A'A' | a | lambda B' -> b C' -> c
Now, given that S → (B | C) (A), we can get
S' -> (B'|C')(A') -> (b|c)(a*)
For final decision
S -> ((b|c)a*)(b|c)
Then you can just look for extra parentheses to remove (noting that the union is associative, and this will optimize things into a concatenative normal form, just drop all the parentheses that don't just enclose the list of options | -delimited ... so the above becomes
(b|c)a*(b|c)
The trick comes with heuristics, and this may not do all the optimizations that are possible. I do not know how this will work. However, this might be something to consider.
source share