Simplify complex regex

Question

I am looking for a way to simplify a regular expression that consists of values (e.g. 12345), relational operators (<, >, <=, >=) and junctors (&, !). For instance, the expression:

>= 12345 & <=99999 & !55555

should be matched. I have this regex:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

I am particularly unhappy with the repetition of <=, >=, < and > at the beginning and end of the expression. I would be happy to receive a hint on how to simplify it, for example with lookahead or lookbehind.

java regex

user1413457 May 23 '12 at 20:24

6 answers

stema · Answer 1 · 2012-05-23T21:25:03+0000

Starting with your regular expression, you can take the following simplification steps:

  (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Move the anchor out of the alternation

 ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Why are there spaces in front of some anchors? (removed)

Move the trailing spaces outside the alternation and make them optional

 ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Remove duplicates in alternation

 ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

The empty alternative at the end matches the empty string, so the whole alternation is optional

 ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Make the equal sign optional and remove duplicates

 ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

An alternation of single characters can be replaced by a character class

 ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

Do the same with the alternation at the end and you will get something like this:
```
 ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$) 
```

This is untested. I believe I did not change the semantics, but I only worked it out here in the editor.
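Since the steps above were not tested, one way to gain some confidence is to run the pattern before and after a step against the same sample strings and compare the verdicts. The sketch below does that for the "optional equal sign" and "character class" steps only, comparing `<=|>=|<|>` with `[<>]=?`. The class `RegexEquivalence`, the method `sameVerdicts` and the sample list are hypothetical names introduced for illustration; agreement on a handful of samples is a sanity check, not a proof of equivalence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer: checks that two patterns
// accept and reject exactly the same strings from a sample list.
class RegexEquivalence {

    static boolean sameVerdicts(String regexA, String regexB, List<String> samples) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        for (String s : samples) {
            // matches() requires the whole sample to match the pattern
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed sample inputs: the four operators plus a few near misses
        List<String> samples = Arrays.asList("<", ">", "<=", ">=", "", "=", "=>", "<<", "<>");
        System.out.println(sameVerdicts("<=|>=|<|>", "[<>]=?", samples));
    }
}
```

The same helper can be pointed at any two consecutive versions of the full expression, as long as the sample list covers the spacing variants you care about.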

Junuxx · Answer 2 · 2012-05-23T20:33:32+0000

You can make all spaces optional (with question marks), so you do not need to list every variant explicitly. You can also group the comparison characters in a character class ([]).

Something like this, I think:

 (^[<>]=?\s?)((!|)([0-9]{1,5}))(\s?&\s?[<>]=?\s|$)*

kevlar1818 · Answer 3 · 2012-05-23T20:33:46+0000

What about

[<>]=?|\d{1,5}|[&!\|]

This will take care of your > / >= / < / <= repetition. It seems to work for me.

Let me know if this answers your question or needs work.
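As a rough illustration of how this pattern behaves in Java, the sketch below applies it with `Matcher.find()` and collects each match as a token. The class name `ExpressionTokenizer` and the method `tokenize` are made up for this example. Note the assumption about usage: applied this way, the pattern splits the input into tokens and silently skips anything it does not recognise, so it does not validate the expression by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from this answer.
class ExpressionTokenizer {

    // Operator, 1-5 digit value, or junctor
    private static final Pattern TOKEN = Pattern.compile("[<>]=?|\\d{1,5}|[&!\\|]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() moves to the next match and skips unmatched characters (e.g. spaces)
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(">= 12345 & <=99999 & !55555"));
        // prints [>=, 12345, &, <=, 99999, &, !, 55555]
    }
}
```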

Marko topolnik · Answer 4 · 2012-05-23T20:39:14+0000

I have a two-step procedure. First split on the junctors, then check the individual parts.

 final String expr = ">= 12345 & <=99999 & !55555".replaceAll("\\s+", "");
 for (String s : expr.split("[|&]")) {
   if (!s.matches("([<>]=?|=|!)?\\d+")) {
     System.out.println("Invalid");
     return;
   }
 }
 System.out.println("Valid");

But we still have no idea if you are talking about validation or something else.
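Assuming the goal is validation, the two-step idea can be wrapped in a small self-contained method. This is only a sketch: the class `TwoStepValidator` and the method `isValid` are names invented here, while the two regular expressions are taken unchanged from the snippet above.

```java
// Hypothetical wrapper: same two steps as above, returning a boolean
// instead of printing.
class TwoStepValidator {

    static boolean isValid(String input) {
        // Step 1: drop all whitespace and split on the junctors
        String expr = input.replaceAll("\\s+", "");
        for (String part : expr.split("[|&]")) {
            // Step 2: each part is an optional operator followed by digits
            if (!part.matches("([<>]=?|=|!)?\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(">= 12345 & <=99999 & !55555")); // true
        System.out.println(isValid(">= abc"));                      // false
    }
}
```

One caveat with this sketch: `String.split` discards trailing empty strings, so an input ending in `&` is not rejected. Also, `\d+` does not enforce the 1-5 digit limit from the question.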

andrew cooke · Answer 5 · 2012-05-23T20:50:18+0000

you seem to be spending a lot of effort on optional spaces. something like \s? (0 - 1) or \s* (0 - many) would be better.

also, repeated elements separated by something are always awkward. it is best to write a regexp for the "thing" first, so that it is easy to repeat.

 limit = '\s*([<>]=?|!)\s*\d{1,5}\s*'
 one_or_more = '^' + limit + '(&' + limit + ')*$'

or, expanded:

 ^\s*([<>]=?|!)\s*\d{1,5}\s*(&\s*([<>]=?|!)\s*\d{1,5}\s*)*$

also ! is a "sign of relationship", not a "junctor", if I understand correctly.

(for people advocating the use of a "real" parser: the one_or_more structure above is probably how you would end up implementing an &-separated list anyway, so there is no need for a parser if you can just use string concatenation in the language.)
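Since the question is tagged java, here is how the same concatenation might look there. This is a sketch under assumptions: the class `TermListPattern`, the constant names and `isValid` are invented for the example; the regex text is copied from the snippet above.

```java
import java.util.regex.Pattern;

// Hypothetical Java rendering of the "build one term, then repeat it" idea.
class TermListPattern {

    // One term: operator (or !) followed by 1-5 digits, with optional spaces
    static final String LIMIT = "\\s*([<>]=?|!)\\s*\\d{1,5}\\s*";

    // One term, followed by any number of "& term" repetitions
    static final Pattern ONE_OR_MORE = Pattern.compile("^" + LIMIT + "(&" + LIMIT + ")*$");

    static boolean isValid(String input) {
        return ONE_OR_MORE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(">= 12345 & <=99999 & !55555")); // true
        System.out.println(isValid(">= 12345 &"));                  // false
    }
}
```

As written, the operator in each term is mandatory, so a bare `12345` is rejected; wrapping the operator group in `(...)?` would relax that if bare values should be accepted.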

Paulpro · Answer 6 · 2012-05-23T20:56:45+0000

Is this what you want?

 ^(\s*([<>]=?)?\s*!?\d{1,5}\s*(&|$))*

This breakdown of the sub-expressions should help you understand it:

\s* : 0 or more spaces
([<>]=?)? : A < or > sign, and then = , all optional
!? : An optional !
\d{1,5} : 1-5 digits
(&|$) : either & or end of line
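To try the pattern in Java, a minimal sketch follows. It assumes the whole input has to match, so it uses `Matcher.matches()`; the class name `RepeatedTermPattern` and the method `isValid` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from this answer.
class RepeatedTermPattern {

    private static final Pattern EXPR =
            Pattern.compile("^(\\s*([<>]=?)?\\s*!?\\d{1,5}\\s*(&|$))*");

    static boolean isValid(String input) {
        // matches() only succeeds if the pattern consumes the entire input
        return EXPR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(">= 12345 & <=99999 & !55555")); // true
        System.out.println(isValid(">= 123456"));                   // false, six digits
    }
}
```

Because the outer group is repeated with `*` and each repetition may end in `&`, this pattern also accepts the empty string and an input with a trailing `&`.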
