Simplify complex regex

I am looking for a way to simplify a regular expression that consists of values (e.g. 12345), relational signs (<, >, <=, >=) and junctors (&, !). For instance, the expression:

>= 12345 & <=99999 & !55555 

should be matched. I have this regex:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 

I am particularly unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would be happy to receive a hint on how to simplify it, for example with lookahead or lookbehind.

6 answers

Starting with your regular expression, you can take the following simplification steps:

  (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Move the anchor out of the alternation

     ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 

    Why are there spaces in front of some of the anchors? (removed)

  • Move the trailing spaces outside the alternation and make them optional

     ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Remove duplicates in alternation

     ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • An empty alternative at the end matches the empty string, so make the whole alternation optional instead

     ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Make the equal sign optional and remove duplicates

     ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • An alternation of single characters can be replaced by a character class

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Do similar things with the alternation at the end and you will get something like this:

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$) 

This is untested. I believe I did not change the semantics, but I only worked it out here in the editor.
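Since the steps above are described as untested, here is a minimal sketch that checks only the prefix part of the simplification. It assumes the prefix alternatives are the ones left after the anchors and stray leading spaces are removed; the class and method names (PrefixCheck, matchesPrefix) are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

final class PrefixCheck {
    // Prefix alternatives of the original regex, with anchors and stray
    // leading spaces removed (assumption: this is what the original intended).
    static final List<String> ORIGINAL_PREFIXES =
            List.of("<=", "<= ", ">=", ">= ", "<", ">", "< ", "> ", "");

    // Simplified prefix from the character-class step.
    static final Pattern SIMPLIFIED_PREFIX = Pattern.compile("([<>]=? ?)?");

    // True if the whole string is accepted by the simplified prefix.
    static boolean matchesPrefix(String s) {
        return SIMPLIFIED_PREFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String prefix : ORIGINAL_PREFIXES) {
            System.out.println("\"" + prefix + "\" -> " + matchesPrefix(prefix));
        }
    }
}
```

Every listed alternative should be accepted, while strings such as "=<" or "<<" should be rejected. This only covers the prefix, not the separator part at the end of the expression.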


You can make all spaces optional (with question marks), so you do not need to list all the options explicitly. You can also group equal / inequality characters in a character set ([]).

I think that


What about


This will take care of your > / >= / < / <= repetition. It seems to work for me.

Let me know if this answers your question or needs work.


I have a two-step procedure. First split at the junctors, then check the individual parts.

 final String expr = ">= 12345 & <=99999 & !55555".replaceAll("\\s+", "");
 for (String s : expr.split("[|&]")) {
     if (!s.matches("([<>]=?|=|!)?\\d+")) {
         System.out.println("Invalid");
         return;
     }
 }
 System.out.println("Valid");

But we still have no idea if you are talking about validation or something else.


you seem to be spending a lot of effort on optional spaces. something like \s? (0 - 1) or \s* (0 - many) would be better.

also, repeated elements separated by something are always complex. it is best to write a regexp for the "thing" first, to make it easier to repeat.

 limit = '\s*([<>]=?|!)\s*\d{1,5}\s*'
 one_or_more = '^' + limit + '(&' + limit + ')*$'

or, advanced:


also, ! is a "relational sign", not a "junctor", if I understand correctly.

(for people advocating the use of a "real" parser: the one_or_more structure above is probably what you would end up implementing for an &-separated list anyway, so there is no need for a parser if you can just use string concatenation in the language).
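The limit / one_or_more composition from this answer can be sketched in Java as follows. This is only an illustration under assumptions: the class name ConditionList and the method isValid are hypothetical, and the term pattern is copied unchanged from the answer.

```java
import java.util.regex.Pattern;

final class ConditionList {
    // One term: optional whitespace, an operator (<, >, <=, >= or !),
    // optional whitespace, then 1 to 5 digits and optional whitespace.
    static final String TERM = "\\s*([<>]=?|!)\\s*\\d{1,5}\\s*";

    // One term followed by any number of "&"-separated terms.
    static final Pattern ONE_OR_MORE =
            Pattern.compile("^" + TERM + "(&" + TERM + ")*$");

    // True if the whole expression is a valid "&"-separated list of terms.
    static boolean isValid(String expr) {
        return ONE_OR_MORE.matcher(expr).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(">= 12345 & <=99999 & !55555"));
        System.out.println(isValid("12345"));
    }
}
```

Note that the term pattern, as written in the answer, makes the operator mandatory, so a bare number such as 12345 is rejected. Wrapping the operator group in an optional group would be one way to accept it, if that is wanted.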


Is this what you want:


This explanation of the sub-expressions should help you understand it:

\s* : 0 or more spaces
([<>]=?)? : a < or > sign, optionally followed by = ; the whole group is optional
!? : an optional !
\d{1,5} : 1-5 digits
(&|$) : either & or end of line
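As a rough sketch of how the pieces listed above might be combined: the pattern this answer originally proposed is not shown here, so the assembly below, including the extra \s* between the pieces and the names SubExpressions and matches, is an assumption made only for illustration.

```java
import java.util.regex.Pattern;

final class SubExpressions {
    // Hypothetical assembly of the listed pieces; not the answer's own pattern.
    static final Pattern EXPRESSION = Pattern.compile(
            "(\\s*"            // 0 or more spaces
            + "([<>]=?)?"      // optional < or >, optionally followed by =
            + "\\s*!?"         // optional !
            + "\\d{1,5}"       // 1 to 5 digits
            + "\\s*(&|$)"      // either & or end of input
            + ")+");

    // True if the whole input consists of one or more such terms.
    static boolean matches(String expr) {
        return EXPRESSION.matcher(expr).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(">= 12345 & <=99999 & !55555"));
        System.out.println(matches(">= 12345 &"));
    }
}
```

One consequence of ending each term with (&|$) is that an expression with a trailing & is also accepted in this assembly. If that is not wanted, a "term, then (& term)*" structure avoids it.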


