Simplify complex regex

I am looking for a way to simplify a regular expression that consists of values (e.g. 12345), relation signs (<, >, <=, >=) and junctors (&, !). For instance, the expression

>= 12345 & <=99999 & !55555 

should be matched. I have this regex:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 

I am particularly unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would appreciate a hint on how to simplify it, for example with lookahead or lookbehind.

6 answers

Starting with your regular expression, you can take the following simplification steps:

  (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Move the anchor out of the alternation

     ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 

    Why are there spaces in front of the anchor? (I deleted them.)

  • Move the trailing spaces outside the alternation and make them optional

     ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Remove duplicates in alternation

     ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • The empty alternative at the end matches the empty string, so the whole alternation is optional

     ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Make the equal sign optional and remove duplicates

     ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • A single-character alternation can be replaced by a character class

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))* 
  • Do the same with the alternation at the end and you get something like this:

     ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$) 

This is untested. I don't think I changed the semantics, but I only did this here in the editor.
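One way to check the "did not change the semantics" claim is a brute-force comparison of the leading operator group at different steps. The following is a rough sketch in Python (not the asker's language, as far as the question shows); the helper name differing_inputs, the sample alphabet and the length limit are all assumptions made for illustration, and it covers only the prefix group, not the whole expression.

```python
import re
from itertools import product

def differing_inputs(pattern_a, pattern_b, alphabet="<>= ", max_len=3):
    """Return every string over `alphabet` (up to `max_len` characters)
    that exactly one of the two patterns matches in full."""
    a, b = re.compile(pattern_a), re.compile(pattern_b)
    diffs = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(a.fullmatch(s)) != bool(b.fullmatch(s)):
                diffs.append(s)
    return diffs

# Leading operator group at three stages of the simplification above.
AFTER_DEDUP = r"(<=|>=|<|>|) ?"
AFTER_OPTIONAL = r"((<=|>=|<|>)? ?)?"
FINAL = r"([<>]=? ?)?"

print(differing_inputs(AFTER_DEDUP, AFTER_OPTIONAL))  # → []
print(differing_inputs(AFTER_OPTIONAL, FINAL))        # → [' ']
```

On this small alphabet the first two stages agree, while the final form no longer accepts a lone leading space. Whether that matters depends on whether inputs such as " 12345" should be allowed.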


You can make all spaces optional (with question marks), so you do not need to list all the options explicitly. You can also group the comparison characters in a character class ([]).

I think this should work:

 (^[<>]=?\s?)((!|)([0-9]{1,5}))(\s?&\s?[<>]=?\s|$)* 

What about

[<>]=?|\d{1,5}|[&!\|]

This will take care of your > / >= / < / <= repetition. It seems to work for me.
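Note that this pattern matches one token at a time rather than the whole expression, so it is better suited to scanning than to validating. A minimal Python sketch of that use (Python and the function name tokenize are assumptions for illustration, not part of the answer):

```python
import re

# Token pattern from the answer: an operator, a 1-5 digit value, or a junctor.
TOKEN = re.compile(r"[<>]=?|\d{1,5}|[&!\|]")

def tokenize(expr):
    """Return every token the pattern finds, silently skipping anything else."""
    return TOKEN.findall(expr)

print(tokenize(">= 12345 & <=99999 & !55555"))
# → ['>=', '12345', '&', '<=', '99999', '&', '!', '55555']
```

Because unmatched characters are skipped, this does not reject bad input on its own: a six-digit value such as "123456" simply comes back as two tokens, '12345' and '6'.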

Let me know if this answers your question or needs work.


I would use a two-step procedure: first split on the junctors, then check the individual parts.

 final String expr = ">= 12345 & <=99999 & !55555".replaceAll("\\s+", "");
 // Split on the junctors, then validate each part on its own.
 for (String s : expr.split("[|&]")) {
     if (!s.matches("([<>]=?|=|!)?\\d+")) {
         System.out.println("Invalid");
         return;
     }
 }
 System.out.println("Valid");

But we still have no idea if you are talking about validation or something else.


You seem to be spending a lot of effort on optional spaces. Something like \s? (0 or 1) or \s* (0 or more) would be better.

Also, repeated elements separated by something are always awkward. It is best to write a regexp for the "thing" first, so that it is easy to repeat.

 limit = '\s*([<>]=?|!)\s*\d{1,5}\s*'
 one_or_more = '^' + limit + '(&' + limit + ')*$'

or, expanded:

 ^\s*([<>]=?|!)\s*\d{1,5}\s*(&\s*([<>]=?|!)\s*\d{1,5}\s*)*$ 

Also, ! is a "relation sign", not a "junctor", if I understand correctly.

(For people advocating the use of a "real" parser: the one_or_more structure above is probably how you would end up implementing an &-separated list anyway. There is no need for a parser if you can just use string concatenation in the language.)
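The limit / one_or_more construction above is close to runnable as it stands. Here is a minimal Python sketch of it; the wrapper name is_valid and the sample inputs are assumptions added for illustration.

```python
import re

# One "thing": an operator or "!", then 1-5 digits, with optional whitespace.
LIMIT = r"\s*([<>]=?|!)\s*\d{1,5}\s*"
# One thing, followed by zero or more "&"-prefixed things.
ONE_OR_MORE = re.compile("^" + LIMIT + "(&" + LIMIT + ")*$")

def is_valid(expr):
    """True if the whole expression is an &-separated list of limits."""
    return ONE_OR_MORE.match(expr) is not None

print(is_valid(">= 12345 & <=99999 & !55555"))  # → True
print(is_valid(">= 12345 &"))                   # → False
print(is_valid("12345"))                        # → False
```

As written, the operator (or !) is mandatory, so a bare value such as "12345" is rejected. If bare values should pass, the operator group would need a trailing ? in LIMIT.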


Is this what you want:

 ^(\s*([<>]=?)?\s*!?\d{1,5}\s*(&|$))* 

These explanations of the sub-expressions should help you understand it:

\s* : 0 or more spaces
([<>]=?)? : A < or > sign, optionally followed by =; the whole group is optional
!? : An optional !
\d{1,5} : 1-5 digits
(&|$) : either & or end of line
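The breakdown above can also be kept next to the pattern itself. The sketch below uses Python's re.VERBOSE flag for that; Python, the name is_valid and the use of fullmatch are assumptions for illustration, not something the answer specifies.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each part carries its note.
EXPR = re.compile(r"""
    ^(
        \s*          # 0 or more whitespace characters
        ([<>]=?)?    # optional < or >, optionally followed by =
        \s*
        !?           # optional !
        \d{1,5}      # 1-5 digits
        \s*
        (&|$)        # either & or end of input
    )*
""", re.VERBOSE)

def is_valid(expr):
    """True if the pattern matches the entire string."""
    return EXPR.fullmatch(expr) is not None

print(is_valid(">= 12345 & <=99999 & !55555"))  # → True
print(is_valid(">= 123456"))                    # → False
print(is_valid("12345 &"))                      # → True
```

The last call shows that the pattern, as written, accepts a trailing &, since each repetition may end with either & or the end of input. Whether that is acceptable depends on the intended grammar.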


Source: https://habr.com/ru/post/916492/

