Does Java support regex if-then-else conditionals (Perl constructs)?

I get a PatternSyntaxException when I try to compile the following regex:

"bd".matches("(a)?b(?(1)c|d)") 

This regular expression matches bd and abc. It does not match bc.
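A quick way to confirm the behavior is to compile the pattern directly. The sketch below (the class name `ConditionalCheck` is only illustrative) catches the `PatternSyntaxException` that `Pattern.compile` throws for the `(?(1)...)` syntax. The exact message text may differ between JDK versions, so it is printed rather than relied on.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ConditionalCheck {
    // Returns true if java.util.regex accepts the pattern, false if it
    // rejects it with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Perl-style conditional: "if group 1 matched, then c, else d"
        System.out.println(compiles("(a)?b(?(1)c|d)"));
        // The same pattern with an ordinary alternation compiles fine
        System.out.println(compiles("(a)?b(c|d)"));
    }
}
```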

Any ideas? Thanks.

OK, I need to write a regular expression that matches the following 4 strings:

    *date
    date*
    date
    date1*date2

These should not match:

    *date*
    date1*date2*
    *date1*date2
    date**
    ...

But this should be done with a single match, not with several.

Please do not send an answer like:

 (date*date)|(*date)|(date*)|(date) 

Adding a new answer based on the edit and the OP's samples:

OK, I need to write a regex to match the next 4 strings:
*date date* date date1*date2
should not match:
*date* date1*date2* *date1*date2 date** ...

If I understand you correctly, you can use a regex based on Alan Moore's pseudo-conditional trick.

Maybe something like ^(?:[*]())?date(?:(?!\1)[*](?:date)?|)$.
I assume that “date” is the only text in the samples, and that each group of non-space characters in the samples represents a separate string.

Among the strings that should pass, only one form actually needs the pseudo-conditional: "*date". So I have included the Perl sample below (since I don't have a Java compiler), which expands the regex for clarity.

    use strict;
    use warnings;

    my @samps = qw( *date date* date date*date *date* date*date* *date*date date** );

    for my $str (@samps) {
        print "\n'$str'\n";
        if ($str =~ /
            ^               # Beginning of string
            (?:             # Expr grouping
               [*]()        # Asterisk found, then DEFINE capture group 1 as the empty string
            )?              # End expr group, optional; if asterisk NOT found, capture group 1 stays UNDEFined
            date            # 'date'
            (?:             # Expr grouping
               (?!\1)       # Pseudo-conditional: if no asterisk (group 1 is UNDEF), then
               [*](?:date)? # look for '*' followed by an optional 'date'
             |              # OR,
            )               # asterisk or not, there should be nothing here
            $               # End of string
        /x) {
            print "matched: '$str'\n";
        }
    }

Output:

    '*date'
    matched: '*date'

    'date*'
    matched: 'date*'

    'date'
    matched: 'date'

    'date*date'
    matched: 'date*date'

    '*date*'

    'date*date*'

    '*date*date'

    'date**'
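Since the sample above is in Perl, here is a rough Java port for comparison; treat it as a sketch. It depends on the same behavior the trick needs: in `java.util.regex`, a backreference to a group that did not participate in the match fails, so `(?!\1)` succeeds exactly when no leading asterisk was seen. `Pattern.COMMENTS` stands in for Perl's `/x` flag, and the class name is illustrative.

```java
import java.util.regex.Pattern;

class PseudoConditionalDates {
    // Same pattern as the Perl sample, laid out with Pattern.COMMENTS
    static final Pattern DATES = Pattern.compile(
        "^                  # beginning of string\n" +
        "(?: [*] () )?      # leading asterisk: set group 1 to the empty string\n" +
        "date\n" +
        "(?:\n" +
        "    (?!\\1)        # only if group 1 is unset (no leading asterisk)\n" +
        "    [*] (?:date)?  # '*' followed by an optional 'date'\n" +
        "  |                # or nothing at all\n" +
        ")\n" +
        "$                  # end of string",
        Pattern.COMMENTS);

    static boolean accepts(String s) {
        return DATES.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "*date", "date*", "date", "date*date",
                             "*date*", "date*date*", "*date*date", "date**" };
        for (String s : samples) {
            System.out.printf("%-11s %b%n", s, accepts(s));
        }
    }
}
```

The first four samples should print `true` and the last four `false`, mirroring the Perl output.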

Imagine you are using a language that has no else, but you want to emulate one. Instead of writing

 if (condition) { yes part } else { no part } 

You need to write

 if (condition) { yes part } if (!condition) { no part } 

Well, that is what you have to do here too, only in the pattern. In Java, which has no conditionals, you repeat the condition, negated, in the ELSE branch, which is really just an OR branch.
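The same rewrite can be done mechanically. The helper below is a hypothetical sketch (the name `emulateConditional` is made up for illustration) that turns a Perl-style `(?(?=cond)yes|no)` into the Java-compatible `(?:(?=cond)yes|(?!cond)no)`. It assumes the condition is a lookahead; for a lookbehind condition you would use `(?<=` and `(?<!` the same way.

```java
import java.util.regex.Pattern;

class ConditionalEmulation {
    // Hypothetical helper: emulates the Perl conditional (?(?=cond)yes|no)
    // by testing the condition in the first branch and its negation in the
    // second, so at most one branch can ever apply.
    static String emulateConditional(String cond, String yes, String no) {
        return "(?:(?=" + cond + ")" + yes + "|(?!" + cond + ")" + no + ")";
    }

    public static void main(String[] args) {
        // If the next character is a digit, expect three digits;
        // otherwise expect three lowercase letters.
        Pattern p = Pattern.compile(emulateConditional("\\d", "\\d{3}", "[a-z]{3}"));
        for (String s : new String[] { "123", "abc", "1bc", "a23" }) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

Here `123` and `abc` match, while `1bc` and `a23` do not: once the condition has picked a branch, the other branch is ruled out.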

So, for example, instead of writing this in Perl, which supports conditionals in patterns:

    # definition of \b using a conditional in the pattern like Perl
    (?(?<= \w)          # if there is a word character to the left
          (?! \w)       # then there must be no word character to the right
      |   (?= \w)       # else there must be a word character to the right
    )

In Java, write:

    # definition of \b using a duplicated condition like Java
    (?:   (?<= \w)      # if there is a word character to the left
          (?! \w)       # then there must be no word character to the right
      |                 # ...otherwise...
          (?<! \w)      # if there is no word character to the left
          (?= \w)       # then there must be a word character to the right
    )

You may recognize this as the definition of \b. Here, then, is the same thing for the definition of \B, first using the conditional:

    # definition of \B using a conditional in the pattern like Perl
    (?(?<= \w)          # if there is a word character to the left
          (?= \w)       # then there must be a word character to the right
      |   (?! \w)       # else there must be no word character to the right
    )

And now, repeating the (now negated) condition in the OR branch:

    # definition of \B using a duplicated condition like Java
    (?:   (?<= \w)      # if there is a word character to the left
          (?= \w)       # then there must be a word character to the right
      |                 # ...otherwise...
          (?<! \w)      # if there is no word character to the left
          (?! \w)       # then there must be no word character to the right
    )
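The duplicated-condition versions of \b and \B can be checked against Java's built-ins. The sketch below (class and field names are illustrative) compiles both with `Pattern.COMMENTS` and lists the offsets where each zero-width pattern matches. The sample text is plain ASCII, because `\b` and `\w` have not always agreed on non-ASCII characters in every JDK version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryEmulation {
    // \b with the condition duplicated and negated instead of a conditional
    static final Pattern WORD_BOUNDARY = Pattern.compile(
        "(?: (?<= \\w) (?! \\w)    # word char on the left, none on the right\n" +
        "  | (?<! \\w) (?= \\w)    # no word char on the left, one on the right\n" +
        ")", Pattern.COMMENTS);

    // \B built the same way
    static final Pattern NOT_BOUNDARY = Pattern.compile(
        "(?: (?<= \\w) (?= \\w)    # word chars on both sides\n" +
        "  | (?<! \\w) (?! \\w)    # word chars on neither side\n" +
        ")", Pattern.COMMENTS);

    // Offsets at which a zero-width pattern matches in the text
    static List<Integer> positions(Pattern p, String text) {
        List<Integer> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo bar_baz, 42!";
        System.out.println(positions(Pattern.compile("\\b"), text) + " vs " + positions(WORD_BOUNDARY, text));
        System.out.println(positions(Pattern.compile("\\B"), text) + " vs " + positions(NOT_BOUNDARY, text));
    }
}
```

For this text both \b variants should report `[0, 3, 4, 11, 13, 15]`, and both \B variants the remaining offsets, including the one at the very end of the string.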

Notice how, no matter how you roll them, the respective definitions of \b and \B both rely solely on the definition of \w, never on \W, let alone on \s.

Being able to use conditionals not only saves typing, it also reduces the chance of getting it wrong. There may also be cases where you do not want to evaluate the condition twice.

Here I use this to define a few regex subroutines that give me a Greeklish atom and boundaries around it:

    (?(DEFINE)
        (?<greeklish>       [\p{Greek}\p{Inherited}] )
        (?<ungreeklish>    [^\p{Greek}\p{Inherited}] )
        (?<greek_boundary>
            (?(?<= (?&greeklish))
                  (?! (?&greeklish))
              |   (?= (?&greeklish))
            )
        )
        (?<greek_nonboundary>
            (?(?<= (?&greeklish))
                  (?= (?&greeklish))
              |   (?! (?&greeklish))
            )
        )
    )

Notice how the boundary and non-boundary use only (?&greeklish), never (?&ungreeklish)? You never need the complement just for the boundaries; instead, you put the negation into your lookarounds, just as \b and \B do.

Although in Perl it is probably easier (albeit less general) to simply define a new custom property, \p{IsGreeklish} (and therefore its complement \P{IsGreeklish}):

    sub IsGreeklish {
        return <<'END';
    +utf8::IsGreek
    +utf8::IsInherited
    END
    }

You will not be able to port any of these to Java, though not so much because Java lacks support for conditionals as because its pattern language does not allow (DEFINE) blocks or regex subroutine calls like (?&greeklish); indeed, Java patterns cannot even recurse. Nor can you define custom properties such as \p{IsGreeklish} in Java.
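For a non-recursive case like this one, a rough stand-in in Java is to build the pattern from ordinary strings and splice the character class into each lookaround. This is plain string concatenation, not a regex subroutine, so it does not contradict the point above. The names are illustrative, and Java spells the script properties `\p{IsGreek}` and `\p{IsInherited}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreekBoundary {
    // Stand-in for the (?<greeklish>...) routine: just a string fragment
    static final String GREEKLISH = "[\\p{IsGreek}\\p{IsInherited}]";

    // Boundary via the duplicated-condition trick, since Java has no conditionals
    static final Pattern GREEK_BOUNDARY = Pattern.compile(
        "(?:(?<=" + GREEKLISH + ")(?!" + GREEKLISH + ")"
      + "|(?<!" + GREEKLISH + ")(?=" + GREEKLISH + "))");

    // Offsets where a Greeklish run starts or ends
    static List<Integer> boundaries(String text) {
        List<Integer> out = new ArrayList<>();
        Matcher m = GREEK_BOUNDARY.matcher(text);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        // "abc ", three Greek letters (alpha, beta, gamma), then " def"
        System.out.println(boundaries("abc \u03b1\u03b2\u03b3 def"));
    }
}
```

The Greek run occupies offsets 4 to 6, so the boundaries reported are 4 and 7.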

And of course, the conditions in Perl regex conditionals can be more than lookarounds: they can even be blocks of code to execute, which is why you certainly do not want to evaluate the same condition twice, lest it have side effects. That does not apply to Java, because it cannot do that. You cannot mix pattern and code, which limits you more than you might think before you get used to it.

There are so many things you can do with the Perl regex engine that you cannot do in any other language, and these are just a few. Not surprisingly, the substantially expanded regex chapter in the new 4th edition of Programming Perl, combined with the completely rewritten Unicode chapter, which now immediately follows the regex chapter (having been promoted into the book's inner core), has a combined page count of something like 130 pages, roughly double the length of the old pattern-matching chapter in the 3rd edition.

What you just saw above is part of the new 4th edition, due to be published next month or so.


Java does not support conditionals, but there is a trick you can use instead. Check this out:

    String[] test = { "abc", "abd", "bc", "bd", "ad", "ac" };
    for (String s : test) {
        System.out.printf("%-4s: %b%n", s, s.matches("(?:a())?b(\\1c|(?!\\1)d)"));
    }

Output:

    abc : true
    abd : false
    bc  : false
    bd  : true
    ad  : false
    ac  : false

If the string does not start with a, the first capturing group does not participate in the match, so the backreference \1 fails, just like the (1) condition in your conditional. Otherwise, the group matches an empty string, and so does \1.

Another aspect of the conditional is that it acts as an exclusive OR: if the condition is true, the second branch must not succeed (which is why abd should not match). The negative lookahead in the second branch takes care of that.
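To see why the negative lookahead is needed, compare the pattern from this answer with a variant that drops it. In this small sketch (class name illustrative), `abd` slips through without `(?!\1)`: after the `\1c` branch fails, the plain `d` branch is still available even though group 1 did match.

```java
class ExclusiveOrCheck {
    static final String WITH_LOOKAHEAD = "(?:a())?b(\\1c|(?!\\1)d)";
    static final String WITHOUT_LOOKAHEAD = "(?:a())?b(\\1c|d)";

    public static void main(String[] args) {
        String[] test = { "abc", "abd", "bc", "bd" };
        for (String s : test) {
            // String.matches requires the whole string to match
            System.out.printf("%-4s: with=%b without=%b%n",
                s, s.matches(WITH_LOOKAHEAD), s.matches(WITHOUT_LOOKAHEAD));
        }
    }
}
```

Only the `abd` row differs between the two columns.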

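This trick works in almost all popular Perl-derived flavors, including Java, .NET, Python, PHP (PCRE) and Ruby (Oniguruma). It does not work in ECMAScript implementations such as JavaScript and ActionScript, where a backreference to a group that did not participate matches the empty string instead of failing.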


EDIT: Well, you have added some example strings, and @sln has shown how to match them with pseudo-conditionals, but I wonder whether you really need them. Your "valid" strings apparently consist of at least one date with at most one *, which can be expressed as

 ^\*date|date(?:\*(?:date)?)?$ 

Here's a demo that includes the @sln regex as well as mine.
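One detail about that regex: alternation has the lowest precedence, so `^` belongs only to the first alternative and `$` only to the second. With `String.matches()`, which must match the whole string, this makes no difference. With `Matcher.find()` it does. The sketch below (class name illustrative) wraps the alternation in a non-capturing group and contrasts the two forms.

```java
import java.util.regex.Pattern;

class AnchorScope {
    // As posted: ^ applies only to the first alternative, $ only to the second
    static final Pattern UNGROUPED = Pattern.compile("^\\*date|date(?:\\*(?:date)?)?$");
    // Grouped: both anchors apply to the whole alternation
    static final Pattern GROUPED = Pattern.compile("^(?:\\*date|date(?:\\*(?:date)?)?)$");

    public static void main(String[] args) {
        String[] samples = { "*date", "date*", "date", "date*date",
                             "*date*", "date*date*", "*date*date", "date**" };
        for (String s : samples) {
            System.out.printf("%-11s matches: %b/%b  find: %b/%b%n", s,
                UNGROUPED.matcher(s).matches(), GROUPED.matcher(s).matches(),
                UNGROUPED.matcher(s).find(), GROUPED.matcher(s).find());
        }
    }
}
```

With `matches()` both forms agree on every sample. With `find()`, the ungrouped form also accepts strings such as `*date*`, because `^\*date` alone is enough to find a match at the start.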


It is very unlikely that you cannot get by without this feature. I hope you are not falling into the common trap of trying to cram a lot of functionality into one regex?

Please describe your actual problem. I am sure there is a better option than using an external library to implement the solution you have come up with.


Going by the Pattern specifications for Java 1.5, Java 1.6, and Java 7, Java does not have an if-then-else construct.

An explanation of the regular expression in the question (and some alternatives for other languages that do not support conditionals) can be found in this blog post. A full explanation (and further confirmation that Java does not support it) can be found on this page.

You can search for a third-party library for pattern matching, but it will not be integrated with the String class.


According to the engine comparison table in the Wikipedia article here, Java does not support conditionals.


Source: https://habr.com/ru/post/901163/

