Partial Regular Expression Matching

In an NFA, it is easy to make all previously non-final states accepting, so that the automaton matches the language of all starting substrings (prefixes) of a given language.

In the Java regex engine, is there a way to find out whether a string is a starting substring of some string that matches a given regular expression?

Let regexX stand for "any beginning of", and let regexA be any given regular expression.

The combined expression "regexXregexA" would then match every starting substring of a string that matches "regexA".

Example:

regexA = a*b 

"a" matches

 "regexXa*b" 

because it is the beginning of "ab" (and of "aab").

Edit:

Since some people still do not understand, here is a test program for this question:

    import java.util.regex.*;

    public class Test1 {
        public static void main(String[] args) {
            String regex = "a*b";
            System.out.println(partialMatch(regex, "aaa"));
        }

        // Return true if there is a string which matches the regex and
        // startsWith (but is not equal to) beginning, false otherwise.
        public static boolean partialMatch(String regex, String beginning) {
            // Left unimplemented: this is what the question asks for.
            throw new UnsupportedOperationException("not implemented");
        }
    }

The expected result is true.

2 answers

What you are looking for is called a partial match, and it is actually supported by the Java regex API (for the record, other engines that offer this feature include PCRE and Boost.Regex).

You can tell whether an input string matched partially by checking the result of Matcher.hitEnd(), which indicates whether the match failed because the end of the input string was reached.

    Pattern pattern = Pattern.compile("a*b");
    Matcher matcher = pattern.matcher("aaa");
    System.out.println("Matches: " + matcher.matches());
    System.out.println("Partial match: " + matcher.hitEnd());

This prints:

    Matches: false
    Partial match: true
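
Wrapped into the partialMatch method from the question, the same check could look like the sketch below. The class name PartialMatchSketch is made up for illustration, and the method simply reports what hitEnd() says after one matches() attempt; since hitEnd() reflects how the engine happened to run, treat this as a starting point under that assumption rather than a complete solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only Pattern, Matcher, matches() and hitEnd()
// come from the java.util.regex API.
public class PartialMatchSketch {

    // Assumption: "partial match" means the engine ran into the end of the
    // input during a full-match attempt, so more input could change the result.
    static boolean partialMatch(String regex, String beginning) {
        Matcher matcher = Pattern.compile(regex).matcher(beginning);
        matcher.matches();        // run the match attempt; its result is not used here
        return matcher.hitEnd();  // true if the end of the input was hit
    }

    public static void main(String[] args) {
        System.out.println(partialMatch("a*b", "aaa")); // "aaa" could continue to "aaab"
        System.out.println(partialMatch("a*b", "c"));   // "c" cannot start any match
    }
}
```

For the question's example this reports true for "aaa" and false for "c".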

In an NFA, it is easy to make all previously non-final states accepting, so that the automaton matches the language of all starting substrings (prefixes) of a given language.

Indeed, this can be achieved by adding a new final state and an ε-transition from each state (final or non-final) to the new final state.
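
As a rough illustration of that construction, here is a small sketch with a hand-built NFA for a*b. The class PrefixNfaSketch, its transition table, and the method names are all invented for this example. Instead of adding an explicit new final state with ε-transitions, it treats every state reached after consuming the input as accepting, which has the same effect under the assumption that every state of the NFA can still reach a final state.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical hand-built NFA for a*b: state 0 is the start state, state 1 is final.
public class PrefixNfaSketch {

    static final Map<Integer, Map<Character, List<Integer>>> TRANSITIONS = Map.of(
            0, Map.of('a', List.of(0), 'b', List.of(1)),
            1, Map.of());
    static final Set<Integer> FINAL_STATES = Set.of(1);

    // Simulate the NFA and return the set of states active after the whole input.
    static Set<Integer> run(String input) {
        Set<Integer> current = new HashSet<>(Set.of(0));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int state : current) {
                next.addAll(TRANSITIONS.get(state).getOrDefault(c, List.of()));
            }
            current = next;
        }
        return current;
    }

    // Ordinary acceptance: at least one active state is final.
    static boolean matches(String input) {
        for (int state : run(input)) {
            if (FINAL_STATES.contains(state)) {
                return true;
            }
        }
        return false;
    }

    // Prefix acceptance: every state counts as accepting, so any surviving state is enough.
    // Assumption: each state can still reach a final state, which holds for this NFA.
    static boolean matchesPrefix(String input) {
        return !run(input).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(matches("aaa"));       // no 'b' yet, so not a full match
        System.out.println(matchesPrefix("aaa")); // but it is a prefix of "aaab"
        System.out.println(matchesPrefix("ba"));  // nothing in a*b starts with "ba"
    }
}
```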

AFAIK, there is no equivalent regular expression operator for this operation.

Some regex libraries may offer a way to check whether a string is a partial match of a regex; I don't know. I don't know Java; I work mostly in PHP, which does not provide such a feature. There may be libraries that do this, but I have never needed one.

For a small, specific regular expression, you can try to build a new regular expression that matches the strings that partially match the original one, by combining these simple rules:

  • a → a?
  • ab → ab?
  • a* → a*
  • a+ → a*
  • a|b → (a|b)?
  • etc.

a and b above are sub-patterns of the original regular expression. Use parentheses where necessary.
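
To make the rules concrete, here is a small sketch for the pattern a*b from the question. Applying the concatenation rule from the list above by hand, with a* as the first sub-pattern and b as the second, gives a*b?. The class name DerivedPrefixRegexSketch and the method isPrefixOfMatch are invented for illustration, and the derived pattern is only claimed for this one example.

```java
import java.util.regex.Pattern;

// Hypothetical example class; the derived pattern was worked out by hand for a*b only.
public class DerivedPrefixRegexSketch {

    // Original pattern from the question.
    static final Pattern ORIGINAL = Pattern.compile("a*b");

    // Hand-derived pattern: the trailing b becomes optional, so every
    // starting substring of a string matching a*b is accepted.
    static final Pattern PREFIXES = Pattern.compile("a*b?");

    static boolean isPrefixOfMatch(String candidate) {
        return PREFIXES.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(ORIGINAL.matcher("aaa").matches()); // full match fails
        System.out.println(isPrefixOfMatch("aaa"));            // prefix check succeeds
        System.out.println(isPrefixOfMatch("ba"));             // not a prefix of any match
    }
}
```

Note that the derived pattern also accepts complete matches such as "ab"; excluding those would need a separate check against the original pattern.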


Source: https://habr.com/ru/post/1263888/