Java regex for extracting text sequences over multiple lines

Given text like this, for example:

Preface (optional, up to multiple lines)
Main : sequence1
   sequence2
   sequence3
   sequence4
Epilogue (optional, up to multiple lines)

which Java regular expression can be used to retrieve all the sequences (i.e. sequence1, sequence2, sequence3 and sequence4 above), for example in a loop over Matcher.find()?

Each "sequence" is preceded by, and may also contain, 0 or more white spaces (including tabs).

The following regex

(?m).*Main(?:[ |\t]+:(?:[ |\t]+(\S+)[\r\n])+

gives only the first sequence (sequence1).

1 answer

You can use the following regex:

(?m)(?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*)(\S+)\r?\n?

More details

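  • (?m) - multiline mode on, so ^ matches at the start of every line
  • (?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*) - a non-capturing group that matches either of two alternatives:
    • \G(?!\A)[^\S\r\n]+ - the end of the previous successful match (\G(?!\A), where (?!\A) rules out the start of the input), followed by 1 or more whitespace characters other than CR and LF ([^\S\r\n]+, which can be replaced with [\p{Zs}\t]+ or [\s&&[^\r\n]]+)
    • | - or
    • ^Main\s*:\s* - the start of a line, Main, 0 or more whitespace characters, :, and again 0 or more whitespace characters
  • (\S+) - capturing group 1: 1 or more non-whitespace characters
  • \r?\n? - an optional CR followed by an optional LF.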
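The three "whitespace but not a line break" classes listed above can be compared directly. The following is only a minimal sketch: the class name HorizontalWhitespaceDemo and the helper allMatch are made up for illustration. The classes agree on plain spaces and tabs, which is what the question needs, but they are not strictly identical: by default \s also covers form feed and vertical tab, while \p{Zs} also covers Unicode space separators such as the no-break space.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class HorizontalWhitespaceDemo {
    // Negated class: anything that is not non-whitespace, CR or LF.
    static final Pattern NEGATED = Pattern.compile("[^\\S\\r\\n]+");
    // Character class intersection: whitespace AND not CR/LF.
    static final Pattern INTERSECTION = Pattern.compile("[\\s&&[^\\r\\n]]+");
    // Explicit list: Unicode space separators plus tab.
    static final Pattern EXPLICIT = Pattern.compile("[\\p{Zs}\\t]+");

    // True only if the whole input is matched by all three variants.
    static boolean allMatch(String input) {
        return NEGATED.matcher(input).matches()
            && INTERSECTION.matcher(input).matches()
            && EXPLICIT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(allMatch(" \t  ")); // spaces and tabs only
        System.out.println(allMatch(" \n"));   // contains a line break
    }
}
```

Under these assumptions the first call prints true and the second prints false, because none of the three classes accepts the line feed.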

Java demo:

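import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \\r and \\n are passed to the regex engine as escapes instead of raw CR/LF characters
String p = "(?m)(?:\\G(?!\\A)[^\\S\\r\\n]+|^Main\\s*:\\s*)(\\S+)\\r?\\n?";
String s = "Preface (optional, up to multiple lines)...\nMain : sequence1\n   sequence2\n   sequence3\n   sequence4\nEpilogue (optional, up to multiple lines)";
Matcher m = Pattern.compile(p).matcher(s);
// Each find() continues from the end of the previous match, yielding one sequence per iteration
while (m.find()) {
    System.out.println(m.group(1));
}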
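If the extraction is needed in more than one place, the loop can be wrapped in a small helper. This is a sketch rather than part of the original answer: the class SequenceExtractor and the method extract are hypothetical names, and the sample inputs are invented to exercise tabs, Windows line endings and text without a Main line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the regex from the answer.
class SequenceExtractor {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern SEQ = Pattern.compile(
        "(?m)(?:\\G(?!\\A)[^\\S\\r\\n]+|^Main\\s*:\\s*)(\\S+)\\r?\\n?");

    // Collects group 1 of every consecutive match into a list.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SEQ.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // LF line endings, with a tab before the last sequence.
        System.out.println(extract(
            "Preface\nMain : sequence1\n   sequence2\n\tsequence3\nEpilogue"));
        // CRLF line endings and no spaces around the colon.
        System.out.println(extract(
            "Main:sequence1\r\n  sequence2\r\nEpilogue line"));
        // No Main line: the \G branch never gets a starting point.
        System.out.println(extract("Preface only\n  indented"));
    }
}
```

With these inputs the three calls print [sequence1, sequence2, sequence3], [sequence1, sequence2] and []. Note that the chain of matches stops at the first line that does not start with whitespace, so an indented epilogue line directly after the last sequence would be picked up as well.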

Source: https://habr.com/ru/post/1664355/

