Regex in Java: match groups up to the first occurrence of a character

My line looks like this:

"Chitkara DK, Rawat DJY, Talley N. The epidemiology of childhood recurrent abdominal pain in Western countries: a systematic review. Am J Gastroenterol. 2005;100(8):1868-75. DOI." 

I want to get the uppercase letters (only when they stand as separate words) up to the first period, giving: DK DJY N. I do not want anything after that period, such as J or DOI.

Here is the pattern I pass to Java's Pattern class:

 \\b[A-Z]{1,3}\\b 

Is there a general option in the regex to stop matching after a certain character?

2 answers

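You can use continuous matching with \G and extract the desired matches from the first capture group. \G anchors each match attempt to the end of the previous match, and [^.]+? cannot cross a period, so the chain of matches stops at the first period: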

 (?:\\G|^)[^.]+?\\b([A-Z]{1,3})\\b 

To use this in a multi-line context, you need the MULTILINE flag. If your content is always a single line, you can remove |^ from the pattern.

See https://regex101.com/r/JXIu21/3

Please note that regex101 uses the PCRE flavor, but all the features used here are also available in Java regex.
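
For reference, here is a minimal sketch of how the \G approach might look in Java for single-line input, so the |^ alternative is dropped as described above. The class and method names (ContinuousMatchDemo, abbreviationsBeforeFirstPeriod) are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions, not from the thread.
class ContinuousMatchDemo {
    // \G anchors each attempt to the end of the previous match (or to the
    // start of the input on the first attempt); [^.]+? cannot cross a period.
    private static final Pattern CHAINED =
            Pattern.compile("\\G[^.]+?\\b([A-Z]{1,3})\\b");

    static List<String> abbreviationsBeforeFirstPeriod(String line) {
        Matcher m = CHAINED.matcher(line);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group(1)); // group 1 holds the abbreviation itself
        }
        return results;
    }

    public static void main(String[] args) {
        String line = "Chitkara DK, Rawat DJY, Talley N. The epidemiology of childhood "
                + "recurrent abdominal pain in Western countries: a systematic review. "
                + "Am J Gastroenterol. 2005;100(8):1868-75. DOI.";
        System.out.println(abbreviationsBeforeFirstPeriod(line)); // prints [DK, DJY, N]
    }
}
```

One limitation of this sketch: because [^.]+? must consume at least one character, an abbreviation that is the very first word of the line is not captured.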


Sebastian Prosk's answer is great, but it is often easier (and more readable) to break a complex parsing task into separate steps. Here we can split your goal into two steps, which gives a much simpler and clearer solution that reuses your original pattern.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private static final Pattern UPPER_CASE_ABBV_PATTERN = Pattern.compile("\\b[A-Z]{1,3}\\b");

    public static List<String> getAbbreviationsInFirstSentence(String input) {
        // isolate the first sentence, since that's all we care about
        String firstSentence = input.split("\\.")[0];
        // then look for matches in the first sentence
        Matcher m = UPPER_CASE_ABBV_PATTERN.matcher(firstSentence);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }
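
A related variant, not from either answer: instead of splitting the string, you could restrict the matcher to the part of the input before the first period with Matcher.region. This is only a sketch of that idea; the class and method names (RegionMatchDemo, abbreviationsBefore) and the stop parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions, not from the thread.
class RegionMatchDemo {
    private static final Pattern UPPER_CASE_ABBV_PATTERN =
            Pattern.compile("\\b[A-Z]{1,3}\\b");

    static List<String> abbreviationsBefore(String input, char stop) {
        // Find where matching should end; fall back to the whole input
        // when the stop character does not occur at all.
        int end = input.indexOf(stop);
        if (end < 0) {
            end = input.length();
        }
        Matcher m = UPPER_CASE_ABBV_PATTERN.matcher(input);
        m.region(0, end); // find() now only searches inside [0, end)
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String line = "Chitkara DK, Rawat DJY, Talley N. The epidemiology of childhood "
                + "recurrent abdominal pain in Western countries: a systematic review. "
                + "Am J Gastroenterol. 2005;100(8):1868-75. DOI.";
        System.out.println(abbreviationsBefore(line, '.')); // prints [DK, DJY, N]
    }
}
```

This keeps the original pattern untouched and makes the stop character a parameter, which is about as close to a general "stop matching after a certain character" option as the Matcher API offers.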

Source: https://habr.com/ru/post/1264717/

