Java regex string parsing: trying to figure out a pattern

 File file = new File("file-type-string-i-want-2000-01-01-01-01-01.conf.gz");
 String fileName = file.getName();
 Matcher matcher = Pattern.compile("\\-(.*)\\-\\d{4}").matcher(fileName);
 StringBuilder sb = new StringBuilder();
 while (matcher.find()) {
     sb.append(matcher.group());
 }
 stringList = Arrays.asList(sb.toString().split("-"));
 if (stringList.size() >= 2) {
     nameFragment = stringList.get(stringList.size() - 2);
 }

Desired Result - Extract

 string-iwant 

from lines that look like

 file-type-string-iwant-2000-01-01-01-01-01.conf.gz 

Unfortunately, the string-iwant part is a variable-length run of alphanumeric characters that contains exactly ONE hyphen and never begins with a hyphen. The date format is consistent and the year always comes right after the string, so my current approach is to match on -year, but I can't work out how to exclude the material at the beginning.

Thanks for any thoughts or ideas.

Edit: updated the example lines.

3 answers

Here is the regex you need (written as a Java string literal, hence the doubled backslashes):

\\-([^-]+\\-[^-]+)\\-\\d{4}\\-

This basically means:

  • \\- starts with a hyphen
  • ([^-]+\\-[^-]+) 1 or more non-hyphen characters, then a hyphen, then 1 or more non-hyphen characters. This part is captured.
  • \\-\\d{4}\\- a hyphen, 4 digits, and another hyphen

However, this only works if stuff-you-need contains exactly one hyphen (or a constant number of hyphens, which would require adjusting the regular expression). Otherwise, there is no way to tell whether, in file-type-string-i-want, the word type belongs to the desired string or not.
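
As a rough illustration of how that pattern could be applied, here is a minimal sketch. The class name CaptureGroupExample and the method extract are made up for this example; only the regex comes from the answer. The wanted text is read from capture group 1 rather than from the whole match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
final class CaptureGroupExample {
    // A hyphen, two hyphen-free runs joined by one hyphen (captured),
    // then a hyphen, a 4-digit year, and another hyphen.
    private static final Pattern NAME_FRAGMENT =
            Pattern.compile("\\-([^-]+\\-[^-]+)\\-\\d{4}\\-");

    static Optional<String> extract(String fileName) {
        Matcher m = NAME_FRAGMENT.matcher(fileName);
        // group(1) is the parenthesized part; group(0) would include the year.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String name = "file-type-string-iwant-2000-01-01-01-01-01.conf.gz";
        System.out.println(extract(name).orElse("<no match>")); // prints string-iwant
    }
}
```

Returning an Optional is just one way to signal "no match"; returning null or throwing would work equally well here.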

Added:

If the file-type always contains exactly one hyphen, you can capture the required part as follows:

[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-

Explanation:

  • [^-]+\\-[^-]+\\- some non-hyphen characters, a hyphen, more non-hyphen characters, and then a hyphen. This skips the file-type together with the hyphen that follows it.
  • \\-\\d{4}\\- a hyphen, 4 digits, and then another hyphen
  • (.*) everything between the previous two parts is captured as the string you need
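
A minimal sketch of this second pattern in use, assuming the file-type prefix really does contain exactly one hyphen. The class name SkipPrefixExample and the method extract are hypothetical; the regex is the one given above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the "skip the file-type" variant.
final class SkipPrefixExample {
    // Skip "xxx-yyy-", capture everything up to the "-YYYY-" that starts the date.
    private static final Pattern AFTER_PREFIX =
            Pattern.compile("[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-");

    static Optional<String> extract(String fileName) {
        Matcher m = AFTER_PREFIX.matcher(fileName);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // One hyphen in the wanted part.
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz")
                .orElse("<no match>")); // prints string-iwant
        // Two hyphens in the wanted part are captured as well.
        System.out.println(extract("file-type-string-i-want-2000-01-01-01-01-01.conf.gz")
                .orElse("<no match>")); // prints string-i-want
    }
}
```

Because the capture is (.*), this variant does not care how many hyphens the wanted string contains; the trade-off is that it depends on the prefix having a fixed shape.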

If it were PHP, I would use something like the following to capture this string.

 /^(\w+\-){2}(?<string>.+?)\-\d{4}(\-\d{2}){5}(\.\w+){2}$/ 
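
Since the question is about Java, here is a sketch of how the same idea might be carried over. This is a translation made for illustration, not something the answer itself provides; the class name NamedGroupExample and the method extract are hypothetical. Java has supported named groups with the (?<name>...) syntax since Java 7.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the PHP pattern above.
final class NamedGroupExample {
    // Two "word-" prefixes, the wanted text (lazy), a 4-digit year,
    // five "-NN" date/time fields, and two ".ext" suffixes.
    private static final Pattern WHOLE_NAME = Pattern.compile(
            "^(\\w+-){2}(?<string>.+?)-\\d{4}(-\\d{2}){5}(\\.\\w+){2}$");

    static Optional<String> extract(String fileName) {
        Matcher m = WHOLE_NAME.matcher(fileName);
        // matches() requires the entire file name to fit the pattern.
        return m.matches() ? Optional.of(m.group("string")) : Optional.empty();
    }

    public static void main(String[] args) {
        String name = "file-type-string-iwant-2000-01-01-01-01-01.conf.gz";
        System.out.println(extract(name).orElse("<no match>")); // prints string-iwant
    }
}
```

This version validates the whole file name, so a name with a shorter date or a single extension is rejected rather than partially matched.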

The regular expression I would use for this relies on a positive lookahead:

 Pattern p = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})"); 

It matches text that contains exactly one hyphen and is immediately followed by a hyphen and a 4-digit year. The lookahead checks for the year without including it in the match.

Then you can just grab matcher.group(0) as your matched text, which will be string-iwant in this case.
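
Put together, the lookahead approach might look like the sketch below. The class name LookaheadExample and the method extract are invented for the example; the pattern is the one shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the lookahead approach.
final class LookaheadExample {
    // Two hyphen-free runs joined by one hyphen, followed by (but not
    // including) a hyphen and four digits.
    private static final Pattern BEFORE_YEAR =
            Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

    static Optional<String> extract(String fileName) {
        Matcher m = BEFORE_YEAR.matcher(fileName);
        // No capture group is needed: the whole match is the wanted text.
        return m.find() ? Optional.of(m.group(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        String name = "file-type-string-iwant-2000-01-01-01-01-01.conf.gz";
        System.out.println(extract(name).orElse("<no match>")); // prints string-iwant
    }
}
```

One thing to keep in mind: the lookahead only tests for a hyphen plus four digits, so a name such as file-type-1234-ab-2000-01-01-01-01-01.conf.gz would yield file-type, because -1234 already looks like a year.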


Source: https://habr.com/ru/post/1403688/

