Problem
The \d+\|\d+\|\d+\|\d+ part of your regular expression seems to work fine, which suggests that the problem is in the .* part.
Let's check which characters cannot be matched by . with the default flags, since any of them would make matches return false.
(I will test only characters in the range 0 - FFFF. Unicode also has supplementary characters above that range, which Java represents as surrogate pairs, so I am not claiming that these are the only characters that cannot match. Even if that is true today, we cannot be sure it will stay that way.)
    // Print every char in 0..FFFF (inclusive) that ".*" cannot match with default flags
    for (int ch = 0; ch <= 0xFFFF; ch++) {
        if (!Character.toString((char) ch).matches(".*")) {
            System.out.format("%-4d hex: \\u%04x %n", ch, ch);
        }
    }
As a result we get (I added some comments):
10 hex: \u000a - line feed (\n)
13 hex: \u000d - carriage return (\r)
133 hex: \u0085 - next line (NEL)
8232 hex: \u2028 - line separator
8233 hex: \u2029 - paragraph separator
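To look beyond FFFF, one option is to iterate over code points instead of chars. This is a minimal sketch assuming Java 11 or later (for Character.toString(int)); the names DotScan and nonMatching are hypothetical, chosen only for illustration. On current JDKs I would expect it to report the same five code points as above, because . matches a supplementary character as one whole code point, but as said, that is today's behavior, not a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DotScan {
    // Hypothetical helper: every code point in [0, maxCodePoint] that a default "." cannot match
    static List<Integer> nonMatching(int maxCodePoint) {
        Pattern dot = Pattern.compile("."); // default flags: no DOTALL, no UNIX_LINES
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= maxCodePoint; cp++) {
            // Character.toString(int) (Java 11+) yields a surrogate pair for
            // supplementary code points and a single char otherwise
            if (!dot.matcher(Character.toString(cp)).matches()) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : nonMatching(Character.MAX_CODE_POINT)) {
            System.out.format("%-7d hex: U+%04X%n", cp, cp);
        }
    }
}
```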
So I suspect that your string contains one of these characters. Note that not all tools treat these characters as line breaks the way the regex engine does. For example, let's test BufferedReader:
    String data = "AAA\nBBB\rCCC\u0085DDD\u2028EEE\u2029FFF";
    BufferedReader br = new BufferedReader(new StringReader(data));
    String line = null;
    // readLine() treats only \n, \r and \r\n as line terminators
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
We get:
AAA
BBB
CCCDDD EEE FFF
(between CCC and DDD there is a \u0085 (NEL) character here)
As you can see, tools that do not rely on the regex engine may return what they consider a single line, yet that line can still contain characters that the regex engine treats as line terminators.
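To confirm this, we can run .* against each line that BufferedReader returns. This is a small sketch; ReadLineCheck and linesMatchDotStar are hypothetical names used only for illustration. With the sample data above, I would expect the first two lines to match and the third one (which still contains NEL, LS and PS) to fail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineCheck {
    // Hypothetical helper: for each line readLine() returns, record whether it matches ".*"
    static List<Boolean> linesMatchDotStar(String data) {
        List<Boolean> results = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = br.readLine()) != null) {
                results.add(line.matches(".*"));
            }
        } catch (IOException e) {
            // StringReader does no real I/O, but readLine() declares IOException
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        String data = "AAA\nBBB\rCCC\u0085DDD\u2028EEE\u2029FFF";
        System.out.println(linesMatchDotStar(data));
    }
}
```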
Possible solutions
We can let . match any character, including line terminators. To do this, use the Pattern.DOTALL flag (it can also be enabled by adding (?s) to the regex, for example (?s).*).
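A short sketch comparing the default mode with the two ways of enabling DOTALL; the class and method names are hypothetical. For the problematic line from the BufferedReader example, I would expect the plain .* to fail and both DOTALL variants to match.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Default mode: "." refuses LF, CR, NEL (U+0085), LS (U+2028) and PS (U+2029)
    static boolean plainDotStar(String s) {
        return s.matches(".*");
    }

    // DOTALL enabled with the embedded (?s) flag
    static boolean embeddedDotAll(String s) {
        return s.matches("(?s).*");
    }

    // DOTALL enabled with the Pattern.DOTALL compile flag
    static boolean flagDotAll(String s) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "CCC\u0085DDD\u2028EEE\u2029FFF";
        System.out.println(plainDotStar(s) + " " + embeddedDotAll(s) + " " + flagDotAll(s));
    }
}
```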
Alternatively, as you already mentioned in your question, we can switch the regex engine to Pattern.UNIX_LINES mode (the (?d) flag), in which only \n is treated as a line terminator, so . will also match characters such as \r, \u0085, \u2028 and \u2029.
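A minimal sketch of UNIX_LINES in action, with hypothetical class and method names. I would expect .* to match the inputs containing CR, NEL and LS only in UNIX_LINES mode, while the input containing \n still fails in both modes, since \n remains a line terminator there.

```java
import java.util.regex.Pattern;

class UnixLinesDemo {
    // Hypothetical helper: whole-input match of ".*" compiled with the given flags
    static boolean dotStarMatches(String input, int flags) {
        return Pattern.compile(".*", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Inputs containing CR, NEL (U+0085), LS (U+2028) and LF respectively
        String[] samples = {"A\rB", "A\u0085B", "A\u2028B", "A\nB"};
        for (String s : samples) {
            boolean byDefault = dotStarMatches(s, 0);
            boolean unixLines = dotStarMatches(s, Pattern.UNIX_LINES);
            boolean embedded = s.matches("(?d).*"); // embedded form of UNIX_LINES
            System.out.println(byDefault + " " + unixLines + " " + embedded);
        }
    }
}
```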