What is the difference between [\s\S]*? and .*? in Java regular expressions?

Question

I designed a regex to identify an XML block inside a text file. The expression is as follows (I removed the extra backslashes that Java string escaping requires, so that it is easier to read):

<\?xml\s+version="[\d\.]+"\s*\?>\s*<\s*rdf:RDF[^>]*>[\s\S]*?<\s*\/\s*rdf:RDF\s*>

Then I optimized it and replaced [\s\S]*? with .*? . It suddenly stopped recognizing the XML.

As far as I know, \s means all whitespace characters and \S means all non-whitespace characters, or [^\s] , so [\s\S] should logically be equivalent to . (dot). I have not used any greedy quantifiers, so what could be the difference?

+5

java xml regex

Dmitry Feb 07 '16 at 2:00

2 answers

Here is a cheat sheet explaining all the regex constructs.

Basically, [\s\S] matches all characters, including newlines, while . does not match line terminators by default (a flag has to be set for it to do so).

+2

Spencer4134 Feb 07 '16 at 2:05

Lonely neuron · Accepted Answer · 2016-02-07T02:10:47+0000

The regular expressions . and [\s\S] are not equivalent, since . does not match line terminators (for example, a newline) by default.

According to the Oracle documentation, . corresponds to

Any character (may or may not match line terminators)

while a line terminator is any of the following:

A newline (line feed) character ( '\n' ),
A carriage-return character followed immediately by a newline character ( "\r\n" ),
A standalone carriage-return character ( '\r' ),
A next-line character ( '\u0085' ),
A line-separator character ( '\u2028' ), or
A paragraph-separator character ( '\u2029' ).
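As a rough check of the list above, the following sketch tests each single-character line terminator against both constructs. The class name and the two helper methods are made up for illustration; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

public class DotLineTerminators {
    // Hypothetical helper: true if the default "." matches the whole one-character input.
    static boolean dotMatches(String c) {
        return Pattern.matches(".", c);
    }

    // Hypothetical helper: true if the character class [\s\S] matches the whole one-character input.
    static boolean classMatches(String c) {
        return Pattern.matches("[\\s\\S]", c);
    }

    public static void main(String[] args) {
        // The single-character line terminators from the list above.
        String[] terminators = {"\n", "\r", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            System.out.printf("U+%04X  dot=%b  class=%b%n",
                    (int) t.charAt(0), dotMatches(t), classMatches(t));
        }
    }
}
```

With no flags set, every row should report dot=false and class=true, which is exactly the gap the question ran into.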

The two expressions are not equivalent unless the necessary flags are set. Again quoting the Oracle documentation:

If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
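A minimal sketch of how those flags change the result, assuming a shortened version of the pattern from the question and a made-up multi-line input. The class name, the INPUT constant and the found helper are illustrative; Pattern.DOTALL, Pattern.UNIX_LINES and the inline (?s) flag are part of java.util.regex.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Made-up sample input: an element whose body spans several lines.
    static final String INPUT = "<rdf:RDF>\nbody\n</rdf:RDF>";

    // Hypothetical helper: compiles the regex with the given flags and searches INPUT.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        String dot = "<rdf:RDF>.*?</rdf:RDF>";
        String cls = "<rdf:RDF>[\\s\\S]*?</rdf:RDF>";

        System.out.println(found(dot, 0));              // false: "." stops at the newline
        System.out.println(found(cls, 0));              // true: [\s\S] also matches the newline
        System.out.println(found(dot, Pattern.DOTALL)); // true: DOTALL lets "." match line terminators
        System.out.println(found("(?s)" + dot, 0));     // true: (?s) is the inline form of DOTALL

        // UNIX_LINES leaves only the newline as a line terminator, so "." matches a carriage return.
        System.out.println(Pattern.compile(".", Pattern.UNIX_LINES).matcher("\r").matches()); // true
    }
}
```

So the "optimized" pattern can be kept as long as it is compiled with Pattern.DOTALL or prefixed with (?s); otherwise [\s\S]*? is the form that crosses line boundaries.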
