Splitting / tokenizing / scanning a string with quotes

Is there a default or easy way in Java to split a string while respecting quotes or other delimiter characters?

For example, given this text:

There's "a man" that live next door 'in my neighborhood', "and he gets me down..."

I'd like to get:

There's
a man
that
live
next
door
in my neighborhood
and he gets me down
2 answers

Something like this works for your input:

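    import java.util.Scanner;
    import java.util.regex.Pattern;

    String text = "There's \"a man\" that live next door "
        + "'in my neighborhood', \"and he gets me down...\"";

    Scanner sc = new Scanner(text);
    Pattern pattern = Pattern.compile(
        "\"[^\"]*\"" +    // double-quoted token
        "|'[^']*'" +      // single-quoted token
        "|[A-Za-z']+"     // everything else
    );
    String token;
    // findInLine returns null once nothing else on the current line matches
    while ((token = sc.findInLine(pattern)) != null) {
        System.out.println("[" + token + "]");
    }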

The above prints (as seen on ideone.com):

[There's]
["a man"]
[that]
[live]
[next]
[door]
['in my neighborhood']
["and he gets me down..."]
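The tokens printed above still carry their quote characters, while the question asks for them without. One possible way to drop them is to put capturing groups around the quoted content and read the groups through a `Matcher`. This is only a sketch: the class name `QuoteAwareSplitter` and the method `split` are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class QuoteAwareSplitter {
    // Same three alternatives as in the answer, with capturing groups
    // around the text inside the quotes so the delimiters can be dropped.
    private static final Pattern TOKEN = Pattern.compile(
        "\"([^\"]*)\""       // group 1: content of a double-quoted token
        + "|'([^']*)'"       // group 2: content of a single-quoted token
        + "|([A-Za-z']+)"    // group 3: a bare word (letters and apostrophes)
    );

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Exactly one of the three groups is non-null for each match.
            if (m.group(1) != null) {
                tokens.add(m.group(1));
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "There's \"a man\" that live next door "
            + "'in my neighborhood', \"and he gets me down...\"";
        for (String token : split(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Note that the last token keeps its trailing `...`, since everything between the quotes is returned unchanged; trimming punctuation would be a separate step.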

It uses Scanner.findInLine, where the regex pattern is one of:

"[^"]*"      # double quoted token
'[^']*'      # single quoted token
[A-Za-z']+   # everything else
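One thing to keep in mind: `Scanner.findInLine` only searches up to the next line separator and returns `null` when nothing matches there, so a single `while` loop stops at the end of the first line. If the input may span several lines, a sketch along these lines could work. The class name `MultiLineScan` and the method `scan` are hypothetical, and a quoted token that itself contains a line break would still not be matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class MultiLineScan {
    private static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\""      // double-quoted token
        + "|'[^']*'"      // single-quoted token
        + "|[A-Za-z']+"   // everything else
    );

    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        Scanner sc = new Scanner(text);
        while (true) {
            String token;
            // Collect every match on the current line.
            while ((token = sc.findInLine(TOKEN)) != null) {
                tokens.add(token);
            }
            if (!sc.hasNextLine()) {
                break;          // no more input
            }
            sc.nextLine();      // skip the rest of this line and its separator
        }
        sc.close();
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : scan("one 'two three'\n\"four five\" six")) {
            System.out.println("[" + token + "]");
        }
    }
}
```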

This certainly doesn't cover every case; input where quotes are nested or escaped, for example, will be difficult to handle.
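One of those harder cases becomes tractable if the input follows an escaping convention. The question doesn't specify one, so the following sketch assumes that a backslash escapes the next character inside a quoted token. The class name `EscapedQuoteSplitter` and the method `split` are invented for illustration, and truly nested quotes are still out of reach for a regular expression like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
// ASSUMPTION: inside quotes, a backslash escapes the following character.
class EscapedQuoteSplitter {
    private static final Pattern TOKEN = Pattern.compile(
        "\"((?:[^\"\\\\]|\\\\.)*)\""    // group 1: double-quoted, escapes allowed
        + "|'((?:[^'\\\\]|\\\\.)*)'"    // group 2: single-quoted, escapes allowed
        + "|([A-Za-z']+)"               // group 3: a bare word
    );

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(unescape(m.group(1)));
            } else if (m.group(2) != null) {
                tokens.add(unescape(m.group(2)));
            } else {
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    // Replace each backslash-escaped character with the character itself.
    private static String unescape(String raw) {
        return raw.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        // The text being scanned is: say "a \"big\" man" now
        for (String token : split("say \"a \\\"big\\\" man\" now")) {
            System.out.println("[" + token + "]");
        }
    }
}
```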





Source: https://habr.com/ru/post/1752694/

