Java Scanner class fails to tokenize correctly when the input length crosses the 1024-character buffer boundary

I have found some weird behavior in the java.util.Scanner class. I need to break a String variable into a set of tokens separated by the character ";".

If I take the string "a" repeated 1022 times followed by ";" repeated n times, I expect to get n tokens. However, for n = 3 the Scanner fails: it “sees” only 2 tokens instead of 3. I think this is due to the size of the internal char buffer of the Scanner class (1024 characters): with n = 3 the string is 1022 + 3 = 1025 characters long, just past that boundary.

a[x1022];      -> 1 token:  correct
a[x1022];;     -> 2 tokens: correct
a[x1022];;;    -> 2 tokens: wrong (I expect 3 tokens)
a[x1022];;;;   -> 4 tokens: correct

(Here "a[x1022]" stands for the character "a" repeated 1022 times.)

Here is a simple example:

import java.util.Scanner;

public class ScannerTokenTest {

    public static void main(String[] args) {

        // generate the test string: (1022 x "a") + (3 x ";")
        String testLine = "";
        for (int i = 0; i < 1022; i++) {
            testLine = testLine + "a";
        }
        testLine = testLine + ";;;";

        // set up the Scanner with ";" as the delimiter
        String delimiter = ";";
        Scanner lineScanner = new Scanner(testLine);
        lineScanner.useDelimiter(delimiter);
        int p = 0;

        // tokenization: print every token the Scanner returns
        while (lineScanner.hasNext()) {
            p++;
            String currentToken = lineScanner.next();
            System.out.println("token" + p + ": '" + currentToken + "'");
        }
        lineScanner.close();
    }
}
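
To check all four cases from the list above in one run, a quick variation of the same example can sweep n from 1 to 4 and print only the token count for each case:

import java.util.Scanner;

public class ScannerTokenSweep {

    public static void main(String[] args) {
        // Sweep the number of trailing ";" characters from 1 to 4.
        for (int n = 1; n <= 4; n++) {
            // Build (1022 x "a") + (n x ";").
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1022; i++) {
                sb.append("a");
            }
            for (int i = 0; i < n; i++) {
                sb.append(";");
            }

            // Count the tokens the Scanner returns for this input.
            Scanner lineScanner = new Scanner(sb.toString());
            lineScanner.useDelimiter(";");
            int count = 0;
            while (lineScanner.hasNext()) {
                count++;
                lineScanner.next();
            }
            lineScanner.close();

            System.out.println("n = " + n + " -> " + count + " tokens");
        }
    }
}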

I would like to avoid this “wrong” behavior. Could you help me? Thanks

1 answer

As suggested in the Oracle documentation, wrap the InputStream in a BufferedReader (via an InputStreamReader) instead of handing the raw stream to the Scanner; the BufferedReader then takes care of the buffering itself.
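
A rough sketch along these lines (the ByteArrayInputStream source and the split-based tokenization are illustrative assumptions, not code from the answer): the data is read through an InputStreamReader and a BufferedReader, and each line is tokenized without going through Scanner's internal buffer.

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BufferedReaderTokenSketch {

    public static void main(String[] args) throws IOException {
        // Same test data as in the question: 1022 x "a" followed by ";;;".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1022; i++) {
            sb.append("a");
        }
        sb.append(";;;");

        // Illustrative assumption: the data arrives as an InputStream.
        InputStream in = new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));

        // Wrap the stream in an InputStreamReader and a BufferedReader, as suggested.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Tokenize each line manually; the limit -1 keeps trailing empty tokens,
                // which differs slightly from Scanner (Scanner returns no token after a
                // trailing delimiter).
                String[] tokens = line.split(";", -1);
                for (int p = 0; p < tokens.length; p++) {
                    System.out.println("token" + (p + 1) + ": '" + tokens[p] + "'");
                }
            }
        }
    }
}

If the input is already a String, as in the question, testLine.split(";", -1) alone is enough; the stream wrapping only matters when the data really arrives as an InputStream.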


Source: https://habr.com/ru/post/1662656/

