Strange behavior of String.length()

I have a class with main:

import java.io.IOException;
import java.util.List;

public class Main {

    // args[0] is the path to the file with the first and last words
    // args[1] is the path to the dictionary file
    public static void main(String[] args) {
        try {
            List<String> firstLastWords = FileParser.getWords(args[0]);
            System.out.println(firstLastWords);
            System.out.println(firstLastWords.get(0).length());
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

and I have FileParser:

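import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FileParser {

    static final Charset ENCODING = StandardCharsets.UTF_8;

    public FileParser() {
    }

    public static List<String> getWords(String filePath) throws IOException {
        List<String> list = new ArrayList<String>();
        Path path = Paths.get(filePath);

        // try-with-resources closes the reader automatically
        try (BufferedReader reader = Files.newBufferedReader(path, ENCODING)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // remove all whitespace; skip lines that end up empty
                String line1 = line.replaceAll("\\s+", "");
                if (!line1.isEmpty()) {
                    list.add(line1);
                }
            }
        }
        return list;
    }
}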

args[0] is the path to a text file containing just two words. If the file contains:



the program prints:

[, ]
4

If the file contains:




the program prints:

[, , ]
2


Even if the file contains:
  // go to the next line
  
tor   
kit

the program prints:

[, , ]
1

where the number is the length of the first element in the list.

So the question is: why does it count an extra character?

+4
2 answers

Thanks to everyone.

As @Bill pointed out, this character is a byte order mark (BOM, http://en.wikipedia.org/wiki/Byte_order_mark ), which sits at the very beginning of the text file. I confirmed it by printing the code of the first character:

System.out.println(((int)firstLastWords.get(0).charAt(0)));

65279
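
To see why the length comes out one larger than expected, here is a minimal sketch. It assumes the file starts with the UTF-8 byte order mark (bytes EF BB BF); the class name BomDemo, the helper firstLine and the sample content are made up for illustration. Java's UTF-8 decoder passes the BOM through as the character U+FEFF instead of dropping it, so readLine() returns it as the first character of the first line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomDemo {

    // Decodes the bytes as UTF-8 and returns the first line,
    // the same way a BufferedReader over a file would see it.
    static String firstLine(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file content: UTF-8 BOM (EF BB BF), then "tor" and "kit".
        byte[] withBom = {
            (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
            't', 'o', 'r', '\n', 'k', 'i', 't', '\n'
        };
        String line = firstLine(withBom);
        System.out.println(line.length());        // prints 4, not 3
        System.out.println((int) line.charAt(0)); // prints 65279 (U+FEFF)
    }
}
```

The BOM is invisible when printed, which is why the list looks normal in the console while length() still counts it.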

So I replaced this line:
String line1 = line.replaceAll("\\s+","");

with:
String line1 = line.replaceAll("\uFEFF","");
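
Note that matching only U+FEFF no longer strips whitespace. If both are wanted, one possible approach (a sketch, not necessarily what the OP ended up with) is to drop a leading BOM first and then remove whitespace as before. The class BomStripper and its methods are hypothetical names used for illustration.

```java
public class BomStripper {

    private static final char BOM = '\uFEFF';

    // Removes a single leading U+FEFF if present; otherwise returns the input unchanged.
    static String stripLeadingBom(String s) {
        if (!s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    // Same whitespace cleanup as in the original getWords loop, plus BOM handling.
    static String clean(String line) {
        return stripLeadingBom(line).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("\uFEFFtor   ").length()); // prints 3
        System.out.println(clean("kit").length());          // prints 3
    }
}
```

Since the BOM can only appear at the start of the file, it would be enough to call stripLeadingBom on the first line read; applying it to every line is harmless and keeps the loop simple.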
+2

Another option for the OP is a regex based on \p{Graph}.

To remove whitespace and control characters in one pass, you can use: replaceAll("(\\s|\\p{Cntrl})+","").
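
Before relying on a character class, it is worth checking whether it actually matches U+FEFF. As far as I can tell from the java.util.regex.Pattern documentation, \s and \p{Cntrl} cover only ASCII whitespace and ASCII control characters by default, while the BOM belongs to the Unicode category Cf (format). The sketch below checks this; the class BomRegexCheck and the helper matchesBom are hypothetical names.

```java
public class BomRegexCheck {

    // Returns true if the regex matches the single character U+FEFF.
    static boolean matchesBom(String regex) {
        return "\uFEFF".matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesBom("\\s"));        // prints false
        System.out.println(matchesBom("\\p{Cntrl}")); // prints false
        System.out.println(matchesBom("\\p{Graph}")); // prints false
        System.out.println(matchesBom("\\p{Cf}"));    // prints true

        // Whitespace and format characters removed in one pass.
        String cleaned = "\uFEFFtor \t".replaceAll("[\\s\\p{Cf}]+", "");
        System.out.println(cleaned.length());         // prints 3
    }
}
```

Keep in mind that \p{Cf} also matches other format characters, such as the soft hyphen or zero-width joiner, so it removes more than just the BOM.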

+1

Source: https://habr.com/ru/post/1585006/

