Java Scanner.nextLine() consumes newline

I have a Scanner set up that reads from an InputStream.

I use Scanner.nextLine() to step through the input line by line, and then run some regular expressions against each line.

I have a regular expression that basically looks like [\w\p{Z}]+?[;\n\r] to pick up everything to the end of the line, or just ONE item if several are separated by semicolons.

So if my InputStream looks like

 abcd;
 xyz

it matches abcd; but not xyz.

I think this is because the Scanner consumes the newline at the end of each line of text when nextLine() is called, so my regex never sees it. Can someone tell me how to solve this problem?

As additional information about my regex: I am compiling the pattern with Pattern.DOTALL.

Thanks!

+4
5 answers

Actually, it is your regex that is causing the problem, by trying to consume a newline at the end of the last line. It is perfectly valid for the last line to end abruptly without a newline, but your regular expression requires it to have one. You may be able to fix this by replacing the newline with an anchor or a lookahead, but there are much simpler ways to do this.

One of them is to override the default delimiter and iterate over the fields with next():

 Scanner sc1 = new Scanner("abcd;\nxyz");
 sc1.useDelimiter("[;\r\n]+");
 while (sc1.hasNext()) {
     System.out.printf("%s%n", sc1.next());
 }

Another is to iterate through the lines using nextLine() (with the default delimiter), and then split each line on semicolons:

 Scanner sc2 = new Scanner("abcd;\nxyz");
 while (sc2.hasNextLine()) {
     for (String item : sc2.nextLine().split(";")) {
         System.out.printf("%s%n", item);
     }
 }

The Scanner API is one of the most bloated and unintuitive I've ever worked with, but you can significantly reduce the pain of using it if you remember these two important points:

  • Think in terms of matching the delimiters rather than the fields (as with String's split() method).
  • Never call one of the nextXXX() methods without first calling the corresponding hasNextXXX() method.
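
The second point can be sketched as follows. This is only an illustration: the class name GuardedScan, the method readInts, and the sample input are hypothetical, and integer tokens are used just because they make the hasNextXXX()/nextXXX() pairing easy to see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class GuardedScan {
    // Collects the integer tokens from the input and skips everything else.
    static List<Integer> readInts(String input) {
        List<Integer> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {            // guard for next()
                if (sc.hasNextInt()) {        // guard for nextInt()
                    result.add(sc.nextInt());
                } else {
                    sc.next();                // discard the non-integer token
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readInts("1 two 3"));
    }
}
```

Without the hasNextInt() check, nextInt() would throw an InputMismatchException on the token "two".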
+5

So why not append a newline to your nextLine() result?

Also, aren't there special regex characters, ^ and $, that denote line boundaries?
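
A minimal sketch of the anchor idea, assuming the question's pattern with its trailing [;\n\r] swapped for (?:;|$). The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredFields {
    // A field ends at a semicolon or at the end of the line ($),
    // so a last line without a trailing newline still matches.
    private static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?(?:;|$)");

    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                // nextLine() returns the line without its line separator
                Matcher m = FIELD.matcher(sc.nextLine());
                while (m.find()) {
                    result.add(m.group());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("abcd;\nxyz"));
    }
}
```

Note that the matched text still includes the semicolon when one is present.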

+2

The regex $ anchor matches at the end of the input (or at the end of each line when Pattern.MULTILINE is set). Having said that, since the line you get back has no end-of-line character, it is easier to consume everything up to the first semicolon; just match everything except a semicolon:

 [^;]+ 

Scanner consumes the newline as part of its normal behavior, because you usually do not want to deal with it and its exact form depends on the system.

Edit: someone pointed out in a comment that you can just use line.split(";") and capture the first value. That will work too.
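
Both variants can be sketched like this, assuming the line has already been obtained from nextLine(). The class name, method names, and sample line are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstField {
    private static final Pattern NOT_SEMICOLON = Pattern.compile("[^;]+");

    // Returns the first run of non-semicolon characters, or "" if there is none.
    static String viaRegex(String line) {
        Matcher m = NOT_SEMICOLON.matcher(line);
        return m.find() ? m.group() : "";
    }

    // Returns everything before the first semicolon, or "" if nothing precedes it.
    static String viaSplit(String line) {
        String[] parts = line.split(";");
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("abcd; xyz"));
        System.out.println(viaSplit("abcd; xyz"));
    }
}
```

The two differ when the line starts with a semicolon: the regex skips ahead to the first non-empty field, while split() reports the empty leading field.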

+1

The API documentation clearly states that nextLine() excludes any line separator from the string it returns.

You can follow any of the suggestions in the other answers. But also note that Scanner has methods that take a pattern, so if your regular expression is correct, you can use the following methods:

hasNext(Pattern pattern) or hasNext(String pattern) to find out whether the next token matches

and then

next(Pattern pattern) or next(String pattern) to get the token if the above is true.
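
A rough sketch of the pattern-taking methods. It assumes the delimiter from the first answer and the character class from the question without its terminator; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PatternTokens {
    // Character class from the question, without the trailing terminator.
    private static final Pattern TOKEN = Pattern.compile("[\\w\\p{Z}]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // Tokens are separated by runs of semicolons and line breaks.
            sc.useDelimiter("[;\\r\\n]+");
            // hasNext(Pattern) is true only if the whole next token matches.
            while (sc.hasNext(TOKEN)) {
                result.add(sc.next(TOKEN));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abcd;\nxyz"));
    }
}
```

The loop stops at the first token that does not match the pattern, so any remaining input is left unread.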

+1

You can use \z in your regex pattern to indicate the end of the input, or $ for the end of a line (with Pattern.MULTILINE). Also, Scanner.nextLine() returns the line without its newline by default. You can also change the delimiters used by your Scanner to include ; using the useDelimiter method. Finally, your pattern may not do what you think it does: according to the documentation for Pattern, \p{Z} matches characters in the Unicode separator category (space, line, and paragraph separators), not the letter "Z", and it does not cover \n, \r, or tab.
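
A quick way to check what \p{Z} matches is to test single characters against it. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class SeparatorCheck {
    private static final Pattern SEPARATOR = Pattern.compile("\\p{Z}");

    // True if the string is a single character in Unicode category Z.
    static boolean isSeparator(String s) {
        return SEPARATOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("space:  " + isSeparator(" "));
        System.out.println("U+2028: " + isSeparator("\u2028"));
        System.out.println("Z:      " + isSeparator("Z"));
        System.out.println("\\n:     " + isSeparator("\n"));
        System.out.println("tab:    " + isSeparator("\t"));
    }
}
```

If ordinary whitespace is what you want to match, \s is the more common choice.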

0

Source: https://habr.com/ru/post/1395788/

