Is java.util.Scanner slow?

In an Android application, I want to use the Scanner class to read a list of floats from a text file (this is a list of vertex coordinates for OpenGL). Exact code:

    Scanner in = new Scanner(new BufferedInputStream(getAssets().open("vertexes.off")));
    final float[] vertexes = new float[nrVertexes];
    for (int i = 0; i < nrVertexFloats; i++) {
        vertexes[i] = in.nextFloat();
    }

It seems to be incredibly slow (it took 30 minutes to read 10,000 floats!), as tested on the 2.1 emulator. What's happening? I don't remember Scanner being this slow when I used it on a PC (though I've never read more than 100 values before). Or is it something else, such as reading from an asset input stream?

Thanks for the help!

+16
java android
Mar 15
7 answers

I don't know about Android, but at least on Java SE, Scanner is slow.

Internally, Scanner performs UTF-8 conversion, which is pointless for a file of floats.

Since all you want to do is read floats from a file, you should use the java.io package directly.

The folks at SPOJ struggle with I/O speed. It is a Polish programming contest site with very hard problems. What sets it apart is that it accepts a wider range of programming languages than other sites, and in many of its problems the input is so large that if you do not write efficient I/O, your program will exceed the time limit.

Check their forums, for example here, for an idea of a custom parser.

Of course, I advise against writing your own float parser, but if you need speed, it is still a solution.

+8
Mar 15 '10 at 12:03

As other posters have said, it is more efficient to store the data in a binary format. However, as a quick fix, I found that replacing:

 scanner.nextFloat(); 

with

 Float.parseFloat(scanner.next()); 

is almost 7 times faster.

To add to this answer: the source of the performance problem with nextFloat() is that it uses a regular expression to find the next float, which is unnecessary if you know the structure of your data in advance.

Most (if not all) of the next* methods use regular expressions for the same reason, so if you know the structure of your data, it is preferable to always use next() and parse the result yourself. That is, also use Double.parseDouble(scanner.next()) and Integer.parseInt(scanner.next()).

Relevant source: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/java/util/Scanner.java
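As a minimal sketch of the replacement described above (the class and method names here are illustrative, not from the original post), the loop takes raw tokens with next() and hands them to Float.parseFloat:

```java
import java.util.Scanner;

// Hypothetical helper for illustration only.
final class TokenParseDemo {
    // Reads `count` floats by taking whitespace-delimited tokens with next()
    // and parsing each one with Float.parseFloat instead of calling nextFloat().
    static float[] readFloats(Scanner in, int count) {
        float[] values = new float[count];
        for (int i = 0; i < count; i++) {
            // next() throws NoSuchElementException if the input runs out;
            // parseFloat throws NumberFormatException on a malformed token.
            values[i] = Float.parseFloat(in.next());
        }
        return values;
    }

    public static void main(String[] args) {
        float[] v = readFloats(new Scanner("0.5 -1.25 3e2"), 3);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);
    }
}
```

One difference to keep in mind: nextFloat() honors the scanner's locale, while Float.parseFloat always expects a '.' decimal separator, so this only fits files written in that plain format.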

+21
Dec 12 2018-10-12

For the Spotify Challenge, they wrote a small Java utility for faster I/O parsing: http://spc10.contest.scrool.se/doc/javaio. The utility is called Kattio.java and uses BufferedReader, StringTokenizer and Integer.parseInt / Double.parseDouble / Long.parseLong for reading numbers.
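The following is a rough sketch of the same idea, combining BufferedReader, StringTokenizer and the parseXxx methods. It is not the actual Kattio source; the class name TokenReader and its methods are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

// Hypothetical helper in the spirit of the utility described above.
final class TokenReader {
    private final BufferedReader reader;
    private StringTokenizer tokens; // tokens of the current line; null before the first read

    TokenReader(InputStream in) {
        reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() throws IOException {
        while (tokens == null || !tokens.hasMoreTokens()) {
            String line = reader.readLine();
            if (line == null) {
                return null;
            }
            tokens = new StringTokenizer(line);
        }
        return tokens.nextToken();
    }

    // Like next(), but fails with EOFException instead of returning null.
    private String requireToken() throws IOException {
        String token = next();
        if (token == null) {
            throw new EOFException("no more tokens");
        }
        return token;
    }

    float nextFloat() throws IOException {
        return Float.parseFloat(requireToken());
    }

    int nextInt() throws IOException {
        return Integer.parseInt(requireToken());
    }
}
```

With the question's file, this could be constructed as new TokenReader(getAssets().open("vertexes.off")) and read in the same loop, calling nextFloat() on the reader instead of on a Scanner.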

+2
Oct. 15 2018-11-22T00:

Very insightful post. When I worked with Java on a PC, I usually thought Scanner was the fastest option. But when I tried to use it in an AsyncTask on Android, it was the WORST.

I think Android should come up with an alternative to Scanner. I was using scanner.nextFloat(); and scanner.nextDouble(); and scanner.nextInt(); all together, and that made my life miserable. After I traced my application, it turned out that this was the hidden culprit.

I changed to Float.parseFloat(scanner.next()); and similarly Double.parseDouble(scanner.next()); and Integer.parseInt(scanner.next()); which certainly made my application much faster, I would say about 60% faster.

If anyone has experienced the same, please post here. I am also looking for an alternative to the Scanner API, so if anyone has bright ideas for reading file formats, please post them here.

+1
Jun 19 '14 at 11:00 a.m.

Yes, I am not seeing anything like this. I can read about 10M floats this way in 4 seconds on the desktop; it just can't be that different.

I'm trying to think of other explanations: is it possibly blocking while reading the input stream from getAssets()? I might try reading that resource fully, timing that, and then seeing how much time the scanning itself takes.
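One way to run that experiment, sketched under the assumption that the file fits in memory (the class and method names are hypothetical), is to drain the stream into a byte array first and then scan the in-memory copy, timing the two phases separately:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper for separating stream I/O time from parsing time.
final class ReadVsScanTiming {
    // Drains the stream into memory so that I/O cost can be measured on its own.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Scans floats from the in-memory copy, so no stream I/O happens here.
    static int countFloats(byte[] data) {
        Scanner scanner = new Scanner(new ByteArrayInputStream(data), "US-ASCII");
        scanner.useLocale(Locale.US); // assumption: the file uses '.' as the decimal separator
        int count = 0;
        while (scanner.hasNextFloat()) {
            scanner.nextFloat();
            count++;
        }
        return count;
    }

    // Prints how long each phase took, in milliseconds.
    static void time(InputStream source) throws IOException {
        long t0 = System.nanoTime();
        byte[] data = readFully(source);
        long t1 = System.nanoTime();
        int count = countFloats(data);
        long t2 = System.nanoTime();
        System.out.println("read " + data.length + " bytes in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("scanned " + count + " floats in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

In the question's setting the source would be getAssets().open("vertexes.off"). If the first number is small and the second is large, the asset stream is not the bottleneck.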

0
Mar 15 '10 at 12:59

Scanner may be part of the problem, but you need to profile your code to know. Alternatives may be faster. Here is a simple test comparing Scanner and StreamTokenizer.
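The linked test is not reproduced here. As a hedged illustration of the StreamTokenizer alternative only (the class and method names are made up), reading floats with it might look like this:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

// Hypothetical helper for illustration only.
final class StreamTokenizerFloats {
    // Reads up to `count` numbers using StreamTokenizer's built-in number parsing.
    // If the input holds fewer numbers, the remaining slots stay 0.
    static float[] readFloats(Reader reader, int count) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        float[] values = new float[count];
        int i = 0;
        while (i < count && st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_NUMBER) {
                values[i++] = (float) st.nval; // nval is a double
            }
            // Non-numeric tokens are skipped.
        }
        return values;
    }
}
```

A caveat: StreamTokenizer's default number parsing does not understand exponent notation such as 1e-5, so this only suits files that contain plain decimal numbers.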

0
Mar 15 '10 at 14:53

I have exactly the same problem. It took 10 minutes to read my 18KB file. In the end, I wrote a desktop application that converts these human-readable numbers into a machine-readable format using DataOutputStream.

The result was astounding.
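The converter itself is not shown in this answer. A minimal sketch of the approach, assuming a file layout of an int count followed by the raw floats (that layout and the class name are assumptions, not the poster's format), could be:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; the count-prefixed layout is an assumption.
final class BinaryFloats {
    // Desktop side: writes the count, then each float as 4 big-endian bytes.
    static void write(OutputStream out, float[] values) throws IOException {
        DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
        data.writeInt(values.length);
        for (float v : values) {
            data.writeFloat(v);
        }
        data.flush(); // push buffered bytes through; the caller closes the stream
    }

    // Device side: reads the count, then the floats, with no text parsing at all.
    static float[] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        float[] values = new float[data.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = data.readFloat();
        }
        return values;
    }
}
```

The desktop tool would parse the text file once and call write on a FileOutputStream; the Android side would then call read on the asset stream.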

By the way, when I traced it, most of the Scanner method calls involved regular expressions, whose implementation is provided by the com.ibm.icu.** packages (IBM ICU project). This is really overkill.

The same goes for String.format. Avoid it on Android!

0
Apr 15 '10 at