Why is parsing a String into a Date slow in Java? Can we speed it up?

I am reading a text file containing dates, and I want to parse the strings representing dates into Date objects in Java. I noticed that the operation is slow. Why? Is there any way to speed it up? My file looks like this:

    2012-05-02 12:08:06:950, secondColumn, thirdColumn
    2012-05-02 12:08:07:530, secondColumn, thirdColumn
    2012-05-02 12:08:08:610, secondColumn, thirdColumn

I read the file line by line, extract the date string from each line, and parse it into a Date object using SimpleDateFormat as follows:

    DataInputStream in = new DataInputStream(myFileInputStream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    while ((strLine = br.readLine()) != null) {
        ....Do things....
        Date myDateTime = (Date)formatter.parse(myDateString);
        ...Do things....
    }
3 answers

Converting dates and time zones is expensive. If you can assume that consecutive dates/times in your file are close together, you can parse the date and hours/minutes (or only the date if you use GMT) just when the minute changes, and compute the seconds and milliseconds yourself.

This calls parse at most once per minute of data. Depending on your assumptions, you could do this once an hour or once a day.

    String pattern = "yyyy-MM-dd HH:mm";
    SimpleDateFormat formatter = new SimpleDateFormat(pattern);
    String lastTime = null; // null forces a parse on the first line
    long lastDate = 0;
    while ((strLine = br.readLine()) != null) {
        String myDateString = strLine.split(", ")[0];
        // Re-parse only when the "yyyy-MM-dd HH:mm" prefix changes.
        if (lastTime == null || !myDateString.startsWith(lastTime)) {
            lastTime = myDateString.substring(0, pattern.length());
            lastDate = formatter.parse(lastTime).getTime();
        }
        // "06:950" becomes "06950", i.e. 6950 ms past the minute.
        Date date = new Date(lastDate + Integer.parseInt(myDateString.substring(pattern.length() + 1).replace(":", "")));
    }

I would suggest writing your own parser, which will be faster. Something like:

    Date parseYYYYMMDDHHMM(String strDate) {
        String yearString = strDate.substring(0, 4);
        int year = Integer.parseInt(yearString);
        ...
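To illustrate the idea of a hand-written parser, here is a minimal sketch that reads the fields by fixed position. It assumes every input has exactly the layout from the question (yyyy-MM-dd HH:mm:ss:SSS, 23 characters). The class name FixedLayoutParser and the helper digits are made up for this example, and it returns a LocalDateTime rather than a Date so that no time zone has to be guessed.

```java
import java.time.LocalDateTime;

final class FixedLayoutParser {

    // Reads `len` ASCII digits starting at index `from` and returns their int value.
    private static int digits(String s, int from, int len) {
        int value = 0;
        for (int i = from; i < from + len; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit at index " + i + ": " + s);
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // Parses "yyyy-MM-dd HH:mm:ss:SSS" purely by character position.
    // Assumption: separators are not checked, only the overall length.
    static LocalDateTime parse(String s) {
        if (s.length() != 23) {
            throw new IllegalArgumentException("Expected 23 characters: " + s);
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 2);
        int day = digits(s, 8, 2);
        int hour = digits(s, 11, 2);
        int minute = digits(s, 14, 2);
        int second = digits(s, 17, 2);
        int millis = digits(s, 20, 3);
        // LocalDateTime.of validates ranges (for example month 1-12).
        return LocalDateTime.of(year, month, day, hour, minute, second, millis * 1_000_000);
    }
}
```

If a java.util.Date is really needed, the result can be converted with Date.from(ldt.atZone(zone).toInstant()) once you have decided which zone the file's timestamps are in.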

Another way is to use a precomputed hash map from the date-time string (without milliseconds) to a Unix timestamp. This works if there are not many distinct date-times (or you can rebuild it whenever the date rolls over).
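A rough sketch of that lookup-table idea, filled lazily rather than precomputed: the first 19 characters (yyyy-MM-dd HH:mm:ss) are the key, and the milliseconds are added by hand. The class name CachedSecondParser is hypothetical, the input layout from the question is assumed, and the map is never cleared here, so a real version would need some eviction rule.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

final class CachedSecondParser {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";
    private static final int PREFIX_LENGTH = PATTERN.length(); // 19

    // SimpleDateFormat is not thread-safe, so keep one instance per parser.
    private final SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
    // Maps "yyyy-MM-dd HH:mm:ss" to epoch milliseconds of that whole second.
    private final Map<String, Long> epochMillisByPrefix = new HashMap<>();

    Date parse(String s) {
        String prefix = s.substring(0, PREFIX_LENGTH);
        Long base = epochMillisByPrefix.get(prefix);
        if (base == null) {
            try {
                base = formatter.parse(prefix).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("Unparseable date: " + s, e);
            }
            epochMillisByPrefix.put(prefix, base);
        }
        // Skip the ':' after the seconds and read the three millisecond digits.
        int millis = Integer.parseInt(s.substring(PREFIX_LENGTH + 1));
        return new Date(base + millis);
    }
}
```

The cache only pays off when many lines share the same second; if nearly every line has a distinct second, the map lookup adds work instead of saving it.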


TL;DR

  • Use java.time, not the legacy classes.
  • Each parse from String to LocalDateTime with a DateTimeFormatter takes less than 1,500 nanoseconds (0.0000015 seconds).

java.time

You are using the troublesome old date-time classes that are now legacy, supplanted by the java.time classes.

Let's do a bit of micro-benchmarking to see just how slow or fast parsing a date string is in java.time.

ISO 8601

The ISO 8601 standard defines sensible, practical formats for the textual representation of date-time values. The java.time classes use these standard formats by default when parsing and generating strings.

Use these standard formats rather than inventing your own, as seen in the Question.
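As a small illustration of the default behavior, the sketch below parses an ISO 8601 string with no formatter at all. The input is the Question's first value rewritten by hand into ISO 8601 form (a T between date and time, a period before the fraction); the class name IsoDefaultDemo is just for this example.

```java
import java.time.LocalDateTime;

final class IsoDefaultDemo {

    // No DateTimeFormatter needed: LocalDateTime.parse expects ISO 8601 by default.
    static LocalDateTime parseIso(String input) {
        return LocalDateTime.parse(input);
    }

    public static void main(String[] args) {
        LocalDateTime ldt = parseIso("2012-05-02T12:08:06.950");
        System.out.println(ldt); // prints 2012-05-02T12:08:06.950
    }
}
```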

DateTimeFormatter

Define a formatting pattern to match your inputs.

 DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu-MM-dd HH:mm:ss:SSS" ); 

We will parse each input as a LocalDateTime, because your input lacks any indicator of time zone or offset-from-UTC. Keep in mind that such values do not represent a moment; they are not a point on the timeline. An actual moment requires the context of a zone or offset.

    String inputInitial = "2012-05-02 12:08:06:950" ;
    LocalDateTime ldtInitial = LocalDateTime.parse( inputInitial , f );
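If you do know which zone the file's timestamps were recorded in, a LocalDateTime can be turned into an actual moment. The following is only a sketch: the zone has to come from your own knowledge of the data (Europe/Paris below is an arbitrary placeholder), and the class name ZonedMomentDemo is made up for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class ZonedMomentDemo {
    private static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss:SSS");

    // Parses the zone-less text, then applies the caller-supplied zone
    // to obtain a point on the timeline.
    static Instant toInstant(String input, ZoneId zone) {
        LocalDateTime ldt = LocalDateTime.parse(input, F);
        return ldt.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        // Assumption for illustration only: the log was written in Paris time.
        Instant instant = toInstant("2012-05-02 12:08:06:950", ZoneId.of("Europe/Paris"));
        System.out.println(instant);
    }
}
```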

Let's make a bunch of such inputs.

    int count = 1_000_000;
    List < String > inputs = new ArrayList <>( count );
    for ( int i = 0 ; i < count ; i++ ) {
        String s = ldtInitial.plusSeconds( i ).format( f );
        inputs.add( s );
    }

Test harness.

    long start = System.nanoTime();
    for ( String input : inputs ) {
        LocalDateTime ldt = LocalDateTime.parse( input , f );
    }
    long stop = System.nanoTime();
    long elapsed = ( stop - start );
    long nanosPerParse = (elapsed / count ) ;
    Duration d = Duration.ofNanos( elapsed );

Dump to console.

 System.out.println( "Parsing " + count + " strings to LocalDateTime took: " + d + ". About " + nanosPerParse + " nanos each."); 

    Parsing 1000000 strings to LocalDateTime took: PT1.320778647S. About 1320 nanos each.

Too slow?

So on a MacBook Pro laptop with a quad-core Intel i7 CPU, it takes about a second and a half to parse a million such inputs. In my test runs, each parse takes about 1,000 to 1,500 nanoseconds.

In my opinion, this is not a performance issue.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, and SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. Search qaru for many examples and explanations. The specification is JSR 310.

You can exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to get java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.


Source: https://habr.com/ru/post/922055/

