Java - huge data search

I have a requirement where, for one of the reports, I need to fetch about 10 million records from a database and export them to Excel.

The application follows a client-server model: the server-side logic is written in EJB and the client in Swing.

My problem is that when I populate Java objects from the ResultSet and the result set is large (more than 100,000 rows), Java throws an out-of-memory error.

Can someone tell me how this scenario should be handled in Java? I need to transfer all the records from the server to the client and then generate an Excel report from the data received.

+4
5 answers

I would break the result set into smaller pieces using the LIMIT clause (MySQL; I don't know whether other database servers support it). Something like this pseudocode:

    long recsToGet = 50000;
    long got = recsToGet;
    long offset = 0;
    while (got == recsToGet) {
        got = getNextBatchFromDb(offset); // returns the number of rows read
        writeBatchToCsv();
        offset += recsToGet;              // advance OFFSET by one batch each time
    }

And I would use LIMIT and OFFSET in the SQL query inside the getNextBatchFromDb() function, as follows:

 select * from yourtable LIMIT 50000 OFFSET 100000 

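where OFFSET is the position at which to start reading and LIMIT is the number of rows to read. Add an ORDER BY on a unique column as well; without it, the database does not guarantee row order, and batches may overlap or skip rows.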

This way, you can read the large dataset in small chunks and append each chunk to the CSV file as soon as it is read. You know that all records have been read when getNextBatchFromDb() returns fewer rows than recsToGet.
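The loop above can be sketched in Java roughly as follows. This is only an illustration: PageSource and BatchExport are hypothetical names, and the in-memory list stands in for a real query of the form "... ORDER BY id LIMIT ? OFFSET ?" so that the sketch runs without a database.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

class BatchExport {

    /** Hypothetical page source; a real one would run a LIMIT/OFFSET query. */
    public interface PageSource {
        List<String> fetch(long offset, int limit);
    }

    /** Copies all rows to out, one batch at a time; returns the number of rows written. */
    public static long export(PageSource source, int batchSize, Writer out) throws IOException {
        long offset = 0;
        int got;
        do {
            List<String> batch = source.fetch(offset, batchSize);
            for (String line : batch) {
                out.write(line);
                out.write('\n');
            }
            got = batch.size();
            offset += got;             // the next batch starts where this one ended
        } while (got == batchSize);    // a short (or empty) batch means the data is exhausted
        return offset;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the table (assumption, for demonstration only).
        List<String> table = List.of("1,alice", "2,bob", "3,carol", "4,dave", "5,erin");
        PageSource source = (offset, limit) ->
                table.subList((int) Math.min(offset, table.size()),
                              (int) Math.min(offset + limit, table.size()));
        StringWriter csv = new StringWriter();
        long rows = export(source, 2, csv);
        System.out.println(rows + " rows written");
    }
}
```

Only one batch is held in memory at any time, so the heap requirement depends on the batch size rather than on the total number of records.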

+1

You can increase the memory available to the JVM with the -Xmx switch (for example, -Xmx1024m allows the heap to grow to at most 1 GB).

If this is not an option, or you have already done it, the only alternative is to rewrite the server so that it returns the results in stages rather than all at once. How you do this depends on the particular server implementation.

0

Do as much of the work on the data as possible on the database side. Then write the data out as you read it from the database, or at least pass it through a bounded buffer, so that the Java program never holds all of it at once.
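A minimal sketch of writing rows as they are read, using plain JDBC. The class name ResultSetCsvStreamer, the fetch size of 1000, and the CSV output format are assumptions for illustration; the connection and query come from the caller, and whether the driver really streams rows depends on the driver.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ResultSetCsvStreamer {

    /** Runs the query and streams its rows to out; conn and sql are supplied by the caller. */
    static long export(Connection conn, String sql, Writer out) throws SQLException, IOException {
        try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(1000); // only a hint; drivers differ in how they honor it
            try (ResultSet rs = stmt.executeQuery(sql)) {
                return streamToCsv(rs, out);
            }
        }
    }

    /** Writes each row as soon as it is read; this code never collects rows in a list. */
    static long streamToCsv(ResultSet rs, Writer out) throws SQLException, IOException {
        int columns = rs.getMetaData().getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                if (i > 1) {
                    out.write(',');
                }
                out.write(escape(rs.getString(i)));
            }
            out.write('\n');
            rows++;
        }
        return rows;
    }

    /** Minimal CSV quoting: quote the field if it contains a comma, quote, or line break. */
    static String escape(String value) {
        if (value == null) {
            return ""; // SQL NULL becomes an empty field
        }
        if (value.indexOf(',') < 0 && value.indexOf('"') < 0
                && value.indexOf('\n') < 0 && value.indexOf('\r') < 0) {
            return value;
        }
        return '"' + value.replace("\"", "\"\"") + '"';
    }
}
```

Passing a BufferedWriter that wraps a file as out keeps the memory footprint small, because no row outlives the loop iteration that wrote it.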

0

Instead of objects, you can use primitive types. Note: unless your client has more memory than your server, it makes no sense to send all this data to the client.

Typically, the server generates reports for the client. This maximizes the work done on the server and reduces the data sent to the client. Excel cannot handle more than about one million rows per sheet (1,048,576 in the .xlsx format), and its charts cannot handle more than 32,000 points. I suggest you generate the report on the server.
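To put the row limit in numbers, here is a small sketch that computes how many sheets the 10 million records from the question would need. It assumes the .xlsx limit of 1,048,576 rows per sheet and one header row on each sheet; SheetPlanner and sheetsNeeded are hypothetical names.

```java
class SheetPlanner {

    /** Row limit of one worksheet in the .xlsx format (2^20). */
    static final int MAX_ROWS_PER_SHEET = 1_048_576;

    /** Sheets needed for dataRows records when each sheet starts with one header row. */
    static long sheetsNeeded(long dataRows) {
        long perSheet = MAX_ROWS_PER_SHEET - 1;        // one row is reserved for the header
        return (dataRows + perSheet - 1) / perSheet;   // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(sheetsNeeded(10_000_000L)); // prints 10
    }
}
```

So the full export cannot fit on a single sheet; it has to be split across about ten sheets or aggregated on the server first.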

0

Holding every record as an object is not a good choice for this scenario. Here are some ways to handle it.

1) Apply pagination when retrieving records from the database and add to the report.

2) This option depends on the database server. Some database servers can export the output of any query to a flat file. Check whether your database supports this. After exporting, you can read the contents of the flat file and generate the report.

Several others have already mentioned the Excel limits, so take those into account as well.

0

Source: https://habr.com/ru/post/1343001/

