Use superCSV to read a large 80 GB text file

I want to read a huge CSV file. We generally use Super CSV to parse our files, but in this particular scenario the file is huge, so we constantly run into memory problems.

The initial idea is to read the file in chunks, but I'm not sure whether that will work with Super CSV, because when I split the file only the first chunk has the header row and will load into the CSV bean, while the other chunks have no header row, and I suspect this might throw an exception. So:

a) Is my reasoning correct? b) Are there other ways to solve this problem?

So my main question is:

Can Super CSV process large CSV files? I see that Super CSV reads the document through a BufferedReader, but I don't know what the buffer size is, and can we change it to suit our requirements?
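For context, here is a minimal sketch of reading row by row with Super CSV. The library reads from whatever Reader you hand it, so the buffer size is whatever you pass to the BufferedReader yourself; the file path and the 64 KB buffer below are arbitrary placeholders, not values taken from the question.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class StreamLargeCsv {

    public static void main(String[] args) throws Exception {
        // Super CSV reads from whatever Reader you pass in, so the buffer size
        // is under your control (64 KB here is an arbitrary choice)
        try (ICsvListReader listReader = new CsvListReader(
                new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt"), 64 * 1024),
                CsvPreference.STANDARD_PREFERENCE)) {

            listReader.getHeader(true); // consume the header row

            List<String> row;
            // read() returns one row at a time and null at end of file,
            // so only the current row is held in memory
            while ((row = listReader.read()) != null) {
                // process the row here (write to a database, aggregate, etc.)
            }
        }
    }
}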

@Gilbert Le Blanc: I tried splitting the file into smaller chunks as you suggested, but splitting such a huge file takes a very long time. Here is the code I wrote for it.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;

public class TestFileSplit {

    public static void main(String[] args) {
        LineNumberReader lnr = null;
        try {
            //RandomAccessFile input = new RandomAccessFile("", "r");
            File file = new File("C:\\Blah\\largetextfile.txt");
            lnr = new LineNumberReader(new FileReader(file), 1024);
            String line = "";
            String header = null;
            int noOfLines = 100000;
            int i = 1;
            boolean chunkedFiles = new File("C:\\Blah\\chunks").mkdir();
            if (chunkedFiles) {
                while ((line = lnr.readLine()) != null) {
                    if (lnr.getLineNumber() == 1) {
                        header = line;
                        continue;
                    } else {
                        // a new chunk file is created for every 100000 records
                        if ((lnr.getLineNumber() % noOfLines) == 0) {
                            i = i + 1;
                        }
                        File chunkedFile = new File("C:\\Blah\\chunks\\"
                                + file.getName().substring(0, file.getName().indexOf("."))
                                + "_" + i + ".txt");
                        // if the chunk file does not exist, create it and add the header as the first row
                        if (!chunkedFile.exists()) {
                            chunkedFile.createNewFile(); // was file.createNewFile(), which pointed at the source file
                            FileWriter fw = new FileWriter(chunkedFile.getAbsoluteFile(), true);
                            BufferedWriter bw = new BufferedWriter(fw);
                            bw.write(header);
                            bw.newLine();
                            bw.close();
                            fw.close();
                        }
                        FileWriter fw = new FileWriter(chunkedFile.getAbsoluteFile(), true);
                        BufferedWriter bw = new BufferedWriter(fw);
                        bw.write(line);
                        bw.newLine();
                        bw.close();
                        fw.close();
                    }
                }
            }
            lnr.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
        }
    }
}
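One likely reason the split is slow is that the code above reopens and closes the chunk file for every single line. A sketch that keeps a single writer open per chunk (paths and chunk size borrowed from the code above; the output file names are illustrative) might look like this:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FasterFileSplit {

    public static void main(String[] args) throws IOException {
        final int linesPerChunk = 100000;
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt"))) {
            final String header = reader.readLine(); // first line is the header
            new File("C:\\Blah\\chunks").mkdirs();

            String line;
            long lineCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;
            while ((line = reader.readLine()) != null) {
                // open a new chunk (and write the header) every linesPerChunk lines
                if (lineCount % linesPerChunk == 0) {
                    if (writer != null) {
                        writer.close();
                    }
                    chunkIndex++;
                    writer = new BufferedWriter(new FileWriter(
                            "C:\\Blah\\chunks\\largetextfile_" + chunkIndex + ".txt"));
                    writer.write(header);
                    writer.newLine();
                }
                writer.write(line);
                writer.newLine();
                lineCount++;
            }
            if (writer != null) {
                writer.close();
            }
        }
    }
}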
2 answers

You can define the header in the Java parser class itself. That way you don't need the header row in the CSV files.

// only map the first 3 columns - setting header elements to null means those columns are ignored
final String[] header = new String[] { "customerNo", "firstName", "lastName",
        null, null, null, null, null, null, null };
beanReader.read(CustomerBean.class, header)
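A minimal sketch of how that mapping plugs into a read loop over a chunk file that has no header row: the chunk path is illustrative, and CustomerBean is assumed to be a bean with getters/setters named after the mapped columns.

import java.io.FileReader;

import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadChunkWithoutHeader {

    public static void main(String[] args) throws Exception {
        // the mapping is defined in code, so the chunk file needs no header row
        final String[] header = new String[] { "customerNo", "firstName", "lastName",
                null, null, null, null, null, null, null };

        try (ICsvBeanReader beanReader = new CsvBeanReader(
                new FileReader("C:\\Blah\\chunks\\largetextfile_2.txt"),
                CsvPreference.STANDARD_PREFERENCE)) {

            CustomerBean customer;
            // no getHeader() call: every line in the chunk is treated as data
            while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
                // process one bean at a time
            }
        }
    }
}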

or

You can also use the Super CSV Dozer extension.
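A rough sketch of the Dozer variant is below. It assumes the super-csv-dozer module (and Dozer) is on the classpath, that CustomerBean is the same illustrative bean as above, and that the file's first three columns correspond to the mapped properties.

import java.io.BufferedReader;
import java.io.FileReader;

import org.supercsv.io.dozer.CsvDozerBeanReader;
import org.supercsv.io.dozer.ICsvDozerBeanReader;
import org.supercsv.prefs.CsvPreference;

public class DozerReadSketch {

    public static void main(String[] args) throws Exception {
        // field mapping is defined in code; Dozer also supports nested paths like "address.city"
        final String[] fieldMapping = { "customerNo", "firstName", "lastName" };

        try (ICsvDozerBeanReader beanReader = new CsvDozerBeanReader(
                new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt")),
                CsvPreference.STANDARD_PREFERENCE)) {

            beanReader.configureBeanMapping(CustomerBean.class, fieldMapping);
            beanReader.getHeader(true); // skip the header row of the original file

            CustomerBean customer;
            // same streaming pattern as before: one bean in memory at a time
            while ((customer = beanReader.read(CustomerBean.class)) != null) {
                // process one bean at a time
            }
        }
    }
}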


I am not sure what the question is. Reading one row at a time into a bean uses roughly constant memory. If you keep all of the read objects around at once, then yes, you will run out of memory, but how is that Super CSV's fault?
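A short sketch of that distinction, reusing the beanReader and header setup from the snippets above (CustomerBean remains illustrative, and handle() is a placeholder for whatever per-row processing is needed):

// constant memory: each bean is processed and then becomes eligible for garbage collection
CustomerBean customer;
while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
    handle(customer); // e.g. write to a database or update an aggregate, then forget it
}

// grows without bound on an 80 GB file: every bean is retained on the heap
List<CustomerBean> all = new ArrayList<>();
while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
    all.add(customer); // this is what exhausts memory, not Super CSV itself
}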


Source: https://habr.com/ru/post/1436883/

