Use superCSV to read a large 80 GB text file

I want to read a huge CSV file. We generally use Super CSV to parse our files, but in this particular scenario the file is huge, so we constantly run into memory problems.

The initial idea is to read the file in chunks, but I'm not sure whether that will work with Super CSV, because when I split the file only the first chunk has the header row and will load into the CSV bean, while the other chunks have no header row, and I suspect this might throw an exception. So:

a) Is my reasoning correct? b) Are there other ways to solve this problem?

So my main question is:

Can Super CSV process large CSV files? I see that Super CSV reads the document through a BufferedReader, but I don't know what the buffer size is, and can we change it to suit our requirements?
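For context, here is a minimal sketch of reading row by row with Super CSV. The library reads from whatever Reader you hand it, so the buffer size is whatever you pass to the BufferedReader yourself; the file path and the 64 KB buffer below are arbitrary placeholders, not values taken from the question.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class StreamLargeCsv {

    public static void main(String[] args) throws Exception {
        // Super CSV reads from whatever Reader you pass in, so the buffer size
        // is under your control (64 KB here is an arbitrary choice)
        try (ICsvListReader listReader = new CsvListReader(
                new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt"), 64 * 1024),
                CsvPreference.STANDARD_PREFERENCE)) {

            listReader.getHeader(true); // consume the header row

            List<String> row;
            // read() returns one row at a time and null at end of file,
            // so only the current row is held in memory
            while ((row = listReader.read()) != null) {
                // process the row here (write to a database, aggregate, etc.)
            }
        }
    }
}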

@Gilbert Le Blanc: I tried splitting the file into smaller chunks as you suggested, but splitting such a huge file takes a very long time. Here is the code I wrote for it.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;

public class TestFileSplit {

    public static void main(String[] args) {
        LineNumberReader lnr = null;
        try {
            //RandomAccessFile input = new RandomAccessFile("", "r");
            File file = new File("C:\\Blah\\largetextfile.txt");
            lnr = new LineNumberReader(new FileReader(file), 1024);
            String line = "";
            String header = null;
            int noOfLines = 100000;
            int i = 1;
            boolean chunkedFiles = new File("C:\\Blah\\chunks").mkdir();
            if (chunkedFiles) {
                while ((line = lnr.readLine()) != null) {
                    if (lnr.getLineNumber() == 1) {
                        header = line;
                        continue;
                    } else {
                        // a new chunk file is created for every 100000 records
                        if ((lnr.getLineNumber() % noOfLines) == 0) {
                            i = i + 1;
                        }
                        File chunkedFile = new File("C:\\Blah\\chunks\\"
                                + file.getName().substring(0, file.getName().indexOf("."))
                                + "_" + i + ".txt");
                        // if the chunk file does not exist, create it and add the header as the first row
                        if (!chunkedFile.exists()) {
                            chunkedFile.createNewFile(); // was file.createNewFile(), which pointed at the source file
                            FileWriter fw = new FileWriter(chunkedFile.getAbsoluteFile(), true);
                            BufferedWriter bw = new BufferedWriter(fw);
                            bw.write(header);
                            bw.newLine();
                            bw.close();
                            fw.close();
                        }
                        FileWriter fw = new FileWriter(chunkedFile.getAbsoluteFile(), true);
                        BufferedWriter bw = new BufferedWriter(fw);
                        bw.write(line);
                        bw.newLine();
                        bw.close();
                        fw.close();
                    }
                }
            }
            lnr.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
        }
    }
}
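One likely reason the split is slow is that the code above reopens and closes the chunk file for every single line. A sketch that keeps a single writer open per chunk (paths and chunk size borrowed from the code above; the output file names are illustrative) might look like this:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FasterFileSplit {

    public static void main(String[] args) throws IOException {
        final int linesPerChunk = 100000;
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt"))) {
            final String header = reader.readLine(); // first line is the header
            new File("C:\\Blah\\chunks").mkdirs();

            String line;
            long lineCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;
            while ((line = reader.readLine()) != null) {
                // open a new chunk (and write the header) every linesPerChunk lines
                if (lineCount % linesPerChunk == 0) {
                    if (writer != null) {
                        writer.close();
                    }
                    chunkIndex++;
                    writer = new BufferedWriter(new FileWriter(
                            "C:\\Blah\\chunks\\largetextfile_" + chunkIndex + ".txt"));
                    writer.write(header);
                    writer.newLine();
                }
                writer.write(line);
                writer.newLine();
                lineCount++;
            }
            if (writer != null) {
                writer.close();
            }
        }
    }
}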
2 answers

You can define the header in the Java parser class itself. That way you don't need the header row in the CSV files.

// only map the first 3 columns - setting header elements to null means those columns are ignored
final String[] header = new String[] { "customerNo", "firstName", "lastName",
        null, null, null, null, null, null, null };
beanReader.read(CustomerBean.class, header)
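A minimal sketch of how that mapping plugs into a read loop over a chunk file that has no header row: the chunk path is illustrative, and CustomerBean is assumed to be a bean with getters/setters named after the mapped columns.

import java.io.FileReader;

import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadChunkWithoutHeader {

    public static void main(String[] args) throws Exception {
        // the mapping is defined in code, so the chunk file needs no header row
        final String[] header = new String[] { "customerNo", "firstName", "lastName",
                null, null, null, null, null, null, null };

        try (ICsvBeanReader beanReader = new CsvBeanReader(
                new FileReader("C:\\Blah\\chunks\\largetextfile_2.txt"),
                CsvPreference.STANDARD_PREFERENCE)) {

            CustomerBean customer;
            // no getHeader() call: every line in the chunk is treated as data
            while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
                // process one bean at a time
            }
        }
    }
}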

or

You can also use the Super CSV Dozer extension.
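A rough sketch of the Dozer variant is below. It assumes the super-csv-dozer module (and Dozer) is on the classpath, that CustomerBean is the same illustrative bean as above, and that the file's first three columns correspond to the mapped properties.

import java.io.BufferedReader;
import java.io.FileReader;

import org.supercsv.io.dozer.CsvDozerBeanReader;
import org.supercsv.io.dozer.ICsvDozerBeanReader;
import org.supercsv.prefs.CsvPreference;

public class DozerReadSketch {

    public static void main(String[] args) throws Exception {
        // field mapping is defined in code; Dozer also supports nested paths like "address.city"
        final String[] fieldMapping = { "customerNo", "firstName", "lastName" };

        try (ICsvDozerBeanReader beanReader = new CsvDozerBeanReader(
                new BufferedReader(new FileReader("C:\\Blah\\largetextfile.txt")),
                CsvPreference.STANDARD_PREFERENCE)) {

            beanReader.configureBeanMapping(CustomerBean.class, fieldMapping);
            beanReader.getHeader(true); // skip the header row of the original file

            CustomerBean customer;
            // same streaming pattern as before: one bean in memory at a time
            while ((customer = beanReader.read(CustomerBean.class)) != null) {
                // process one bean at a time
            }
        }
    }
}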


I am not sure what the question is. Reading one row at a time into a bean uses roughly constant memory. If you keep all of the read objects around at once, then yes, you will run out of memory, but how is that Super CSV's fault?
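A short sketch of that distinction, reusing the beanReader and header setup from the snippets above (CustomerBean remains illustrative, and handle() is a placeholder for whatever per-row processing is needed):

// constant memory: each bean is processed and then becomes eligible for garbage collection
CustomerBean customer;
while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
    handle(customer); // e.g. write to a database or update an aggregate, then forget it
}

// grows without bound on an 80 GB file: every bean is retained on the heap
List<CustomerBean> all = new ArrayList<>();
while ((customer = beanReader.read(CustomerBean.class, header)) != null) {
    all.add(customer); // this is what exhausts memory, not Super CSV itself
}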


Source: https://habr.com/ru/post/1436883/

