I want to read a huge CSV file. We use superCSV to parse files in general, but in this particular scenario the file is so large that we always run into a memory shortage problem.
My initial idea is to read the file in chunks, but I'm not sure this will work with superCSV: when I split the file, only the first chunk has the header values and can be loaded into the CSV bean, while the other chunks have no header row, and I suspect this might throw an exception. So,
a) I was wondering whether my reasoning is correct, and b) whether there are other ways to solve this problem.
So my main question is:
Does superCSV have the ability to process large CSV files? I see that superCSV reads the document through a BufferedReader, but I don't know what the buffer size is, and whether we can change it to suit our requirements.
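One thing worth noting: superCSV's readers accept any `java.io.Reader`, so you can control the buffer size yourself by wrapping the file in a `BufferedReader` with an explicit capacity before handing it over. Reading line by line this way keeps only the current row in memory, regardless of file size. Below is a minimal sketch of that streaming pattern using only JDK classes (the CSV parsing is deliberately omitted so the example runs without superCSV on the classpath; `countDataRows` is my own illustrative name, not a library method):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamCsv {

    // Reads a CSV file one line at a time; only the current line is held in
    // memory, so the total file size does not matter. The second argument to
    // BufferedReader sets the internal buffer size in chars (default 8192).
    public static long countDataRows(Path csv, int bufferSize) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new FileReader(csv.toFile()), bufferSize)) {
            String header = reader.readLine();   // consume the header row
            long rows = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // hand 'line' to a CSV parser here; this sketch just counts rows
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, List.of("id,name", "1,alice", "2,bob", "3,carol"));
        System.out.println(countDataRows(tmp, 1024)); // prints 3
        Files.delete(tmp);
    }
}
```

The same `BufferedReader` (with whatever buffer size you choose) can be passed straight to a superCSV reader constructor, so the buffer size is under your control even if the library's default is not documented.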
@Gilbert Le Blanc: I tried splitting the file into smaller pieces as you suggested, but splitting a huge file into smaller pieces takes a long time. Here is the code I wrote for it.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;

public class TestFileSplit {

    public static void main(String[] args) {
        LineNumberReader lnr = null;
        try {
            File file = new File("C:\\Blah\\largetextfile.txt");
            lnr = new LineNumberReader(new FileReader(file), 1024);
            String line;
            String header = null;
            int noOfLines = 100000;
            int i = 1;
            boolean chunkedFiles = new File("C:\\Blah\\chunks").mkdir();
            if (chunkedFiles) {
                while ((line = lnr.readLine()) != null) {
                    if (lnr.getLineNumber() == 1) {
                        header = line;
                        continue;
                    }
                    // a new chunk file is started for every 100000 records
                    if ((lnr.getLineNumber() % noOfLines) == 0) {
                        i = i + 1;
                    }
                    File chunkedFile = new File("C:\\Blah\\chunks\\"
                            + file.getName().substring(0, file.getName().indexOf("."))
                            + "_" + i + ".txt");
                    // if the chunk does not exist yet, create it and write the header as its first row
                    if (!chunkedFile.exists()) {
                        chunkedFile.createNewFile(); // was file.createNewFile(), which touched the source file instead
                        BufferedWriter bw = new BufferedWriter(new FileWriter(chunkedFile.getAbsoluteFile(), true));
                        bw.write(header);
                        bw.newLine();
                        bw.close();
                    }
                    BufferedWriter bw = new BufferedWriter(new FileWriter(chunkedFile.getAbsoluteFile(), true));
                    bw.write(line);
                    bw.newLine();
                    bw.close();
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close the reader even if an exception occurred
            if (lnr != null) {
                try {
                    lnr.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
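A large part of the slowness in the splitter above is that it opens and closes a `FileWriter` for every single line of the source file. A sketch of the same idea that keeps one writer open per chunk, writing each line exactly once, is shown below (`splitFile` and the chunk naming are my own, not from any library; this is a suggested rework under those assumptions, not the original code):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChunkSplitter {

    // Splits a CSV into chunks of at most linesPerChunk data rows, repeating
    // the header at the top of every chunk. One writer stays open per chunk,
    // so files are not reopened for every line.
    public static List<Path> splitFile(Path source, Path outDir, int linesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        Files.createDirectories(outDir);
        try (BufferedReader reader = Files.newBufferedReader(source)) {
            String header = reader.readLine();
            String line;
            BufferedWriter writer = null;
            int rowsInChunk = 0;
            try {
                while ((line = reader.readLine()) != null) {
                    // start a new chunk when none is open or the current one is full
                    if (writer == null || rowsInChunk == linesPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outDir.resolve("chunk_" + (chunks.size() + 1) + ".txt");
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk);
                        writer.write(header);
                        writer.newLine();
                        rowsInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    rowsInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("large", ".csv");
        List<String> lines = new ArrayList<>();
        lines.add("id,name");
        for (int n = 1; n <= 5; n++) {
            lines.add(n + ",row" + n);
        }
        Files.write(src, lines);
        Path dir = Files.createTempDirectory("chunks");
        System.out.println(splitFile(src, dir, 2).size()); // 5 rows, 2 per chunk -> prints 3
    }
}
```

Because each chunk file is written sequentially and closed once, this avoids the per-line open/append/close cycle that dominates the runtime of the version above.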