Writing large text files to Excel

I am reading a text file separated by some delimiters.

Sample contents of my text file

Avc def efg jksjd
1 2 3 5
3 4 6 0

I read it line by line and hold it in memory in a HashMap, with the line number as an Integer key and the tokens of each line as a List value.

So my map stores the information like this:

Integer → List

1 → [Avc, def, efg, jksjd]

I am using Apache POI to write to Excel. Here is my code snippet:

    HSSFWorkbook workbook = new HSSFWorkbook();
    HSSFSheet sheet = workbook.createSheet("Sample sheet");
    Map<Integer, List<Object>> excelDataHolder = new LinkedHashMap<Integer, List<Object>>();
    int rownum = 0;
    for (Integer key : excelDataHolder.keySet()) {
        Row row = sheet.createRow(rownum++);
        List<Object> objList = excelDataHolder.get(key); // excelDataHolder is my map
        int cellnum = 0;
        for (Object obj : objList) {
            Cell cell = row.createCell(cellnum++);
            cell.setCellValue(obj.toString());
        }
    }

This works well when the number of lines/records to be written to Excel is small. But imagine the text file has many more lines, say 100,000, or even millions of records. I think my approach breaks down, because createRow and createCell would create more than 100,000 objects on the heap. Whichever Java Excel API is used, I believe writing to Excel follows the same approach, i.e., iterating over the collection as shown above. I also tried some examples with Aspose, and Aspose had the same problem.

  • Do createRow and createCell create new objects each time they are called?
  • If so, what is the alternative? How can I write big data with better performance?
4 answers

Recent versions of apache-poi include SXSSF. A shameless copy from the website:

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to disk.

I used it to create a spreadsheet with 1.5 million rows.
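To make the quoted description concrete, here is a minimal sketch of the streaming approach. It assumes Apache POI 3.8 or later is on the classpath, and the file names input.txt and output.xlsx are placeholders, not part of the original question:

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;

public class StreamingExcelWriter {
    public static void main(String[] args) throws Exception {
        // Keep at most 100 rows in memory; older rows are flushed to a temp file.
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        Sheet sheet = workbook.createSheet("Sample sheet");

        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            int rownum = 0;
            while ((line = reader.readLine()) != null) {
                Row row = sheet.createRow(rownum++);
                String[] values = line.split("\\s+");
                for (int cellnum = 0; cellnum < values.length; cellnum++) {
                    Cell cell = row.createCell(cellnum);
                    cell.setCellValue(values[cellnum]);
                }
            }
        }

        try (FileOutputStream out = new FileOutputStream("output.xlsx")) {
            workbook.write(out);
        }
        workbook.dispose(); // delete the temporary files backing the sliding window
    }
}
```

Note that SXSSF writes .xlsx (XSSF format), not the .xls format produced by HSSFWorkbook in the question, and rows outside the window can no longer be read back or modified.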


I will answer regarding Aspose.Cells for Java, since you tried that too.

Creating or loading a very large Excel file almost always requires a lot of memory. Even if you read one line or a few lines at a time, you are still writing the contents into a Workbook instance, which is held in memory.

Solution 1 (insufficient and very limited): Increase the heap size. If the maximum heap size is enough for your largest file, use it.

Solution 2 (complex, with some manual work): Excel 2007 and later allow about 1 million rows per sheet. I would suggest you create one workbook with one sheet per 1 million rows. That is, if your text file has 10 million lines, create 10 separate Excel workbooks.

Later, merge them into one Excel workbook manually. Aspose.Cells will throw an exception when copying sheets containing such huge data.

Below is a code snippet that creates 10 separate Excel files, each with 1 million rows.

    import com.aspose.cells.*;
    import java.util.*;

    public class ExcelLargeTextImport {
        private static String excelFile = Common.dataDir + "largedata.xlsx";

        public static void main(String args[]) {
            try {
                Common.setLicenses();
                importToExcel();
            } catch (Exception ex) {
                System.out.println(ex.getMessage());
            }
        }

        private static void importToExcel() throws Exception {
            // Process each workbook in a method
            for (int sheetCounter = 0; sheetCounter < 10; sheetCounter++) {
                saveWorkbook(sheetCounter);
            }
        }

        private static void saveWorkbook(int sheetCounter) throws Exception {
            Workbook workbook = new Workbook();

            // Get the first sheet
            Worksheet worksheet = workbook.getWorksheets().get(0);
            Cells cells = worksheet.getCells();

            // Initialize the array list with 1 million records
            ArrayList<String> lines = new ArrayList<String>();
            int rowCount = 1000000;
            for (int i = 0; i < rowCount; i++) {
                lines.add(i + ";value1;value2;value3");
            }

            long lineNo = 1;
            for (String line : lines) {
                // Split the line by the delimiter
                String[] values = line.split(";");

                // First cell
                Cell cell = cells.get("A" + lineNo);
                cell.setValue(values[0]);

                // Second cell
                cell = cells.get("B" + lineNo);
                cell.setValue(values[1]);

                // Third cell
                cell = cells.get("C" + lineNo);
                cell.setValue(values[2]);

                // Fourth cell
                cell = cells.get("D" + lineNo);
                cell.setValue(values[3]);

                lineNo++;
            }
            System.out.print(sheetCounter + " ");

            // Save the Excel file
            workbook.save(excelFile.replace(".xlsx", sheetCounter + ".xlsx"));
            System.out.println("\nExcel file created");
        }
    }

PS. I am a developer evangelist at Aspose.


Why don't you read and write in chunks? Here is an approach I can think of:

  • Read a few lines from your text file and put the data into the map as you do now. Say you read 100 lines and now have 100 entries in the map.
  • Write these 100 entries to the Excel file, creating the file the first time around.
  • Clear the map, or reinitialize it.
  • Now read the next 100 lines of the text file. As far as I understand, there is no way to jump directly to line 101 without reading the first 100 lines, so you may have to read the file from the beginning again, but you can skip the first 100 lines and build the map entries from the next 100.
  • Now update the Excel file. I think you can update an existing Excel file with POI as described in this link: Edit existing excel files using jxl api / Apache POI

If you keep repeating this process, you will certainly reduce memory consumption, although I do not see a significant difference in CPU consumption.
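The read-a-chunk step above can be sketched as follows. This is a minimal illustration using only the standard library; the Excel-writing step is left as a comment, and the class and method names (ChunkedReader, readChunk) are my own, not from the original answer:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkedReader {

    // Read up to chunkSize lines into a map keyed by line number,
    // splitting each line on whitespace. Returns an empty map at EOF.
    public static Map<Integer, List<String>> readChunk(BufferedReader reader,
                                                       int startLine, int chunkSize) {
        Map<Integer, List<String>> chunk = new LinkedHashMap<>();
        try {
            String line;
            int lineNo = startLine;
            while (chunk.size() < chunkSize && (line = reader.readLine()) != null) {
                chunk.put(lineNo++, Arrays.asList(line.split("\\s+")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunk;
    }

    public static void main(String[] args) {
        String text = "Avc def efg jksjd\n1 2 3 5\n3 4 6 0\n";
        BufferedReader reader = new BufferedReader(new StringReader(text));
        int lineNo = 1;
        Map<Integer, List<String>> chunk;
        while (!(chunk = readChunk(reader, lineNo, 2)).isEmpty()) {
            // Here you would write or append the chunk to the Excel file,
            // then let the map be garbage-collected before the next chunk.
            System.out.println("chunk of " + chunk.size() + " rows starting at line " + lineNo);
            lineNo += chunk.size();
        }
    }
}
```

Keeping the reader open between chunks avoids re-reading the file from the start, which sidesteps the skip-the-first-100-lines workaround described above.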

Hope this helps!


Here is your answer ...

Try this simple code; you can extend it in the future if you need to:

fooobar.com/questions/1480658 / ...


Source: https://habr.com/ru/post/1480656/

