Perl: reading a huge Excel file

I have a huge xlsx file (about 127 MB) that I want to read using the Spreadsheet::Excel module, but I get "Not enough memory" on a computer with 2 GB of RAM. (Note that the script works fine with smaller Excel 2007 files.)

Is there a way to read an Excel file line by line without hitting the memory limit? Searching Google I came across http://discuss.joelonsoftware.com/default.asp?joel.3.160328.14 , but I do not know how to store the spreadsheet in a scalar. Can anyone give an example of reading an Excel 2007 file as a scalar and printing the cell values? Below is the current script that I run on small spreadsheets.

    #!/usr/bin/perl
    use Excel::Writer::XLSX;
    use Spreadsheet::XLSX;

    my $workbook  = Excel::Writer::XLSX->new('Book1.xlsx');
    my $worksheet = $workbook->add_worksheet();
    # use strict;
    my $excel = Spreadsheet::XLSX->new('Book2.xlsx');
    my $date_format = $workbook->add_format();
    $date_format->set_num_format('dd/mm/yy hh:mm');

    # Columns of interest
    @columns    = (0,1,2,5,9,10,12,13,31);
    @reportlist = ("string1","String2","String3");
    @actuallist = ("ModifiedString1","ModifiedString2","ModifiedString3");
    $max_list   = $#reportlist;

    foreach my $sheet (@{ $excel->{Worksheet} }) {
        printf("Sheet: %s\n", $sheet->{Name});
        $sheet->{MaxRow} ||= $sheet->{MinRow};
        foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
            $sheet->{MaxCol} ||= $sheet->{MinCol};
            for ($c = 0; $c <= $#columns; $c++) {
                $col = $columns[$c];
                my $cell = $sheet->{Cells}[$row][$col];
                if ($col == 0) {
                    # Strip the timezone suffix and write the value as a date
                    $cell->{Val} =~ s/\ GMT\+11\:00//g;
                    $worksheet->write($row, $c, $cell->{Val}, $date_format);
                }
                if ($cell) {
                    $worksheet->write($row, $c, $cell->{Val});
                    # Replace known report strings with their modified names
                    for ($z = 0; $z <= $#reportlist; $z++) {
                        if (($cell->{Val}) =~ m/$reportlist[$z]/i) {
                            $worksheet->write($row, $c, $actuallist[$z]);
                        }
                    }
                }
            }
        }
    }

    $workbook->close();
+4
4 answers

I am working on a new module for reading Excel xlsx files quickly and efficiently with Perl. It is not yet on CPAN (it needs a bit more work), but you can get it on GitHub.

Here is an example of how to use it:

    use strict;
    use warnings;
    use Excel::Reader::XLSX;

    my $reader   = Excel::Reader::XLSX->new();
    my $workbook = $reader->read_file( 'Book1.xlsx' );

    if ( !defined $workbook ) {
        die $reader->error(), "\n";
    }

    for my $worksheet ( $workbook->worksheets() ) {

        my $sheetname = $worksheet->name();

        print "Sheet = $sheetname\n";

        while ( my $row = $worksheet->next_row() ) {

            while ( my $cell = $row->next_cell() ) {

                my $row   = $cell->row();
                my $col   = $cell->col();
                my $value = $cell->value();

                print "  Cell ($row, $col) = $value\n";
            }
        }
    }

    __END__

Update: This module never reached CPAN quality. Try Spreadsheet::ParseXLSX instead.
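A minimal sketch of reading cells with Spreadsheet::ParseXLSX, assuming the module is installed from CPAN. The sub name print_xlsx_cells is hypothetical; the method calls follow the Spreadsheet::ParseExcel-style interface that the module exposes. Because cells are fetched by random access with get_cell, the whole workbook is parsed up front, so memory use may still be a concern for very large files.

```perl
use strict;
use warnings;

# Print every non-empty cell of every worksheet in an xlsx file.
# Spreadsheet::ParseXLSX (CPAN) is loaded at run time, so this file
# still compiles on machines where the module is not installed.
sub print_xlsx_cells {
    my ($path) = @_;

    require Spreadsheet::ParseXLSX;
    my $parser   = Spreadsheet::ParseXLSX->new;
    my $workbook = $parser->parse($path);
    die "Could not parse $path\n" unless defined $workbook;

    for my $worksheet ( $workbook->worksheets() ) {
        print "Sheet = ", $worksheet->get_name(), "\n";

        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();

        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {
                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;    # skip empty cells
                print "  Cell ($row, $col) = ", $cell->value(), "\n";
            }
        }
    }
    return;
}

# Hypothetical call:
# print_xlsx_cells('Book1.xlsx');
```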

+5

Have you tried converting the XLSX to CSV and reading it as a regular text file?
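A rough sketch of the line-by-line approach, assuming the workbook has already been exported to CSV (for example via "Save As" in Excel). The names parse_csv_line, each_csv_row and Book2.csv are hypothetical. The small parser below handles double-quoted fields but not fields with embedded newlines; for real-world data, the Text::CSV module from CPAN is the more usual choice.

```perl
use strict;
use warnings;

# Split one CSV line into fields. Handles double-quoted fields and ""
# escapes, but NOT fields containing embedded newlines (assumption).
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    while (1) {
        if ( $line =~ /\G"((?:[^"]|"")*)"/gc ) {    # quoted field
            ( my $field = $1 ) =~ s/""/"/g;
            push @fields, $field;
        }
        else {
            $line =~ /\G([^,]*)/gc;                 # bare field, may be empty
            push @fields, $1;
        }
        last unless $line =~ /\G,/gc;               # stop when no comma follows
    }
    return @fields;
}

# Call $callback->($line_number, \@selected) for every row of the file.
# Only one row is held in memory at a time, so file size does not matter.
sub each_csv_row {
    my ( $path, $columns, $callback ) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;                       # strip LF or CRLF
        my @fields   = parse_csv_line($line);
        my @selected = map { defined $fields[$_] ? $fields[$_] : '' } @$columns;
        $callback->( $., \@selected );
    }
    close $fh;
    return;
}

# Hypothetical call, using the columns of interest from the question:
# each_csv_row( 'Book2.csv', [ 0, 1, 2, 5, 9, 10, 12, 13, 31 ], sub {
#     my ( $line_number, $row ) = @_;
#     print "$line_number: @$row\n";
# } );
```

Columns missing from a short row come back as empty strings, so the callback always receives one value per requested column.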

+4

Try this. Assuming you have installed the Spreadsheet::Read Perl module, which determines the actual parser module to use for reading the file, the code snippet below reads and prints the cells of the first sheet of the input workbook. You can examine the $workbook object to see all the options available for customization. The module can also read files in other formats, such as "csv" and "xls". Here is a link to the documentation, which I found useful: http://search.cpan.org/~hmbrand/Spreadsheet-Read/Read.pm

ReadData can be configured by passing parameters. Among its many options, two of them, "cells" and "rc", change how the file contents are stored. Both are true by default. If "cells" is true, ReadData stores the workbook's cells in a hash in the returned object. If "rc" is true, ReadData stores the workbook's cells in an array in the returned object. In the code snippet below, setting cells => 0 means the sheet contents are not stored in hash form in the returned workbook object, which saves some memory. In addition, to prevent the full file from being read, you can also set the "rc" option to false.

    use Spreadsheet::Read;

    ############################################################################
    # function input  : file in xlsx format with absolute path
    # function output : prints 1st worksheet content if exist
    ############################################################################
    sub print_xlsx_file{

        my $file_path = shift;
        my $workbook = ReadData($file_path,cells => 0 );
        if(defined $workbook->[0]{'error'}){
            print "Error occurred while processing $file_path:".
                  $workbook->[0]{'error'}."\n";
            exit(-1);
        }
        my $worksheet = $workbook->[1];
        my $max_rows = $worksheet->{'maxrow'};
        my $max_cols = $worksheet->{'maxcol'};

        for my $row_num (1..($max_rows))
        {
            for my $col_num (1..($max_cols)){
                print $worksheet->{'cell'}[$col_num][$row_num]."\n";
            }
        }
    }

    # call above function
    # print_xlsx_file("/home/chammu/mybook.xlsx");
0

The CSV solution is a good one. But you should also consider saving as xlsb: it often reduces the file size while keeping some Excel features. (I would have posted this as a comment, but I do not have enough reputation yet.)

0

Source: https://habr.com/ru/post/1341930/

