Xlrd crash while reading .xls file modified by PHPExcel

It is very difficult for me to edit some Excel files using PHP and python.

I initially did everything in PHP using PHPExcel, but I was processing very large files and PHPExcel crashed when it ran out of memory. So I changed it to do some work with PHP and the rest with python.

So the process:

  • Parse xml sent to PHP script
  • Insert rows into Excel file (.xls) based on xml data
  • pass (.xls) file and xml data to python script to populate spreadsheet
  • ex. python upload.py Example.xls data.xml called by PHP
  • The python script uses xlrd, xlwt and xlutils to populate an Excel file.

The problem is that if the python script modifies the regular .xls file that I created manually, it works fine. But as soon as PHP excel modifies the Excel file, the python script raises the following error:

 _locate_stream(Workbook): seen 0 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 20 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 100= 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 120 4 4 4 4 4 4 4 4 4 4 4 4 4 3 2 2 File "upload.py", line 63, in <module> workbook_readonly = open_workbook(excel,formatting_info=True,on_demand=True) File "/home/student/eamorde/public_html/dining/xlrd/__init__.py", line 435, in open_workbook ragged_rows=ragged_rows, File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 87, in open_workbook_xls ragged_rows=ragged_rows, File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 619, in biff2_8_load cd.locate_named_stream(UNICODE_LITERAL(qname)) File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 390, in locate_named_stream d.tot_size, qname, d.DID+6) File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 418, in _locate_stream raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) xlrd.compdoc.CompDocError: Workbook corruption: seen[2] == 4 

So, I dug the xlrd source code and found a line that produces an error:

 def _locate_stream(self, mem, base, sat, sec_size, start_sid, expected_stream_size, qname, seen_id): # print >> self.logfile, "_locate_stream", base, sec_size, start_sid, expected_stream_size s = start_sid if s < 0: raise CompDocError("_locate_stream: start_sid (%d) is -ve" % start_sid) p = -99 # dummy previous SID start_pos = -9999 end_pos = -8888 slices = [] tot_found = 0 found_limit = (expected_stream_size + sec_size - 1) // sec_size while s >= 0: if self.seen[s]: print("_locate_stream(%s): seen" % qname, file=self.logfile); dump_list(self.seen, 20, self.logfile) raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) 

The last line is an upstream exception:

 raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) 

Can anyone explain this? The file is not corrupted since opening it in Excel works fine, but xlrd does not seem to be able to read it.

My PHP script does the following (rough sketch):

 $phpExcel = new PHPExcel(); $file = "MyFile.xls"; $reader = new PHPExcel_Reader_Excel5(); $phpExcel = $reader->load($file); //(... insert rows based on xml) $writer = new PHPExcel_Writer_Excel5(); $writer->save('MyFile.xls'); exec("python upload.py MyFile.xls data.xml"); 

If anyone knows why this might happen or even the best solution to my problems (memory problems with PHPExcel), we will be very grateful.

Edit: The source code of the file that causes the error can be found here .

Edit: I created an example, basically took my excel file and deleted any identifying information. Try it yourself, see here .

+4
source share
2 answers

I got the same error with one of my .xls files (excel might open them just fine, but xlrd doesn't work). As I believe, the Compdoc.seen array Compdoc.seen track of already read FAT sectors. In my case, the Root Entry Reader (SSCS) receives all sectors marked as visible, which leads to an increase in flight in the future. U may try to find an error in the logic of reading sectors and contribute to xlrd :) or just comment on these lines with an increased probability that, most likely, it will solve the problem in your case (like mine) and wait for xlrd to update.

+3
source

For anyone who contacts where you need data from a remote service, this is a hack, but the comment line is 419 ( raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) ) in compdoc.py seems to work fine.

0
source

Source: https://habr.com/ru/post/1481472/


All Articles