It is very difficult for me to edit some Excel files using PHP and python.
I initially did everything in PHP using PHPExcel, but I was processing very large files and PHPExcel crashed when it ran out of memory. So I changed it to do some work with PHP and the rest with python.
So the process:
- Parse xml sent to PHP script
- Insert rows into Excel file (.xls) based on xml data
- pass (.xls) file and xml data to python script to populate spreadsheet
- ex.
python upload.py Example.xls data.xml called by PHP - The python script uses xlrd, xlwt and xlutils to populate an Excel file.
The problem is that if the python script modifies the regular .xls file that I created manually, it works fine. But as soon as PHP excel modifies the Excel file, the python script raises the following error:
_locate_stream(Workbook): seen 0 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 20 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 100= 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 120 4 4 4 4 4 4 4 4 4 4 4 4 4 3 2 2 File "upload.py", line 63, in <module> workbook_readonly = open_workbook(excel,formatting_info=True,on_demand=True) File "/home/student/eamorde/public_html/dining/xlrd/__init__.py", line 435, in open_workbook ragged_rows=ragged_rows, File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 87, in open_workbook_xls ragged_rows=ragged_rows, File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 619, in biff2_8_load cd.locate_named_stream(UNICODE_LITERAL(qname)) File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 390, in locate_named_stream d.tot_size, qname, d.DID+6) File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 418, in _locate_stream raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) xlrd.compdoc.CompDocError: Workbook corruption: seen[2] == 4
So, I dug the xlrd source code and found a line that produces an error:
def _locate_stream(self, mem, base, sat, sec_size, start_sid, expected_stream_size, qname, seen_id): # print >> self.logfile, "_locate_stream", base, sec_size, start_sid, expected_stream_size s = start_sid if s < 0: raise CompDocError("_locate_stream: start_sid (%d) is -ve" % start_sid) p = -99 # dummy previous SID start_pos = -9999 end_pos = -8888 slices = [] tot_found = 0 found_limit = (expected_stream_size + sec_size - 1) // sec_size while s >= 0: if self.seen[s]: print("_locate_stream(%s): seen" % qname, file=self.logfile); dump_list(self.seen, 20, self.logfile) raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))
The last line is an upstream exception:
raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))
Can anyone explain this? The file is not corrupted since opening it in Excel works fine, but xlrd does not seem to be able to read it.
My PHP script does the following (rough sketch):
$phpExcel = new PHPExcel(); $file = "MyFile.xls"; $reader = new PHPExcel_Reader_Excel5(); $phpExcel = $reader->load($file); //(... insert rows based on xml) $writer = new PHPExcel_Writer_Excel5(); $writer->save('MyFile.xls'); exec("python upload.py MyFile.xls data.xml");
If anyone knows why this might happen or even the best solution to my problems (memory problems with PHPExcel), we will be very grateful.
Edit: The source code of the file that causes the error can be found here .
Edit: I created an example, basically took my excel file and deleted any identifying information. Try it yourself, see here .