Could not open Excel file using Python

I am on a Debian GNU / Linux computer, working with Python 2.7.9.

As part of my work, I write Python scripts that read inputs in various formats (for example, Excel, CSV, TXT) and parse the information into more standard files. This is not my first time working with Excel files.

There is one specific file that gives me problems: I just cannot open it. When I tried with xlrd (version 0.9.3), it gave me the following error:

xlrd.open_workbook('sample.xls') 

XLRDError: Unsupported format, or corrupt file: BOF not workbook/worksheet: op=0x0009 vers=0x0002 strm=0x000a build=0 year=0 -> BIFF21

I tried to investigate this issue myself and found several answers on StackOverflow, but I still could not open the file. This specific answer may describe the problem (second explanation), but it does not include a workaround: https://stackoverflow.com/a/316618/

A tool that can convert the file to csv / txt would also solve the problem.
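For anyone who wants to check what kind of file this is before handing it to a library, here is a minimal sketch that looks at the first bytes. The function names are my own. The demo bytes are built from the op/vers/strm values in the error message above, with an assumed record length of 4, so they stand in for the real file rather than being copied from it.

```python
import struct

# Well-known leading signatures
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # container used by most .xls files
ZIP_SIGNATURE = b"PK\x03\x04"                          # .xlsx and .ods are ZIP archives

# BOF record opcodes of the older "raw" BIFF streams
BOF_OPCODES = {
    0x0009: "BIFF2",
    0x0209: "BIFF3",
    0x0409: "BIFF4",
    0x0809: "BIFF5/BIFF8",
}


def sniff_spreadsheet_format(header):
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_SIGNATURE):
        return "OLE2 compound file"
    if header.startswith(ZIP_SIGNATURE):
        return "ZIP archive"
    if len(header) >= 2:
        # The first little-endian 16-bit word is the record opcode
        (opcode,) = struct.unpack("<H", header[:2])
        if opcode in BOF_OPCODES:
            return "raw " + BOF_OPCODES[opcode] + " stream"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_spreadsheet_format(handle.read(8))


# Hypothetical header: op=0x0009, assumed length=4, vers=0x0002, strm=0x000a
demo_header = b"\x09\x00\x04\x00\x02\x00\x0a\x00"
print(sniff_spreadsheet_format(demo_header))
```

On the real file you would call sniff_file('sample.xls'). A raw BIFF2 result would be consistent with the op = 0x0009 in the error.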

I already tried:

  • xlrd
  • openpyxl
  • xlsx2csv (shell tool)

A sample file is available here: https://ufile.io/r4m6j

As a side note, I can open it with LibreOffice Calc and MS Excel, so as a last resort I could convert it to csv that way. The thing is, I need to do all of this from a Python script.

Thanks in advance!

+5
9 answers

It seems to be an MS problem. The xls file is very strange; maybe you should contact xlrd support.

But I have a crazy workaround for you: xls2ods. This works for me, although xls2csv does not (sic!).

So first install catdoc:

    $ sudo apt-get install catdoc

Then convert your xls file to ods and open the ods file with pyexcel_ods or whatever you want. To use pyexcel_ods, install it first using pip install pyexcel_ods.

    import subprocess
    import sys

    from pyexcel_ods import get_data

    file_basename = 'sample'
    returncode = subprocess.call(['xls2ods', '{}.xls'.format(file_basename)])
    if returncode > 0:
        # consider using subprocess.Popen if you need more control over stderr
        sys.exit(returncode)
    data = get_data('{}.ods'.format(file_basename))
    print(data)

I get the following output:

 OrderedDict([(u'sample', [[u'labo', u'codfarm', u'farmacia', u'direccion', u'localidad', u'nom_medico', u'matricula', u'troquel', u'producto', u'cant_total']])]) 
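The result maps each sheet name to a list of rows, with the header as the first row. As a rough sketch of how to work with that shape, the helper below (my own name, not part of pyexcel_ods) pairs each data row with the header. The data rows in the demo are made up for illustration.

```python
def rows_to_records(sheet_rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by header."""
    if not sheet_rows:
        return []
    header = sheet_rows[0]
    # zip() stops at the shorter sequence, so short rows simply lack keys
    return [dict(zip(header, row)) for row in sheet_rows[1:]]


# Hypothetical stand-in for the structure returned by get_data()
data = {
    "sample": [
        ["labo", "codfarm", "farmacia"],
        ["ACME", 12, "Central"],
        ["ACME", 13],  # a short row, to show the truncation behaviour
    ]
}

records = rows_to_records(data["sample"])
for record in records:
    print(record)
```

With the output shown above, which contains only the header row, this would return an empty list.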
+2

Here is a kludge that I would use:

Assuming you have LibreOffice in Debian, you can convert all your *.xls files to *.csv using:

    import os

    os.system("libreoffice --headless --convert-to csv *.xls")  # or use subprocess.call

... and then work with the resulting csv files.
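If you would rather not go through a shell string, here is a minimal sketch of the same conversion with subprocess and an explicit argument list. It assumes the libreoffice binary is on your PATH. The helper names are mine, and the --outdir option is something I added so the output location is explicit.

```python
import glob
import subprocess


def build_convert_command(xls_path, outdir="."):
    """Build the LibreOffice command line for converting one file to csv."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "csv",
        "--outdir", outdir,
        xls_path,
    ]


def convert_all(pattern="*.xls", outdir="."):
    """Convert every file matching the pattern; return the ones that succeeded."""
    converted = []
    # The shell is not involved, so the wildcard is expanded here
    for path in sorted(glob.glob(pattern)):
        if subprocess.call(build_convert_command(path, outdir)) == 0:
            converted.append(path)
    return converted


# Only prints the command; call convert_all() to actually run LibreOffice
print(build_convert_command("sample.xls"))
```

Passing a list avoids quoting problems with file names that contain spaces.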

Or you can convert only the damaged file if necessary using the try/except block:

    import os

    import xlrd
    from xlrd import XLRDError

    try:
        xlrd.open_workbook('sample.xls')
    except XLRDError:
        os.system("libreoffice --headless --convert-to csv sample.xls")
        # mycsv = open("sample.csv", "r")
        # for line in mycsv.readlines():
        # ...
        # ...

Note: keep LibreOffice closed while the script runs.
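Once the csv file exists, reading it is the easy part. Here is a small sketch using the standard csv module. The demo text is a made-up stand-in for the converted sample.csv (only the column names come from the question), and read_csv_rows is my own helper name.

```python
import csv
import io


def read_csv_rows(stream):
    """Return all rows of an open CSV text stream as lists of strings."""
    return list(csv.reader(stream))


# Hypothetical content standing in for the converted file
demo = io.StringIO('labo,codfarm,farmacia\nACME,12,"Farmacia, Central"\n')

rows = read_csv_rows(demo)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

For the real file you would pass an open file object instead of the StringIO. The csv module handles quoted fields that contain commas, which a plain line.split(',') would break.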

Alternatively, there are other conversion tools. Here is one (which I have not tested yet): https://github.com/dilshod/xlsx2csv

+1

If you are targeting Windows, have Excel installed, and are familiar with Excel VBA, the comtypes package gives you a quick solution:

http://pythonhosted.org/comtypes/

You will have direct access to Excel through its COM interfaces.

0

This code opens the xls file and saves it as a csv file using the comtypes package:

    import comtypes.client as cl

    progId = "Excel.Application.15"
    xl = cl.CreateObject(progId)
    wb = xl.Workbooks.Open(r"C:\Users\aUser\Desktop\thermoList.xls")
    wb.SaveAs(r"C:\Users\aUser\Desktop\thermoList.csv", FileFormat=6)  # 6 = xlCSV
    xl.DisplayAlerts = False
    xl.Quit()

I could not verify it with "sample.xls", which is damaged. You can try another file. You may need to configure progId according to your version of Excel.

0

This is a problem with the file format. I am not sure what type of file it is, but it is not Excel. I opened the file, saved it as sample2.xls, and compared the types: [screenshot of the file type comparison]

How do you create this file?

0

If you need to get the words as a list of strings:

    # Python 2: read() returns a byte string, so chr() values match raw bytes
    text_file = open("sample.xls", "r")
    lines = text_file.read()
    # strip control bytes and other noise characters
    for code in (200, 0, 1, 5, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 16, 17, 18, 49):
        lines = lines.replace(chr(code), '')
    lines = lines.replace('Arial', '')
    for line in lines.split(chr(128)):
        print(line)

output: [screenshot of the extracted words]

0

The file you provided is corrupted, so other respondents have no way to check it and recommend a good solution. The exception you posted confirms this. As a way forward, you can try to debug a few things; see the steps below:

  • You mentioned that you tried the xlrd library. Check which version of the module you have:

    Python 2.7.9

     >>> import xlrd
     >>> xlrd.__VERSION__

    Update to the latest official version if necessary.

  • Try opening any other *.xls file and see if it works with your version of Python and the current library.

  • Read the module documentation carefully; it describes several differences in how the module behaves on different platforms (Windows vs. Linux): http://xlrd.readthedocs.io/en/latest/dates.html

  • You can always reach out to the community (there is a chance that you have hit some strange state or bug): https://github.com/python-excel/xlrd/issues
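The version check from the first step can also be done from a script. This is a small sketch under the assumption that xlrd may or may not be installed; the function name is mine.

```python
def installed_xlrd_version():
    """Return the xlrd version string, or None if xlrd is not importable."""
    try:
        import xlrd
    except ImportError:
        return None
    # __VERSION__ is the attribute used in the interactive check above
    return getattr(xlrd, "__VERSION__", None)


version = installed_xlrd_version()
if version is None:
    print("xlrd is not installed (or exposes no version)")
else:
    print("xlrd version: " + version)
```

Logging this next to the exception makes it easier to report the problem on the issue tracker.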

Hope this helps.

0

Unable to open Excel. As Yadiyada said, I think this is a data source problem. If you really want to find out the reason, I suggest asking this as an Excel question rather than a Python one.

0

It always works for me with any xls or xlsx files:

    import unicodecsv
    import xlrd

    def csv_from_excel(filename_xls, filename_csv):
        # replace the placeholder with your encoding, e.g. "cp1251"
        wb = xlrd.open_workbook(filename_xls, encoding_override='YOUR_ENCODING_HERE')
        sh = wb.sheet_by_index(0)
        your_csv_file = open(filename_csv, 'wb')
        wr = unicodecsv.writer(your_csv_file)
        for rownum in xrange(sh.nrows):
            wr.writerow(sh.row_values(rownum))
        your_csv_file.close()

So I do not work with the Excel files directly; I convert them to csv first. Maybe this will help you.
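For reference, here is a rough sketch of the same row-copying loop with only the standard csv module, so it also runs on Python 3. It works with any object that has nrows and row_values(), which is what the code above uses on the xlrd sheet. FakeSheet is a made-up stand-in so the example runs without a workbook; the helper names are mine.

```python
import csv
import io


def iter_sheet_rows(sheet):
    """Yield each row of a sheet-like object as a list of cell values."""
    for rownum in range(sheet.nrows):
        yield sheet.row_values(rownum)


def write_rows_as_csv(rows, stream):
    """Write rows to an open text stream; return the number of rows written."""
    writer = csv.writer(stream)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


class FakeSheet(object):
    """Hypothetical stand-in exposing the two members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rownum):
        return self._rows[rownum]


buffer = io.StringIO()
written = write_rows_as_csv(
    iter_sheet_rows(FakeSheet([["labo", "codfarm"], ["ACME", 12]])), buffer
)
print(written)
print(buffer.getvalue())
```

With a real workbook you would pass wb.sheet_by_index(0) instead of FakeSheet, and a file opened with open(filename_csv, 'w', newline='') instead of the StringIO.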

-1

Source: https://habr.com/ru/post/1272777/

