Create large Excel sheets programmatically

We use OpenPyxl to export MySQL content to Microsoft Excel in XLSX format.

https://bitbucket.org/ericgazoni/openpyxl/overview

However, the amount of data we are dealing with is large, and we are running out of memory. Tables can contain up to 400 columns across 50,000+ rows. Although the resulting files are large, they are not so large that Microsoft Excel or OpenOffice should have problems with them. We suspect our problems stem mainly from the fact that Python keeps the XML DOM structure in memory in an inefficient way.

EDIT: Eric, the author of OpenPyxl, pointed out that OpenPyxl can write with constant memory usage. However, this did not completely solve our problem: raw speed is still an issue, and something else in Python is consuming too much memory.

Now we are looking for more efficient ways to create Excel files. Python is desirable, but if we cannot find a good solution, we could also look at other programming languages.

Options, in no particular order, include:

1) Using OpenOffice and PyUno, hoping that their memory structures are more efficient than OpenPyxl's and that the TCP/IP call marshaling is efficient enough

2) OpenPyxl uses xml.etree. Would Python's lxml (a native libxml2 extension) be more memory-efficient for XML structures, and could it be swapped in as a drop-in replacement for xml.etree, e.g. with a monkey patch? (the change could later be contributed back to OpenPyxl if there is a clear benefit)

3) Export from MySQL to CSV, then post-process the CSV files directly into XLSX using Python and file iteration

4) Use other programming languages and libraries (Java)
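To make option 3 concrete: an .xlsx file is just a zip archive of XML parts, so a CSV export can be turned into a workbook by streaming the rows through, never holding the whole sheet in memory. The sketch below is a hypothetical, minimal version of that idea (function name `csv_to_xlsx` is ours); it writes only inline strings and omits the styles, shared strings, and other parts a production exporter would need.

```python
import csv
import os
import tempfile
import zipfile
from xml.sax.saxutils import escape

# Boilerplate parts every workbook package needs besides the sheet itself.
# Part names and namespaces follow the OOXML (ECMA-376) packaging rules.
_HEADER = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
_STATIC_PARTS = [
    ('[Content_Types].xml', _HEADER +
     '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
     '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
     '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
     '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
     '</Types>'),
    ('_rels/.rels', _HEADER +
     '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
     '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>'
     '</Relationships>'),
    ('xl/workbook.xml', _HEADER +
     '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
     'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
     '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>'),
    ('xl/_rels/workbook.xml.rels', _HEADER +
     '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
     '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>'
     '</Relationships>'),
]

def csv_to_xlsx(csv_path, xlsx_path):
    # Spool the (potentially huge) sheet XML to a temporary file so that
    # memory use stays flat no matter how many rows the CSV contains.
    fd, sheet_tmp = tempfile.mkstemp(suffix='.xml')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as out, \
                open(csv_path, newline='') as src:
            out.write(_HEADER +
                      '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
                      '<sheetData>')
            for r, record in enumerate(csv.reader(src), start=1):
                cells = ''.join('<c t="inlineStr"><is><t>%s</t></is></c>'
                                % escape(value) for value in record)
                out.write('<row r="%d">%s</row>' % (r, cells))
            out.write('</sheetData></worksheet>')
        with zipfile.ZipFile(xlsx_path, 'w', zipfile.ZIP_DEFLATED) as z:
            for name, xml in _STATIC_PARTS:
                z.writestr(name, xml)
            z.write(sheet_tmp, 'xl/worksheets/sheet1.xml')
    finally:
        os.remove(sheet_tmp)
```

Because each row is serialized and forgotten, peak memory is bounded by the widest single row rather than the whole table.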

Pointers

http://dev.lethain.com/handling-very-large-csv-and-xml-files-in-python/

http://enginoz.wordpress.com/2010/03/31/writing-xlsx-with-java/

2 answers

If you intend to use Java, you will want Apache POI, but most likely not the usual UserModel, since you want to keep your memory overhead down.

Instead, take a look at BigGridDemo, which shows how to write a very large xlsx file using POI, with most of the work not happening in memory.

You may also find that the technique used in BigGridDemo could equally be applied in Python?
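The BigGridDemo trick does indeed port to Python: save a template workbook once with an empty placeholder sheet, stream the real sheet XML to a plain file, then rebuild the zip with the placeholder swapped out. A hedged sketch of that idea (function names `write_sheet_xml` and `substitute_sheet` are ours, not part of any library):

```python
import zipfile
from xml.sax.saxutils import escape

def write_sheet_xml(path, rows):
    # Stream worksheet XML (inline strings only) straight to a file; only
    # one row is ever in memory at a time.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
                '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
                '<sheetData>')
        for r, row in enumerate(rows, start=1):
            f.write('<row r="%d">' % r)
            for value in row:
                f.write('<c t="inlineStr"><is><t>%s</t></is></c>'
                        % escape(str(value)))
            f.write('</row>')
        f.write('</sheetData></worksheet>')

def substitute_sheet(template_xlsx, sheet_xml, out_xlsx,
                     sheet_part='xl/worksheets/sheet1.xml'):
    # Copy every part of the template package except the placeholder sheet,
    # then add the freshly generated sheet XML under the same part name.
    with zipfile.ZipFile(template_xlsx) as zin, \
            zipfile.ZipFile(out_xlsx, 'w', zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename != sheet_part:
                zout.writestr(item, zin.read(item.filename))
        zout.write(sheet_xml, sheet_part)
```

The template carries all the fiddly parts (content types, relationships, styles), so the Python side only ever generates flat sheet data, exactly as in the Java demo.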


Have you tried taking a look at the optimized writer for openpyxl? It is a recent feature (about two months old), but it is quite reliable (used in production in several corporate projects) and can process an almost unlimited amount of data with constant memory consumption (around 7 MB).

http://packages.python.org/openpyxl/optimized.html#optimized-writer
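For reference, usage looks roughly like this. The answer refers to openpyxl versions where the feature was enabled with `Workbook(optimized_write=True)`; in current releases the flag is `write_only=True`, which the sketch below assumes:

```python
from openpyxl import Workbook

# Write-only (optimized) mode: rows are streamed to disk as they are
# appended instead of being kept as cell objects in memory.
wb = Workbook(write_only=True)
ws = wb.create_sheet(title='export')

for i in range(50000):                # e.g. one row per MySQL record
    ws.append([i, 'row-%d' % i])      # append-only; no random cell access

wb.save('export.xlsx')
```

The trade-off is that the sheet is append-only: you cannot revisit or restyle a cell after its row has been written.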


Source: https://habr.com/ru/post/886127/

