Convert HTML markup to RTF document

I have an XML document containing embedded HTML content that I am trying to convert to an RTF output file. I have XML elements decorated with <li>, <p>, <b> and other HTML markup that I would like to pass to the generated RTF.

Here is what works at the moment:

  • Getting the contents of an XML tag as a string (containing HTML tags for line breaks, paragraph breaks, and list items)
  • Writing the contents of an XML tag to an RTF file.
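For reference, the first step above can be sketched with ElementTree roughly as follows. This is only an illustration: the element names (record, description) and the helper inner_markup are made up, and it assumes the embedded HTML is well-formed enough to parse as XML child elements. If the HTML is stored escaped (&lt;b&gt; and so on), element.text already holds the whole fragment.

```python
import xml.etree.ElementTree as ET


def inner_markup(element):
    """Return the text and child markup inside `element` as one string."""
    parts = [element.text or ""]
    for child in element:
        # tostring() serialises the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Hypothetical input; the tag names are invented for this example.
SAMPLE = (
    "<record><description>Intro <b>bold</b> text"
    "<ol><li>one</li><li>two</li></ol></description></record>"
)

root = ET.fromstring(SAMPLE)
html_fragment = inner_markup(root.find("description"))
print(html_fragment)
```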

I use Python scripts for the conversion, together with ElementTree (for parsing the XML input) and PyRTF-NG (for producing the RTF), a library that handles tables and other special formatting. So far I have everything I need except the handling of the HTML markup itself, that is, converting HTML formatting tags into actual RTF formatting. To clarify: if my RTF converter encounters <ol><li>, it should create an ordered list in the RTF, not just dump the literal <ol><li> tags into the output.
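To make the goal concrete, here is a minimal standard-library sketch of the kind of tag-to-formatting translation I mean. It does not use PyRTF-NG; the class HtmlToRtf and the helpers escape and html_to_rtf are hypothetical names. It covers only <p>, <b>, <i>, <ol>, <ul> and <li>, writes list items as numbered or bulleted paragraphs rather than true RTF list structures, and does not encode non-ASCII characters.

```python
from html.parser import HTMLParser


def escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", r"\{").replace("}", r"\}")


class HtmlToRtf(HTMLParser):
    """Translate a small subset of HTML into raw RTF control words."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.lists = []  # one entry per open list: int counter for <ol>, None for <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.out.append(r"{\b ")  # open a bold group
        elif tag == "i":
            self.out.append(r"{\i ")  # open an italic group
        elif tag == "ol":
            self.lists.append(0)
        elif tag == "ul":
            self.lists.append(None)
        elif tag == "li" and self.lists:
            if self.lists[-1] is None:
                marker = r"\bullet"
            else:
                self.lists[-1] += 1
                marker = f"{self.lists[-1]}."
            self.out.append(marker + r"\tab ")

    def handle_endtag(self, tag):
        if tag in ("b", "i"):
            self.out.append("}")  # close the formatting group
        elif tag in ("p", "li"):
            self.out.append(r"\par ")  # end the paragraph
        elif tag in ("ol", "ul") and self.lists:
            self.lists.pop()

    def handle_data(self, data):
        self.out.append(escape(data))


def html_to_rtf(html):
    parser = HtmlToRtf()
    parser.feed(html)
    parser.close()
    return r"{\rtf1\ansi " + "".join(parser.out) + "}"


print(html_to_rtf("<p>Hi <b>there</b></p><ol><li>one</li><li>two</li></ol>"))
```

With a real RTF library the handlers would call that library's paragraph and text objects instead of appending control words by hand.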

Does anyone know whether Python has any native calls that would let me do this, or whether any other Python libraries offer what I need for a complete conversion to RTF?

Thanks!

1 answer

The best free converter is LibreOffice, and it can be used directly from the command line in a terminal; see

 libreoffice --convert-to 

The same converter can also be called indirectly from Python through the UNO bridge.
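A simpler route than UNO is to launch the command-line converter from Python with subprocess. The sketch below is untested against a real installation and rests on assumptions: the executable is on PATH as soffice or libreoffice, the target format is rtf, and the helper names build_convert_command and convert_to_rtf are made up. Depending on the LibreOffice version and the input type, an explicit export filter name may be needed after --convert-to.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, binary="soffice"):
    """Build the argument list for a headless LibreOffice conversion to RTF."""
    return [
        binary,
        "--headless",            # run without opening a window
        "--convert-to", "rtf",   # assumed target format
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_rtf(source, outdir="."):
    """Run the conversion and return the expected path of the output file."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice executable not found on PATH")
    subprocess.run(build_convert_command(source, outdir, binary), check=True)
    return Path(outdir) / (Path(source).stem + ".rtf")
```

Calling convert_to_rtf("input.html", "out") would then be expected to leave out/input.rtf behind, provided LibreOffice accepts the input.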


Source: https://habr.com/ru/post/919928/

