I have an XML document containing embedded HTML content that I am trying to convert to an RTF output file. I have XML elements decorated with <li>, <p>, <b>
and other HTML markup that I would like to pass to the generated RTF.
Here is what works at the moment:
- Get the contents of an XML tag as a string (containing HTML tags for line breaks, paragraph breaks, and list breaks)
- Writing the contents of an XML tag to an RTF file.
I use Python scripts to achieve conversion. It also uses ElementTree (for parsing XML input) PyRTF-NG (for converting from HTML to RTF), a library that processes tables and other special formatting. At the moment, I managed to get everything that I need, except for the "markdown" of HTML (that is, converting HTML format tags to actual RTF formatting). To clarify, I mean that if my RTF converter encounters an <ol><li>
, it should create an ordered list in RTF, and not just splash out the <ol><li>
tags in RTF.
Does anyone know if Python has any native calls that will allow me to do this, or any other Python libraries that may have what I need for a complete conversion to RTF.
Thanks!
source share