Python html2text adds random \n

When I use the Python html2text package to convert HTML to Markdown, it adds '\n' to the text. I also see this behavior when trying the demo at http://www.aaronsw.com/2002/html2text/

Is there any way to turn this off? Of course, I could delete them myself, but "\n" may legitimately appear in the source text, and I do not want to delete those.

    >>> html2text('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.')
    u'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\nproident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n'
2 answers

Looking at the source of html2text.py, it looks like you can turn off the wrapping behavior by setting BODY_WIDTH to 0. Something like this:

    import html2text

    html2text.BODY_WIDTH = 0  # 0 disables line wrapping
    text = html2text.html2text('...')

Of course, setting BODY_WIDTH globally changes the behavior of the module for every caller. If I needed this functionality, I would probably patch the module to add a parameter to html2text() that changes this behavior per call, and submit that patch to the author.
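Short of patching the module, one way to limit the scope of the global change is to override the setting only around the call and restore it afterwards. The following is a minimal sketch, not part of html2text: `temporary_attr` is a hypothetical helper name, and it assumes the module reads `BODY_WIDTH` at call time, as the source suggests.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside a with-block, then restore the old value."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore even if the body of the with-block raised.
        setattr(obj, name, original)


# Intended use with the single-file html2text module (assumption, not run here):
#
#     import html2text
#     with temporary_attr(html2text, "BODY_WIDTH", 0):
#         text = html2text.html2text("...")
```

This is still a global change while the block runs, so it is not safe if other threads call the module at the same time.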


In the latest version of html2text, do the following:

    import html2text

    h = html2text.HTML2Text()
    h.body_width = 0  # 0 disables line wrapping
    note = h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")

This removes the word wrapping that html2text otherwise applies.
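To see where the "\n" characters in the question come from, here is a rough illustration that uses the standard library's `textwrap` instead of html2text itself. `wrap_body` is a hypothetical stand-in for the wrapping step, and the width of 78 is an assumption, chosen because it reproduces the line breaks in the question's output.

```python
import textwrap

TEXT = (
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod "
    "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, "
    "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo "
    "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse "
    "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non "
    "proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
)


def wrap_body(text, body_width):
    """Hypothetical stand-in for the wrapping step; not html2text's own code."""
    if not body_width:  # a width of 0 means "do not wrap"
        return text
    return textwrap.fill(text, width=body_width)


wrapped = wrap_body(TEXT, 78)   # assumed width; text is broken into several lines
unwrapped = wrap_body(TEXT, 0)  # text is returned as one long line

print(repr(wrapped))    # repr() displays each inserted line break as \n
print(repr(unwrapped))  # no line breaks were inserted
```

The "\n" sequences are real newline characters inserted by wrapping. They look like backslash-n only because the interpreter prints the string's repr.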


Source: https://habr.com/ru/post/1439142/

