Prior to version 3.0.5, BeautifulSoup treated the contents of <textarea> as HTML. Now it treats them as text. The document I'm processing has HTML inside its textarea tags, and I need to parse that HTML.
I tried:
for textarea in soup.findAll('textarea'):
    contents = BeautifulSoup.BeautifulSoup(textarea.contents)
    textarea.replaceWith(contents.html(text=True))
But I get errors. I can't find anything about this in the documentation, and switching to alternative parsers doesn't help. Does anyone know how I can parse the contents of a textarea as HTML?
Edit:
HTML example:
<textarea class="ks-lazyload-custom">
  <div class="product-view product-view-rug">
    Foobar Womble
    <div class="product-view-head">
      <img src="tps/i1/fo-25.gif" />
    </div>
  </div>
</textarea>
Error:
  File "D:\src\cross\tserver\src\tools\sitecrawl\BeautifulSoup.py", line 1913, in _detectEncoding
    '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
TypeError: expected string or buffer
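One possible reading of this traceback is that the value handed to the parser is a list of nodes (which is what textarea.contents is) rather than a string, so the encoding-detection regex refuses it. The sketch below reproduces only that failure mode with the standard re module. The pattern is copied from the traceback, but compiling it with re.compile, the sample list, and the helper name try_match are assumptions made for illustration.

```python
import re

# Pattern copied from the traceback; compiling it with re.compile is an assumption.
ENCODING_DECL = re.compile(r'^<\?.*encoding=[\'"](.*?)[\'"].*\?>')

# Hypothetical stand-in for textarea.contents: a list of nodes, not a string.
contents = [' <div class="product-view"> Foobar Womble </div> ']


def try_match(value):
    """Return 'ok' if the regex accepts the value, else the exception's name."""
    try:
        ENCODING_DECL.match(value)
        return "ok"
    except TypeError as exc:
        return type(exc).__name__


list_result = try_match(contents)              # the list is rejected by the regex
string_result = try_match("".join(contents))   # the joined string is accepted
```

If that reading is right, joining the nodes into a single string before re-parsing them may be enough to get past this particular error.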
I am looking for a way to take an element, extract its contents, parse them with BeautifulSoup, collapse the result into text, and then replace the contents of the original element (or the whole element) with that text.
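To make those steps concrete, here is a minimal sketch of the pipeline that uses only Python's standard library instead of BeautifulSoup. It is not the BeautifulSoup solution being asked for. The names TextCollector, collapse_to_text, and flatten_textareas are hypothetical, and locating the textarea with a regular expression is a simplification that will not cope with malformed or unusual markup.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text nodes of a markup fragment and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:  # skip whitespace-only runs between tags
            self.chunks.append(stripped)


def collapse_to_text(fragment):
    """Parse an HTML fragment and return its text, joined by single spaces."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return " ".join(collector.chunks)


# Simplified matcher for one textarea element and its inner markup.
TEXTAREA_RE = re.compile(
    r"<textarea\b[^>]*>(.*?)</textarea\s*>", re.IGNORECASE | re.DOTALL
)


def flatten_textareas(document):
    """Replace every textarea element with the text of its inner markup."""
    return TEXTAREA_RE.sub(lambda m: collapse_to_text(m.group(1)), document)
```

Calling flatten_textareas on a page string replaces each whole textarea element with the collapsed text of the markup it contained, which is the "replace the whole element" variant described above.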
As for the real world versus the specification, that isn't particularly relevant here. The data needs to be parsed, and I'm looking for a way to do that.