I use BeautifulSoup to read, modify, and write an XML file. I'm having trouble with CDATA sections being stripped. Here's a simplified example.
The offending XML file:
<?xml version="1.0" ?>
<foo>
<bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
</foo>
And here is the Python script.
from bs4 import BeautifulSoup

# Parse the file with BeautifulSoup's "xml" parser (backed by lxml)
with open("cdata.xml", "r") as xmlfile:
    soup = BeautifulSoup(xmlfile, "xml")

print(soup)
Here is the output. Note that the CDATA section markers are gone.
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar> !@ #$%^&*()_+{}|:"<>?,./;'[]\-= </bar>
</foo>
I also tried printing soup.prettify(formatter="xml") and got the same result with slightly different whitespace. The documentation says little about reading CDATA sections, so maybe this is an lxml thing?
Is there a way to tell BeautifulSoup to preserve CDATA sections?
Update: Yes, this is an lxml thing (see http://lxml.de/api.html#cdata). So the question becomes: can you tell BeautifulSoup to initialize lxml with strip_cdata=False?
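For reference, here is a minimal sketch of what that option does when lxml is used directly, without BeautifulSoup. It assumes lxml is installed; the inlined sample string and the variable names are only for illustration. It does not show whether BeautifulSoup can pass the option through, which is still the open question.

```python
from lxml import etree

# Same shape as cdata.xml above, inlined as raw bytes for illustration.
xml = rb"""<?xml version="1.0" ?>
<foo>
<bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
</foo>"""

# Default parser: CDATA sections are converted to ordinary text nodes.
default_root = etree.fromstring(xml)

# Parser created with strip_cdata=False: CDATA sections are kept as such.
keep_parser = etree.XMLParser(strip_cdata=False)
keep_root = etree.fromstring(xml, keep_parser)

default_out = etree.tostring(default_root).decode()
kept_out = etree.tostring(keep_root).decode()

print(default_out)  # special characters are escaped, no CDATA markers
print(kept_out)     # the CDATA section is serialized again
```

In both cases the text of the bar element reads the same from Python; only the serialized form differs.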
mwcz