Can I preserve CDATA sections in BeautifulSoup?

Question

I'm using BeautifulSoup to read, modify, and write an XML file. I'm having trouble with CDATA sections being stripped. Here's a simplified example.

The offending XML file:

    <?xml version="1.0" ?>
    <foo>
        <bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
    </foo>

And here is the Python script.

    from bs4 import BeautifulSoup

    xmlfile = open("cdata.xml", "r")
    soup = BeautifulSoup(xmlfile, "xml")
    print(soup)

Here is the output. Note that the CDATA section markers are gone.

    <?xml version="1.0" encoding="utf-8"?>
    <foo>
        <bar> !@ #$%^&amp;*()_+{}|:"&lt;&gt;?,./;'[]\-= </bar>
    </foo>

I also tried printing soup.prettify(formatter="xml") and got the same result with slightly different whitespace. The docs say little about reading CDATA sections, so maybe this is an lxml thing?

Is there a way to tell BeautifulSoup to preserve CDATA sections?

Update: Yes, this is an lxml thing (see http://lxml.de/api.html#cdata). So the question becomes: can I tell BeautifulSoup to initialize lxml with strip_cdata=False?
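For reference, here is a minimal sketch of the lxml behavior that update refers to, using lxml directly rather than through BeautifulSoup. The inline XML string is a shortened stand-in for cdata.xml and is an assumption made for illustration.

```python
from lxml import etree

# Shortened stand-in for the contents of cdata.xml (assumed for illustration).
xml = b'<?xml version="1.0" ?><foo><bar><![CDATA[ a < b & c ]]></bar></foo>'

# strip_cdata=False asks lxml to keep CDATA sections as CDATA
# instead of folding them into ordinary text nodes.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(xml, parser)

# The text is still readable as a plain string...
text = root.find("bar").text

# ...and the section is written back out as CDATA on serialization.
serialized = etree.tostring(root)
print(serialized.decode())
```

This only shows what lxml does on its own; whether BeautifulSoup passes the option through in a way that survives its own tree building is the open question.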

python xml cdata lxml beautifulsoup

mwcz May 07, '13 at 18:56

1 answer

Paweł szmajda · Answer 1 · 2015-12-26T21:00:23+0000

In my case, if I use

    soup = BeautifulSoup(xmlfile, "lxml-xml")

then the CDATA is preserved and accessible.
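If the section markers still do not survive in your environment, one possible workaround is to re-wrap the text in bs4's CData class before writing the document out. This is only a sketch: the helper name wrap_in_cdata and the inline XML string are hypothetical, and it assumes lxml is installed so that the "xml" parser is available.

```python
from bs4 import BeautifulSoup, CData

# Shortened stand-in for the contents of cdata.xml (assumed for illustration).
xml = '<?xml version="1.0" ?><foo><bar><![CDATA[ a < b & c ]]></bar></foo>'


def wrap_in_cdata(tag):
    """Hypothetical helper: replace a tag's text with a single CDATA section.

    Assumes the text does not itself contain the terminator "]]>".
    """
    text = tag.get_text()
    tag.clear()
    # CData strings are emitted as <![CDATA[...]]> without entity escaping.
    tag.append(CData(text))


soup = BeautifulSoup(xml, "xml")
wrap_in_cdata(soup.find("bar"))

out = str(soup)
print(out)
```

The drawback is that you have to know which elements held CDATA in the first place, since that information is not recorded on the parsed string.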
