Can I preserve CDATA sections in BeautifulSoup?

I use BeautifulSoup to read, modify, and write an XML file, and I'm having problems with CDATA sections being stripped out. Here's a simplified example.

The offending XML file:

    <?xml version="1.0" ?>
    <foo>
        <bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
    </foo>

And here is the Python script.

    from bs4 import BeautifulSoup

    # Parse the file with BeautifulSoup's "xml" parser (which requires lxml)
    with open("cdata.xml", "r") as xmlfile:
        soup = BeautifulSoup(xmlfile, "xml")
    print(soup)

Here is the output. Note that the CDATA section markers are gone:

    <?xml version="1.0" encoding="utf-8"?>
    <foo>
        <bar> !@ #$%^&amp;*()_+{}|:"&lt;&gt;?,./;'[]\-= </bar>
    </foo>

I also tried printing soup.prettify(formatter="xml") and got the same result with slightly different whitespace. The documentation says little about reading in CDATA sections, so maybe this is an lxml thing?

Is there a way to tell BeautifulSoup to preserve CDATA sections?

Update: Yes, this is an lxml thing (see http://lxml.de/api.html#cdata). So the question becomes: can you tell BeautifulSoup to initialize lxml with strip_cdata=False?
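The lxml behavior mentioned in the update can be seen by using lxml directly, outside BeautifulSoup. This is a minimal sketch, assuming lxml is installed; it compares lxml's default parser with one built via `etree.XMLParser(strip_cdata=False)`. The sample is the question's XML inlined as bytes, and the helper name `parse_keeping_cdata` is hypothetical.

```python
from lxml import etree

# The XML from the question, inlined as bytes so no file is needed
XML = rb"""<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
</foo>"""


def parse_keeping_cdata(data):
    # Hypothetical helper: strip_cdata=False asks lxml to keep CDATA
    # sections instead of merging them into ordinary (escaped) text
    parser = etree.XMLParser(strip_cdata=False)
    return etree.fromstring(data, parser)


# Default parser (strip_cdata=True): the section becomes plain text
default_root = etree.fromstring(XML)
default_out = etree.tostring(default_root).decode()

# Parser with strip_cdata=False: the CDATA markers survive serialization
cdata_root = parse_keeping_cdata(XML)
cdata_out = etree.tostring(cdata_root).decode()

print(default_out)
print(cdata_out)
# In both cases .text holds the content without the CDATA markers
print(repr(cdata_root.find("bar").text))
```

The `.text` value is the same either way; only the serialized form differs, which is why the difference shows up when the document is written back out.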

1 answer

In my case, if I use

    soup = BeautifulSoup(xmlfile, "lxml-xml")

then the CDATA is preserved and accessible.
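If the CDATA wrapper is still lost in your setup, one workaround is to re-wrap the text explicitly with bs4's `CData` class, which BeautifulSoup writes out as `<![CDATA[...]]>` without escaping the contents. This is a minimal sketch, assuming bs4 and lxml are installed; the helper `rewrap_cdata` and the tag list `CDATA_TAGS` are hypothetical. Note that it re-creates CDATA sections for tags you name in advance rather than preserving the original ones.

```python
from bs4 import BeautifulSoup, CData

# The XML from the question, inlined so no file is needed
XML = r"""<?xml version="1.0" ?>
<foo>
    <bar><![CDATA[ !@ #$%^&*()_+{}|:"<>?,./;'[]\-= ]]></bar>
</foo>"""

# Hypothetical: the tags whose text should be written back as CDATA
CDATA_TAGS = ["bar"]


def rewrap_cdata(soup, tag_names):
    # find_all() returns a list, so modifying tags inside the loop is safe
    for tag in soup.find_all(tag_names):
        # tag.string is None when the tag has several children; skip those
        if tag.string is not None:
            # CData is serialized as <![CDATA[...]]> with no escaping
            tag.string = CData(tag.string)


soup = BeautifulSoup(XML, "xml")
rewrap_cdata(soup, CDATA_TAGS)
print(soup)
```

Because `CData` is a `NavigableString` subclass, the re-wrapped text still behaves like an ordinary string when you read or modify it.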


Source: https://habr.com/ru/post/1479602/
