I use bs4 to parse an XML file and write it again to a new XML file.
Input file:
<tag1>
<tag2 attr1="a1">&quot; example text &quot;</tag2>
<tag3>
<tag4 attr2="a2">&quot; example text &quot;</tag4>
<tag5>
<tag6 attr3="a3">&apos; example text &apos;</tag6>
</tag5>
</tag3>
</tag1>
Script:
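from bs4 import BeautifulSoup

with open("input.xml") as fin:
    soup = BeautifulSoup(fin, "xml")
# encode() returns bytes, so the output file is opened in binary mode
with open("output.xml", "wb") as fout:
    fout.write(soup.encode(formatter="minimal"))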
Output:
<tag1>
<tag2 attr1="a1"> " example text " </tag2>
<tag3>
<tag4 attr2="a2"> " example text " </tag4>
<tag5>
<tag6 attr3="a3"> ' example text ' </tag6>
</tag5>
</tag3>
</tag1>
I want to preserve &quot; and &apos; in the output. I tried all of the formatter options for encode (minimal, xml, html, None), but none of them solved the problem.
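The behavior being asked for can be stated as a small function: escape &, < and > as usual, and additionally write quote characters as named entities. The following is only a sketch using the standard library's xml.sax.saxutils.escape; the names QUOTE_ENTITIES and keep_quote_entities are hypothetical and not part of bs4.

```python
from xml.sax.saxutils import escape

# Hypothetical helper (not part of bs4): escape &, <, > as usual,
# and additionally write quote characters as named entities.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def keep_quote_entities(text):
    return escape(text, QUOTE_ENTITIES)

print(keep_quote_entities('" example text "'))   # &quot; example text &quot;
print(keep_quote_entities("' example text '"))   # &apos; example text &apos;
```

The Beautiful Soup documentation describes passing a function as the formatter, which is then called for each string and attribute value, so a helper like this might be usable as formatter=keep_quote_entities. Whether that produces the desired output with the "xml" parser is not verified here, and it would turn every quote character into an entity, not only the ones that were entities in the input.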
Then I tried to replace " with &quot; manually.
import re

# find_all(text=...) returns the matching text nodes
for tag in soup.find_all(text=re.compile("\"")):
    res = tag.string
    res1 = res.replace("\"", "&quot;")
    tag.string.replaceWith(res1)
But it gave the result below:
<tag1>
<tag2 attr1="a1"> &amp;quot; example text &amp;quot; </tag2>
<tag3>
<tag4 attr2="a2"> &amp;quot; example text &amp;quot; </tag4>
<tag5>
<tag6 attr3="a3"> ' example text ' </tag6>
</tag5>
</tag3>
</tag1>
It replaces & with &amp;. I'm confused here. Please help me resolve this.
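The extra escaping is probably because the replaced text is treated as plain character data when the document is written, so the & at the start of the inserted entity gets escaped again. The same effect can be reproduced with the standard library alone. In this sketch xml.sax.saxutils.escape stands in for the serializer's escaping step; it is not bs4 itself, and the variable name is only illustrative.

```python
from xml.sax.saxutils import escape

# Text as it looks after the manual replace: the entity is now literal text.
already_replaced = "&quot; example text &quot;"

# Escaping runs once more on output, so the leading & of each entity is escaped.
print(escape(already_replaced))   # &amp;quot; example text &amp;quot;
```

This suggests that the quotes need to be converted during output escaping rather than inserted into the tree beforehand.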