I need to parse an HTML document that contains "code" tags.
I get the code blocks as follows:
soup = BeautifulSoup(str(content))
code_blocks = soup.findAll('code')
The problem is that if I have a code tag like this:
<code class="csharp">
List<Person> persons = new List<Person>();
</code>
BeautifulSoup tries to close the nested tags and converts the code block into:
<code class="csharp">
List<person> persons = new List</person><person>();
</person>
</code>
Is there a way to extract the contents of the code tags as text using BeautifulSoup, without letting it fix what it thinks are HTML markup errors?
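For comparison, here is a minimal sketch of one possible workaround that sidesteps the parser entirely: it pulls the raw text between the code tags with a regular expression before any tree is built. It does not use BeautifulSoup. The helper name extract_code_blocks is hypothetical, and the sketch assumes code tags are not nested and that the code itself never contains a literal closing code tag.

```python
import re

# Hypothetical helper, not part of BeautifulSoup: matches an opening
# <code ...> tag, captures everything up to the next </code>, and never
# builds a parse tree, so nothing inside the block gets "repaired".
CODE_RE = re.compile(r"<code\b[^>]*>(.*?)</code>", re.DOTALL | re.IGNORECASE)


def extract_code_blocks(html):
    """Return the unmodified contents of every <code> element in html."""
    return [match.group(1) for match in CODE_RE.finditer(html)]


content = '''<code class="csharp">
List<Person> persons = new List<Person>();
</code>'''

for block in extract_code_blocks(content):
    # strip() only trims the newlines around the block for display
    print(block.strip())
```

Because the text is never parsed, List<Person> keeps its original casing and no closing tags are added. The trade-off is that a regular expression is not a real HTML parser, so this only holds under the assumptions stated above.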