I need to parse a nested HTML list and convert it to a parent-child dict. Given this list:
    <ul>
      <li>Operating System
        <ul>
          <li>Linux
            <ul>
              <li>Debian</li>
              <li>Fedora</li>
              <li>Ubuntu</li>
            </ul>
          </li>
          <li>Windows</li>
          <li>OS X</li>
        </ul>
      </li>
      <li>Programming Languages
        <ul>
          <li>Python</li>
          <li>C#</li>
          <li>Ruby</li>
        </ul>
      </li>
    </ul>
I want to convert it to a dict like this:
    {
        'Operating System': {
            'Linux': {
                'Debian': None,
                'Fedora': None,
                'Ubuntu': None,
            },
            'Windows': None,
            'OS X': None,
        },
        'Programming Languages': {
            'Python': None,
            'C#': None,
            'Ruby': None,
        }
    }
My initial attempt was to use find_all('li', recursive=False), but the result includes the top-level items (Operating System and Programming Languages) as well as their children.
How can I do this with BeautifulSoup?
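For reference, here is a minimal sketch of the kind of recursive approach that might work, assuming bs4 is installed and the built-in "html.parser" backend is used. The helper name list_to_dict is hypothetical, and the sketch assumes each item's label is the text sitting directly inside its <li>, before any nested <ul>.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<ul>
  <li>Operating System
    <ul>
      <li>Linux
        <ul><li>Debian</li><li>Fedora</li><li>Ubuntu</li></ul>
      </li>
      <li>Windows</li>
      <li>OS X</li>
    </ul>
  </li>
  <li>Programming Languages
    <ul><li>Python</li><li>C#</li><li>Ruby</li></ul>
  </li>
</ul>
"""


def list_to_dict(ul):
    """Hypothetical helper: turn a <ul> tag into a nested dict (leaves map to None)."""
    result = {}
    # recursive=False is called on the <ul> itself, so only its direct <li> children match
    for li in ul.find_all("li", recursive=False):
        # Label = text nodes that are direct children of this <li> (nested lists excluded)
        label = "".join(
            child for child in li.children if isinstance(child, NavigableString)
        ).strip()
        # Only look for a <ul> nested directly inside this <li>
        nested = li.find("ul", recursive=False)
        result[label] = list_to_dict(nested) if nested is not None else None
    return result


soup = BeautifulSoup(html, "html.parser")
tree = list_to_dict(soup.find("ul"))  # soup.find returns the outermost <ul>
```

The idea is to apply recursive=False at every level of the tree, starting from the outer <ul> rather than from the soup object, so each call only sees one level of items.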