I haven't looked at the HTMLParser module itself, but I can see that feed ends up calling handle_data, which is the method that prints in your derived class. @ron's answer, sending the data directly to your function, is perfectly fine. However, since you are new to OOP, take a look at this code.
This is Python 2.x. For Python 3, I think the main thing that changes is the location of the module, which is html.parser instead of HTMLParser (and print becomes a function).
    from HTMLParser import HTMLParser

    class MyParser(HTMLParser):
        def handle_data(self, data):
            self.output.append(data)

        def feed(self, data):
            self.output = []
            HTMLParser.feed(self, data)

    p = MyParser()
    page = """<html><h1>title</h1><p>I'm a paragraph!</p></html>"""
    p.feed(page)
    print p.output

Output:

    ['title', "I'm a paragraph!"]
Here I override HTMLParser's feed method. Now, when p.feed(page) runs, it calls my method, which sets an instance variable called output to an empty list and then calls the feed method of the base class (HTMLParser), which carries on and does its usual work. So by overriding feed I was able to do something extra (add a new output variable). The handle_data method is an override in the same way. In fact, HTMLParser's own handle_data doesn't do anything at all (as per the docs).
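If it helps to see the override pattern on its own, here is a minimal sketch without HTMLParser at all. Base and Collector are made-up names for illustration only, and the sketch uses Python 3 syntax; it just mimics the shape of the example: a base class whose feed calls a do-nothing handle_data hook, and a subclass that overrides both.

```python
class Base:
    def feed(self, data):
        # The base class does the real work and calls the hook once per word.
        for word in data.split():
            self.handle_data(word)

    def handle_data(self, data):
        # Default hook: intentionally does nothing.
        pass


class Collector(Base):
    def feed(self, data):
        self.output = []       # extra step added by the override
        Base.feed(self, data)  # then run the base-class behaviour

    def handle_data(self, data):
        # Replaces the do-nothing hook with something useful.
        self.output.append(data)


c = Collector()
c.feed("one two three")
print(c.output)  # ['one', 'two', 'three']
```

Calling Base().feed("one two three") directly would run without error but collect nothing, because the base hook is empty.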
So, just to clarify ...
You call p.feed(page), which calls the MyParser.feed method. MyParser.feed sets self.output to an empty list, then calls HTMLParser.feed. Each time the parser finds text, it calls handle_data, which appends that text to the end of the output list.
You can now get at the data through p.output.
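For Python 3, a port of the same idea might look roughly like this. It is a sketch rather than a tested drop-in: the import comes from html.parser, print is a function, and I've used super() to reach the base class, which is a style choice on my part and not something the Python 2 version needs.

```python
from html.parser import HTMLParser  # Python 3 location of the module


class MyParser(HTMLParser):
    def feed(self, data):
        # Reset the collected text, then let the base class do the parsing.
        self.output = []
        super().feed(data)

    def handle_data(self, data):
        # Called by the base class for each run of text between tags.
        self.output.append(data)


p = MyParser()
page = """<html><h1>title</h1><p>I'm a paragraph!</p></html>"""
p.feed(page)
print(p.output)  # ['title', "I'm a paragraph!"]
```

Note that because feed resets output every time, each call to p.feed starts with a fresh list.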