You may have to render the page (not necessarily display it) to make sure that you get a complete list of all resources. I have used PyQt and QtWebKit in similar situations. Especially once you start counting resources loaded dynamically by JavaScript, trying to parse and load pages recursively with BeautifulSoup just won't work.
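To see why static parsing falls short, here is a minimal sketch that uses the standard library's html.parser (swapped in for BeautifulSoup so it runs without extra packages) to collect src/href attributes from some made-up markup. The image that the inline script creates at runtime never appears, because no parser can see it without executing the JavaScript:

```python
from html.parser import HTMLParser


class StaticResourceParser(HTMLParser):
    """Collects resource URLs that appear literally in the HTML markup."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # src on img/script/iframe tags, href on link tags (e.g. stylesheets)
        if tag in ("img", "script", "iframe") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])


# Made-up example page: /lazy.png is only requested once the script runs.
PAGE = """
<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body>
<img src="/logo.png">
<script>
  var img = document.createElement('img');
  img.src = '/lazy.png';
  document.body.appendChild(img);
</script>
</body></html>
"""

parser = StaticResourceParser()
parser.feed(PAGE)
print(parser.urls)  # ['/style.css', '/app.js', '/logo.png'] (no '/lazy.png')
```

A rendering engine such as QtWebKit executes the script and issues the request for /lazy.png, which is why rendering the page gives a more complete list.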
Ghost.py, a WebKit web client written in Python on top of PyQt, is a good way to get started. Also, check out the QWebView and QNetworkAccessManager docs.
Ghost.py returns a tuple (page, resources) when the page is opened:
from ghost import Ghost

ghost = Ghost()
page, resources = ghost.open('http://my.web.page')
resources contains every resource loaded for the source URL, as HttpResource objects. You can get the URL of each downloaded resource with resource.url.
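Once you have that list, reducing it to unique URLs is plain Python. The sketch below assumes resource.url is a string; FakeResource is a hypothetical stand-in for HttpResource so the example runs without Qt, and resource_urls is a hypothetical helper, not part of Ghost.py. With real Ghost.py you would pass the resources returned by ghost.open() instead:

```python
from collections import namedtuple
from urllib.parse import urlparse

# Hypothetical stand-in for Ghost.py's HttpResource: only the .url
# attribute is modelled, so this sketch runs without PyQt/Ghost.py.
FakeResource = namedtuple("FakeResource", ["url"])


def resource_urls(resources, same_host_as=None):
    """Return unique resource URLs in load order, optionally limited to one host."""
    host = urlparse(same_host_as).netloc if same_host_as else None
    seen = set()
    urls = []
    for resource in resources:
        url = resource.url
        # Skip third-party resources when a host filter is given
        if host and urlparse(url).netloc != host:
            continue
        # Keep only the first occurrence of each URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


resources = [
    FakeResource("http://my.web.page/"),
    FakeResource("http://my.web.page/app.js"),
    FakeResource("http://cdn.example.com/lib.js"),
    FakeResource("http://my.web.page/app.js"),
]
print(resource_urls(resources))
print(resource_urls(resources, same_host_as="http://my.web.page"))
```

The host filter is optional; pages often pull scripts and fonts from CDNs, and whether those count as "resources of the page" depends on what you need the list for.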