One option is to scrape through a real browser instead of just fetching the page. It may be less robust (depending on your needs), but it solves the problem you have: use some kind of wrapper around a full-fledged web browser, script the usage pattern, and extract the relevant data. Since you did not say which programming language you know, here are three options: 1) Watir (Ruby), 2) WatiN (IE and Firefox via .NET), 3) Selenium (IE and other browsers via C# / Java / Perl / PHP / Ruby / Python).
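Here is a small example using WatiN and C#:

    using WatiN.Core;

    // Drive a real Internet Explorer instance through WatiN.
    IE browser = new IE();
    browser.GoTo("YOUR CNN URL");  // placeholder: put the article URL here
    // Look up the list element that holds the comments by its id.
    List visibleComments = browser.List(Find.ById("dsq-comments"));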
Note: I am not familiar with Disqus, but it is probably best to make all the comments show first: loop, clicking the link that loads more comments (using the same kind of code as I posted), until every comment is visible, and only then scrape the dsq-comments list element.
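The click-until-everything-is-visible loop from the note could look roughly like the sketch below. It uses Selenium's Python bindings (option 3) rather than WatiN, so treat it as an illustration of the pattern, not a translation of the snippet above. The helper names `expand_and_collect` and `scrape`, the `more_selector` parameter, and the idea that each comment is an `li` inside the `dsq-comments` element are all assumptions; check the real page markup before relying on any of them.

```python
import time

# Selenium locator strategy strings; these are the values of By.ID and
# By.CSS_SELECTOR, spelled out so the helper below needs no import.
BY_ID = "id"
BY_CSS = "css selector"


def expand_and_collect(driver, more_selector, container_id="dsq-comments",
                       max_clicks=50, pause=1.0):
    """Click the 'load more' link until it is gone, then return comment texts.

    more_selector is a hypothetical CSS selector for the link that loads
    more comments; it is not taken from the Disqus markup.
    """
    for _ in range(max_clicks):  # hard cap so a stuck page cannot loop forever
        links = [e for e in driver.find_elements(BY_CSS, more_selector)
                 if e.is_displayed()]
        if not links:
            break  # nothing left to expand
        links[0].click()
        time.sleep(pause)  # crude wait for the next batch to load

    containers = driver.find_elements(BY_ID, container_id)
    if not containers:
        return []
    # Assumption: every comment is rendered as an <li> inside the container.
    items = containers[0].find_elements(BY_CSS, "li")
    return [item.text for item in items]


def scrape(url, more_selector):
    """Open a real browser, expand all comments, and return their texts."""
    # Imported here so expand_and_collect can be tried without Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return expand_and_collect(driver, more_selector)
    finally:
        driver.quit()
```

Calling `scrape("YOUR CNN URL", "a.load-more")` would run the whole flow, where `a.load-more` is only a placeholder selector. The fixed `time.sleep` keeps the sketch short; an explicit wait on the new elements is likely to be more dependable on a slow page.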