While HTML scraping seems pretty well documented from what I've seen, and I understand the concept and its implementation, what is the best method for scraping content that sits behind an authentication form? I'm referring to scraping content I legally have access to, so what I'm looking for is a way to submit the login data automatically.
All I can think of is to set up a proxy, capture the request headers and cookies from a manual login, and then configure the script to replay that session as part of the HTML scraping. As for the language, this will most likely be done in Perl.
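For the record, a proxy usually isn't necessary: Perl's LWP::UserAgent can POST the login form directly and keep the session cookie in an HTTP::Cookies jar, so later requests are authenticated automatically. Here's a minimal sketch, assuming a hypothetical site; the URLs and the `username`/`password` field names are placeholders you'd read out of the real login form's HTML:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# A user agent with a cookie jar: the session cookie set by the
# login response is stored here and sent with every later request.
my $ua = LWP::UserAgent->new(
    cookie_jar => HTTP::Cookies->new(file => 'cookies.txt', autosave => 1),
);

# POST the credentials to the login form's action URL.
# (URL and field names are assumptions; inspect the actual <form>.)
my $login = $ua->post('https://example.com/login', {
    username => 'me',
    password => 'secret',
});
die 'Login failed: ', $login->status_line
    unless $login->is_success || $login->is_redirect;

# Subsequent requests reuse the session cookie automatically.
my $page = $ua->get('https://example.com/members-only');
print $page->decoded_content if $page->is_success;
```

If the form is more involved (hidden CSRF tokens, multi-step flows), WWW::Mechanize builds on LWP and can find and fill the form fields for you.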
Has anyone had experience with this, or any general thoughts?
Edit: This has been answered before, but with .NET. Although it matches how I think it needs to be done, does anyone have a Perl script for this?