After collecting the URLs of various blogs, Tumblr and WordPress pages, I ran into problems processing the HTML. I want to extract the content, title and date of each blog post separately. I can get the date with a regex, but people use so many custom scripts nowadays that the class names and HTML structure differ widely from page to page.
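To make the date part concrete, here is a minimal sketch of regex-based date extraction. It assumes the page contains an ISO-style date such as `2013-05-17` somewhere in its markup; the pattern and the function name `find_first_date` are only illustrative, and other date formats would need their own patterns.

```python
import re
from datetime import date

# Illustrative pattern: matches ISO-style dates such as 2013-05-17 only.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_first_date(html):
    """Return the first valid ISO-style date found in the HTML, or None."""
    for match in ISO_DATE.finditer(html):
        year, month, day = map(int, match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            # The digits matched the pattern but are not a real calendar date.
            continue
    return None


if __name__ == "__main__":
    sample = '<time datetime="2013-05-17">May 17</time>'
    print(find_first_date(sample))
```

This works reasonably well for dates because they follow a small number of textual formats. The title and content have no such fixed shape, which is why the same trick does not carry over to them.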
Does anyone have a solution that might help?