I'm writing a small algorithm to extract text from websites and then find answers in it (I'll publish the script once it's finished).
To do this, I need to convert all the HTML code of a page into simple, readable English text.
I manually deleted all the HTML tags, but some CSS entries are hard to get rid of. Any simple ideas on how to convert HTML to plain English text?
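To make the problem concrete, here is a rough sketch of the kind of conversion meant here, assuming Python and only its standard-library `html.parser` module. The names `TextExtractor` and `html_to_text` are made up for illustration, and the list of block-level tags is an assumption, not a complete one. The idea is to drop everything inside `<style>` and `<script>` (which is where leftover CSS usually comes from) and keep only the visible text:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores <style>/<script> contents."""

    SKIP = {"script", "style"}  # contents of these tags are never shown
    # Assumed (incomplete) set of tags that should separate words.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        # Keep text only when we are not inside <style> or <script>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse all runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


if __name__ == "__main__":
    page = "<style>p { color: red; }</style><p>Hello <b>world</b>!</p>"
    print(html_to_text(page))
```

This only handles CSS and JavaScript that sit inside `<style>` and `<script>` elements; it does nothing about menus, ads, or other page clutter, and badly broken markup may still need a more forgiving parser.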
Thanks.
Arjun