Assuming you want to do this with Groovy (guessing from the Groovy tag), your options are likely to be either heavily shell-script oriented or based on Java libraries. For shell scripting, I agree with moogs: using Lynx or Elinks is probably the easiest way to do this. Otherwise, look at HTMLParser and see Processing each word in a File (scroll down to find the relevant code fragment).
You are probably stuck looking for Java libraries to use with Groovy for parsing HTML, since there do not appear to be any Groovy libraries for it. If you are not using Groovy, please post the language you want, as there are a lot of HTML-to-text tools out there, depending on which language you work in.
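To give a rough idea of the Java-library route, here is a minimal sketch that uses the HTML parser bundled with the JDK (`javax.swing.text.html`) rather than HTMLParser, so it needs no extra jar. The class name `HtmlToText` and the method `extractText` are made up for illustration, and joining text chunks with a single space is an assumption about what output you want.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
public class HtmlToText {

    // Returns the text content of the given HTML, with markup dropped.
    public static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText once for each run of text it finds.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' '); // assumption: separate chunks with a space
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Once the class is compiled and on the classpath, it should be callable from Groovy as-is, for example `HtmlToText.extractText(new File('page.html').text)`. The built-in parser is fairly old and lenient, so for messy real-world pages a dedicated library may give better results.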