Would a parser-based Perl solution work?
perl -0777 -MHTML::Strip -nlE 'say HTML::Strip->new->parse($_)' file.html
You need to install the HTML::Strip module first, with the command cpan HTML::Strip.
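If you are not sure whether the module is already there, a quick check along these lines may help. This is only a sketch, assuming a POSIX shell with perl on the PATH; the variable name status is made up for illustration.

```shell
# Ask perl to load HTML::Strip and do nothing else.
# Exit status 0 means the module could be loaded; anything else means it could not
# (which also covers the case where perl itself is not available).
if perl -MHTML::Strip -e 1 2>/dev/null; then
  status="installed"
else
  status="missing (install with: cpan HTML::Strip)"
fi
echo "HTML::Strip is $status"
```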
As an alternative, you can use the standard OS X utility textutil (see its man page):
textutil -convert txt file.html
This creates file.txt with the HTML tags stripped. Or, to use it in a pipeline:
textutil -convert txt -stdin -stdout < file.html | some_command
Another alternative: some systems have the lynx text browser installed. You can use:
lynx -dump file.html #or lynx -stdin -dump < file.html
But in your case, IMHO, you can rely only on pure sed or awk solutions.
However, if you have perl (just without the HTML::Strip module), the following is still better than sed:
perl -0777 -pe 's/<.*?>//sg'
because it also removes tags like the following (multi-line, and common):
<a
 href="#"
 class="some"
>link text</a>
jm666