How do I get the first element whose own inner text (plain text, ignoring child elements) is at least 200 characters long?
I am trying to build an HTML parser similar to Embed.ly, and I have created a fallback system: I first check og:description, then I look for this element, and only then for the description meta tag.
This is because most sites that include a meta description at all use that tag to describe the site as a whole, not the contents of the current page.
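The fallback order described above (og:description, then the first long text block, then the description meta tag) could be sketched roughly like this with Python's standard-library `html.parser`. This is only an illustration under assumptions: `MetaCollector` and `describe` are hypothetical names, and the long-text lookup is passed in as a hypothetical callable `first_long_text` because that is exactly the part this question is about.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class MetaCollector(HTMLParser):
    """Hypothetical helper: collects <meta> content keyed by property= or name=."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        key = a.get("property") or a.get("name")
        content = a.get("content")
        # Keep only the first occurrence of each key.
        if key and content and key.lower() not in self.meta:
            self.meta[key.lower()] = content.strip()


def describe(html: str, first_long_text: Callable[[str], Optional[str]]) -> Optional[str]:
    """Hypothetical fallback chain: og:description, then the first long
    text block (supplied by the caller), then <meta name="description">."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    # `or` short-circuits, so first_long_text only runs if og:description is missing/empty.
    return (
        collector.meta.get("og:description")
        or first_long_text(html)
        or collector.meta.get("description")
    )
```

Passing the long-text finder in as a parameter keeps the ordering logic separate from however that element ends up being located (XPath, a selector, or a manual parse).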
Example:
<html> <body> <div>some characters <p>200 characters <span>some more stuff</span></p> </div> </body> </html>
Which selector can I use to get the 200-character part of this HTML fragment? I don't want any other content, and I don't care what kind of element it is (except <script> or <style>), as long as it is the first element whose own plain text contains at least 200 characters.
What would the XPath query look like?
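With an XPath 1.0 engine, something along the lines of `(//text()[not(parent::script or parent::style)][string-length(normalize-space(.)) >= 200])[1]` selects the first qualifying text node (its parent is the element). However, XPath 1.0 tests each text node separately, so an element whose own text is split by a child element is not measured as a whole. As a swapped-in alternative rather than an XPath answer, the rule can be sketched with Python's standard-library `html.parser`. Assumptions: `OwnTextFinder`, `first_long_own_text` and `min_len` are hypothetical names, whitespace is collapsed before counting, the void-element list is a hand-picked assumption, and because `HTMLParser` does not build a real DOM, unclosed tags are only approximated.

```python
from html.parser import HTMLParser
from typing import Optional

SKIP = {"script", "style"}
# Assumed list of void elements: they never get an end tag, so they are not pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OwnTextFinder(HTMLParser):
    """Hypothetical: records each element's *own* text (children's text excluded)."""

    def __init__(self, min_len: int = 200) -> None:
        super().__init__()
        self.min_len = min_len
        self.stack: list[list] = []  # open elements: [tag, start_order, text_parts]
        self.done: list[tuple[int, str, str]] = []  # qualifying (start_order, tag, text)
        self.order = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append([tag, self.order, []])
        self.order += 1

    def handle_endtag(self, tag):
        # Pop up to and including the nearest matching open tag,
        # which tolerates unclosed children such as <p> or <li>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                while len(self.stack) > i:
                    self._finish(self.stack.pop())
                return
        # A stray end tag with no matching open tag is ignored.

    def handle_data(self, data):
        # Text belongs only to the innermost open element.
        if self.stack:
            self.stack[-1][2].append(data)

    def _finish(self, entry):
        tag, order, parts = entry
        if tag in SKIP:
            return
        text = " ".join("".join(parts).split())  # collapse runs of whitespace
        if len(text) >= self.min_len:
            self.done.append((order, tag, text))

    def close(self):
        super().close()
        while self.stack:  # flush elements still open at end of input
            self._finish(self.stack.pop())


def first_long_own_text(html: str, min_len: int = 200) -> Optional[tuple[str, str]]:
    """Return (tag, own_text) of the first qualifying element in document order."""
    finder = OwnTextFinder(min_len)
    finder.feed(html)
    finder.close()
    if not finder.done:
        return None
    _, tag, text = min(finder.done)  # smallest start order = earliest start tag
    return tag, text
```

For the example fragment, the `<div>` only owns "some characters" and the `<span>` only owns "some more stuff", so the `<p>` is returned with the span's text dropped. Elements are finished when they close, which is why the earliest start tag is chosen at the end rather than the first element to finish.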