How to parse author name and book title from scraped, cleaned-up HTML using XPath?

The HTML below is text that I scraped from a remote site and stored, as is, in a local variable.

Now I need to extract authorName and bookTitle from this HTML into separate variables. The cleaned-up text always has the same format:

<p>
  William Faulkner - 'Light In August'
  <br/>
  William Faulkner - 'Sanctuary'
  <br/>
  William Faulkner - 'The Sound and the Fury'
</p>

Can this be done in XPath?


Yes, and it is easy:

//p/text()

This gives you three separate text nodes:

"
  William Faulkner - 'Light In August'
  ",
"
  William Faulkner - 'Sanctuary'
  ",
"
  William Faulkner - 'The Sound and the Fury'
"

Remember that leading and trailing whitespace (including any line breaks) is always part of the node text, so trim the result.
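Here is a minimal sketch of the same result outside an XPath engine, using only Python's standard library. ElementTree's limited XPath support has no text() step, so the sketch collects the equivalent nodes by hand. It takes the p element's own text plus the tail text that follows each <br/>. It assumes the fragment is well-formed XML, as the sample is; real-world HTML may need an HTML parser instead. The helper name p_text_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample fragment from the question (well-formed XML, so ElementTree can parse it).
HTML = """<p>
  William Faulkner - 'Light In August'
  <br/>
  William Faulkner - 'Sanctuary'
  <br/>
  William Faulkner - 'The Sound and the Fury'
</p>"""


def p_text_nodes(p):
    """Hypothetical helper approximating p/text(): the element's leading text
    plus the tail text that follows each child element (here, each <br/>)."""
    nodes = [p.text] if p.text else []
    for child in p:
        if child.tail:
            nodes.append(child.tail)
    return nodes


p = ET.fromstring(HTML)
raw = p_text_nodes(p)               # whitespace and line breaks included, as in XPath
trimmed = [s.strip() for s in raw]  # the "trim the result" step

for line in trimmed:
    print(line)
```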

I assume you don't need help splitting the resulting strings into author and title.
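For completeness, here is a minimal sketch of that splitting step in Python. It assumes every trimmed line has the form Author - 'Title', with ' - ' as the separator and the title in single quotes. The name split_entry is hypothetical.

```python
def split_entry(line):
    """Hypothetical helper: split "Author - 'Title'" into (author, title).

    partition() splits on the first ' - ' only, so hyphens inside the title
    survive; an author name that itself contains ' - ' would not.
    """
    author, sep, title = line.partition(" - ")
    if not sep:
        raise ValueError(f"no ' - ' separator in {line!r}")
    title = title.strip()
    # Remove one pair of surrounding single quotes, if present.
    if len(title) >= 2 and title[0] == title[-1] == "'":
        title = title[1:-1]
    return author.strip(), title


lines = [
    "William Faulkner - 'Light In August'",
    "William Faulkner - 'Sanctuary'",
    "William Faulkner - 'The Sound and the Fury'",
]
books = [split_entry(s) for s in lines]
for author, title in books:
    print(author, "|", title)
```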


In XPath 1.0, select the text-node children of p:

/p/text()

This selects the three text nodes. To take the author from the first of them, use:

substring-before(/p/text()[1],'-')

This returns:

  William Faulkner 

substring-after(/p/text()[1],'-')

This returns the title, again with its surrounding whitespace:

 'Light In August'       
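To see exactly what these two XPath 1.0 functions return, here is a small Python sketch that mirrors their definitions. substring-before() gives the text before the first occurrence of the separator, and substring-after() gives the text after it. Both return an empty string when the separator does not occur. The function names are hypothetical stand-ins, not an XPath library API.

```python
def substring_before(s, sep):
    """Mirror of XPath 1.0 substring-before(): text before the first sep, or ''."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Mirror of XPath 1.0 substring-after(): text after the first sep, or ''."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


# First text node of p, with the surrounding whitespace that XPath keeps.
first = "\n  William Faulkner - 'Light In August'\n  "

author = substring_before(first, "-")  # "\n  William Faulkner "
title = substring_after(first, "-")    # " 'Light In August'\n  "
print(repr(author.strip()), repr(title.strip()))
```

Because the separator is a bare '-', the split happens at the first hyphen anywhere in the node. A hyphenated name (hypothetically, Jean-Paul Sartre - 'Nausea') would therefore yield just Jean as the author. Splitting on ' -' or ' - ', as the last answer does, avoids that.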

In XPath 2.0 you can apply the function to every text node in a single expression:

/p/text()/substring-before(.,'-')

This produces a sequence of three strings:

William Faulkner William Faulkner William Faulkner 

/p/text()/substring-after(.,'-')

This also produces a sequence of three strings:

'Light In August' 'Sanctuary' 'The Sound and the Fury'
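XPath 2.0 allows a function call as the last step of a path, so the function is evaluated once per text node and the results form a sequence. In Python the equivalent is a list comprehension over the nodes. This sketch hard-codes the three text nodes from the sample. It uses str.partition to reproduce the substring-before/substring-after behaviour, including the empty string returned when '-' is missing. It then pairs the two sequences, which gives the separate author and title values the question asks for. The helper names before and after are hypothetical.

```python
# The three text nodes selected by /p/text(), whitespace included.
text_nodes = [
    "\n  William Faulkner - 'Light In August'\n  ",
    "\n  William Faulkner - 'Sanctuary'\n  ",
    "\n  William Faulkner - 'The Sound and the Fury'\n",
]


def before(s, sep):
    head, found, _ = s.partition(sep)
    return head if found else ""  # substring-before() yields '' when sep is absent


def after(s, sep):
    return s.partition(sep)[2]    # '' when sep is absent, like substring-after()


# /p/text()/substring-before(.,'-')  and  /p/text()/substring-after(.,'-')
authors = [before(node, "-") for node in text_nodes]
titles = [after(node, "-") for node in text_nodes]

# Pair each trimmed author with its trimmed title.
pairs = [(a.strip(), t.strip()) for a, t in zip(authors, titles)]
for pair in pairs:
    print(pair)
```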

To get the author of the $N-th book, use this XPath:

substring-before(normalize-space(p/text()[$N]), ' -')

To get the title of the $N-th book, use:

substring-after(normalize-space(p/text()[$N]), ' - ')

The number of books is given by:

count(p/text())

In the XPath expressions above, $N must be in the range

[1,count(p/text())]
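The same indexed approach can be sketched in Python with hypothetical helper names. normalize_space mirrors XPath's normalize-space(): it trims the string, then collapses runs of space, tab, CR and LF into one space. Indexing is 1-based like p/text()[$N]. An out-of-range N yields an empty string, just as an empty node-set does in XPath. The loop runs N over the range above.

```python
import re

# The three text nodes selected by p/text(), whitespace included.
TEXT_NODES = [
    "\n  William Faulkner - 'Light In August'\n  ",
    "\n  William Faulkner - 'Sanctuary'\n  ",
    "\n  William Faulkner - 'The Sound and the Fury'\n",
]


def normalize_space(s):
    """Mirror of XPath normalize-space(): collapse XML whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def nth_text(nodes, n):
    """normalize-space(p/text()[$N]): 1-based; '' when n is out of range."""
    if 1 <= n <= len(nodes):
        return normalize_space(nodes[n - 1])
    return ""


def author_of(nodes, n):
    # substring-before(normalize-space(p/text()[$N]), ' -')
    s = nth_text(nodes, n)
    i = s.find(" -")
    return s[:i] if i >= 0 else ""


def title_of(nodes, n):
    # substring-after(normalize-space(p/text()[$N]), ' - ')
    s = nth_text(nodes, n)
    i = s.find(" - ")
    return s[i + len(" - "):] if i >= 0 else ""


count = len(TEXT_NODES)  # count(p/text())
books = [(author_of(TEXT_NODES, n), title_of(TEXT_NODES, n)) for n in range(1, count + 1)]
for author, title in books:
    print(author, "|", title)
```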

Source: https://habr.com/ru/post/1770137/

