Problem using XPath "starts-with" to parse XHTML

I am trying to parse a web page to retrieve forum posts.
Each message begins with the following markup:

<div id="post_message_somenumber"> 

and I only want to get the first one.

I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this. Does anyone have suggestions?

+4
3 answers

I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success. I'm still learning this. Does anyone have suggestions?

If the problem is not caused by the many nested apostrophes and the unclosed double quotation mark, then the most probable reason (we can only guess, since the XML document was not shown) is that the document uses a default namespace.

How to refer to elements that are in a default namespace is the most frequently asked XPath question. If you search for "XPath default namespace" on SO or the Internet, you will find many sources with the right solution.

Typically, you call a method that associates a prefix (for example, "x:") with the default namespace. Then, in the XPath expression, each element name "someName" is replaced with "x:someName".

Here is a good answer on how to do this in C#.

Read the documentation for your language or XPath engine to see how to do something similar in your specific environment.
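
As a minimal sketch of that prefix binding, here is how it might look with Python's standard xml.etree.ElementTree. The sample markup, the prefix "x", and the variable names are assumptions for illustration, since the real page was not shown. ElementTree's XPath subset has no starts-with(), so the id check is done in Python here; an engine with full XPath 1.0 support should accept //x:div[starts-with(@id, 'post_message_')] once the prefix is bound.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; note the default namespace declared on <html>.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="post_message_101">first post</div>
    <div id="sidebar">not a post</div>
    <div id="post_message_102">second post</div>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# An unprefixed name only matches elements in no namespace, so this finds nothing.
unprefixed = root.findall(".//div")

# Bind an arbitrary prefix ("x") to the XHTML namespace and use it in the path.
ns = {"x": "http://www.w3.org/1999/xhtml"}
prefixed = root.findall(".//x:div", ns)

# ElementTree lacks starts-with(), so filter on the id attribute in Python.
posts = [d for d in prefixed if d.get("id", "").startswith("post_message_")]
first_post = posts[0] if posts else None

print(len(unprefixed), len(prefixed), [d.get("id") for d in posts])
```

The empty result for the unprefixed path is the same symptom the question describes: the expression is well-formed, but it names elements in the wrong namespace.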

+5

I think I have a solution that does not require interaction with namespaces.

This selects all the relevant divs:

 //div[@id[starts-with(.,"post_message")]] 

But you said that you only want the "first" one (I assume you mean the first "hit" on the whole page?). Here is a small modification that selects only the first match:

 (//div[@id[starts-with(.,"post_message")]])[1] 

Both expressions use a dot (.) to refer to the value of the id attribute inside the starts-with() function. You may need to escape special characters in your language.

This works fine for me in PowerShell:

 # Load a sample xml document
 $xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
 # Run the xpath selection of all matching divs
 $xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')

Result:

 id
 --
 post_message_somenumber
 post_message_somenumber2

Or, just for the first match:

 # Run the xpath selection of the first matching div
 $xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')

Result:

 id
 --
 post_message_somenumber
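
The parentheses matter: in XPath, a trailing [1] directly on //div[...] keeps the first matching div under each parent, while (//div[...])[1] keeps only the first match in the whole document. Below is a rough illustration of that difference using Python's xml.etree.ElementTree, which supports positional predicates but neither parenthesized expressions nor starts-with(). The sample document and all names are assumed for the sake of the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with posts spread over two parent elements.
doc = ET.fromstring(
    '<root>'
    '<page><div id="post_message_1"/><div id="post_message_2"/></page>'
    '<page><div id="post_message_3"/></page>'
    '</root>'
)

# Like //div[1]: the first div child of every parent, so one hit per <page>.
first_per_parent = [d.get("id") for d in doc.findall(".//div[1]")]

# Like (//div[...])[1]: walk divs in document order and stop at the first match.
first_overall = next(
    (d for d in doc.iter("div") if d.get("id", "").startswith("post_message")),
    None,
)

print(first_per_parent)
print(first_overall.get("id"))
```

In the PowerShell sample above all divs share one parent, so both forms would return the same node there; the difference only shows up when posts sit under different parents.
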
+3
 @FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
 private WebElementFacade ListOfExpiredUsersDetails;

This XPath matches every div on the page whose id starts with expiredUserDetails and whose text also contains Details.

+1

Source: https://habr.com/ru/post/1337894/

