Selecting siblings between two tags using XPath (in .NET)

I am using .NET 3.5 (C#) and the HTML Agility Pack to do some web scraping. Some of the fields I need to extract are structured as paragraphs in which the components are separated by line break tags. I would like to be able to select the individual components between the line breaks. Each component may consist of several nodes (i.e., it is not necessarily a single string). Example:

<h3>Section title</h3>
<p>
  <b>Component A</b><br />
  Component B <i>includes</i> <strong>multiple elements</strong><br />
  Component C
</p>

I would like to select:

<b>Component A</b>

Then:

Component B <i>includes</i> <strong>multiple elements</strong>

And then:

Component C

There may also be more (<br />-separated) components.

I can easily get the first component with:

p/br[1]/preceding-sibling::node()

I can also easily get the last component with:

p/br[2]/following-sibling::node()

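However, I cannot work out how to select a component in the middle, or more generally the set of nodes between two known nodes (say, all the nodes between node X and node Y).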

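The following loop returns the components one at a time: group i consists of the child nodes that are not br elements and that have exactly i preceding br siblings.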

int i = 0;
do
{
    yield return para.SelectNodes(String.Format(
        "node()[not(self::br) and count(preceding-sibling::br) = {0}]", i));
    ++i;
} while (para.SelectSingleNode(String.Format("br[{0}]", i)) != null);

In other words, this runs one XPath query per component, plus one more per iteration to check whether another br exists.

Complete example (test harness adapted from AakashM's answer):

using System;
using System.Collections.Generic;
using System.Xml;

namespace TestXPath
{
    class Program
    {
        static void Main(string[] args)
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(@"
<x>
 <h3>Section title</h3>
 <p>
  <b>Component A</b><br />
  Component B <i>includes</i> multiple <strong>elements</strong><br />
  Component C
 </p>
</x>
            ");


            foreach (var nodes in SplitOnLineBreak(doc.SelectSingleNode("x/p")))
            {
                Dump(nodes);
                Console.WriteLine();
            }

            Console.ReadLine();
        }

        private static IEnumerable<XmlNodeList> SplitOnLineBreak(XmlNode para)
        {
            int i = 0;
            do
            {
                yield return para.SelectNodes(String.Format(
                    "node()[not(self::br) and count(preceding-sibling::br) = {0}]", i));
                ++i;
            } while (para.SelectSingleNode(String.Format("br[{0}]", i)) != null);
        }

        private static void Dump(XmlNodeList nodes)
        {
            foreach (XmlNode node in nodes)
            {
                Console.WriteLine(string.Format("-->{0}<---", 
                                  node.OuterXml));                    
            }
        }
    }
}

This is straightforward in XPath 2.0, or in XPath 1.0 when it is hosted by XSLT.

With XPath 1.0 hosted by .NET, the following steps are needed:

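  1. Take the "p" node as the context node.

  2. Count the <br /> children of the "p" node:

    count(br)

  3. Let N be the count obtained in step 2. For each $k from 0 to N:

    3.1 Select the nodes preceded by exactly $k <br /> siblings:

    node()[not(self::br) and count(preceding-sibling::br) = $k]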

    3.2 Iterate over the nodes selected in 3.1.

    3.3 Process each node obtained in 3.2 as needed.

Note that $k in 3.1 has to be replaced by its actual value, so the expression must be built dynamically.
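The steps above might be sketched as follows. This is only a minimal illustration, assuming the paragraph has already been loaded as an XmlNode; the class name BrGroups and the method name Split are hypothetical and not part of any library. $k is substituted with String.Format, since SelectNodes does not bind XPath variables by itself.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class BrGroups
{
    // Hypothetical helper: returns one XmlNodeList per <br />-separated component.
    public static List<XmlNodeList> Split(XmlNode para)
    {
        // Step 2: count the <br /> children. XPath numbers come back as double.
        int n = Convert.ToInt32(para.CreateNavigator().Evaluate("count(br)"));

        var groups = new List<XmlNodeList>();
        for (int k = 0; k <= n; k++)
        {
            // Step 3.1: the value of $k is written into the expression text.
            string xpath = String.Format(
                "node()[not(self::br) and count(preceding-sibling::br) = {0}]", k);
            groups.Add(para.SelectNodes(xpath));
        }
        return groups;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<x><p><b>Component A</b><br />Component B <i>includes</i>"
                  + "<br />Component C</p></x>");

        // Print each component on its own line.
        foreach (XmlNodeList group in Split(doc.SelectSingleNode("x/p")))
        {
            foreach (XmlNode node in group)
                Console.Write(node.OuterXml);
            Console.WriteLine();
        }
    }
}
```

Counting the br elements first means the loop needs no extra existence check per iteration, at the cost of one additional query up front.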

If "between" means having br elements on both sides, this XPath does it:

//node()[preceding::br and following::br]

It selects the nodes that have both a preceding and a following br, in document order.

(Example code, using XmlDocument, .NET 2.0...)

using System;
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(@"
<x>
 <h3>Section title</h3>
 <p>
  <b>Component A</b><br />
  Component B <i>includes</i> <strong>multiple elements</strong><br />
  Component C
 </p>
</x>
            ");

            XmlNodeList nodes = doc.SelectNodes(
                "//node()[preceding::br and following::br]");

            Dump(nodes);

            Console.ReadLine();
        }

        private static void Dump(XmlNodeList nodes)
        {
            foreach (XmlNode node in nodes)
            {
                Console.WriteLine(string.Format("-->{0}<---", 
                                  node.OuterXml));                    
            }
        }
    }
}

Output:

-->
      Component B <---
--><i>includes</i><---
-->includes<---
--><strong>multiple elements</strong><---
-->multiple elements<---

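As the output shows, the XmlNodeList contains everything between the br elements, including the descendants of the nodes in between.

Note: this XPath selects any node, at any depth, that has a br somewhere before it and a br somewhere after it in the document.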
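If only the direct children of the paragraph are wanted, one possible variation is to use the sibling axes instead of preceding and following. The sketch below is an assumption rather than part of the answer above; the class name BetweenBreaks and its members are hypothetical. With more than two br elements it returns everything between the first and the last one, without grouping.

```csharp
using System;
using System.Xml;

public static class BetweenBreaks
{
    // Child nodes of p that have a br sibling on each side.
    // Descendants are not matched, and the br elements themselves are skipped.
    public const string SiblingsBetween =
        "p/node()[not(self::br) and preceding-sibling::br and following-sibling::br]";

    // root is expected to be the parent of the p element.
    public static XmlNodeList Select(XmlNode root)
    {
        return root.SelectNodes(SiblingsBetween);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<x><p><b>Component A</b><br />Component B <i>includes</i>"
                  + "<br />Component C</p></x>");

        foreach (XmlNode node in Select(doc.DocumentElement))
            Console.WriteLine("-->{0}<---", node.OuterXml);
    }
}
```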

Try:

p/*[not(local-name()='br')]

Then index into this expression to get whichever item you want.

EDIT:

For your indexing problem:

p/*[not(local-name()='br') and position() < x and position() > y]

Try using the position() function, or maybe count(). Here is a guess that might help you arrive at the correct syntax.

p/*[position() > position(/p/br[1]) and position() < position(/p/br[2])] 

EDIT: Read the comments before voting or commenting.
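For reference, position() takes no arguments in XPath 1.0, so the guess above is not valid as written. A node's index among its siblings can instead be expressed as count(preceding-sibling::node()). The sketch below is one possible way to apply that idea, not the answerer's method; the class name PositionBetween and the parameters first and second are hypothetical.

```csharp
using System;
using System.Xml;

public static class PositionBetween
{
    // Selects the children of para that lie strictly between its
    // first-th and second-th br child (both indexes are 1-based).
    public static XmlNodeList Select(XmlNode para, int first, int second)
    {
        // Inside the predicate, ".." is para, so "../br[n]" is its n-th br child.
        // count(preceding-sibling::node()) is a node's 0-based index among its siblings.
        string xpath = String.Format(
            "node()[count(preceding-sibling::node()) > "
          + "count(../br[{0}]/preceding-sibling::node()) and "
          + "count(preceding-sibling::node()) < "
          + "count(../br[{1}]/preceding-sibling::node())]",
            first, second);
        return para.SelectNodes(xpath);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<x><p><b>Component A</b><br />Component B <i>includes</i>"
                  + "<br />Component C</p></x>");

        foreach (XmlNode node in Select(doc.SelectSingleNode("x/p"), 1, 2))
            Console.WriteLine("-->{0}<---", node.OuterXml);
    }
}
```

If the second br does not exist, its index evaluates to 0 and nothing is selected, so the last component still needs the following-sibling expression from the question.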


Source: https://habr.com/ru/post/1715566/

