HtmlAgilityPack XPath: ignoring case

When I use

SelectSingleNode("//meta[@name='keywords']") 

it does not work, but when I use the same case as in the original document, it works fine:

 SelectSingleNode("//meta[@name='Keywords']") 

So the question is: how can I make the comparison case-insensitive?

4 answers

If you need a more comprehensive solution, you can write an extension function for the XPath processor that will perform case-insensitive comparisons. This is quite a bit of code, but you only write it once.

Once the extension is in place, you can write your query like this:

 "//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]" 

Here Extensions:CaseInsensitiveComparison is the extension function implemented in the example below.

NOTE: this is not well tested. I threw it together for this answer, so there is no error handling!

Below is the custom XsltContext code that provides one or more extension functions:

    using System;
    using System.Xml;
    using System.Xml.XPath;
    using System.Xml.Xsl;
    using HtmlAgilityPack;

    public class XsltCustomContext : XsltContext
    {
        public const string NamespaceUri = "http://XsltCustomContext";

        public XsltCustomContext() { }

        public XsltCustomContext(NameTable nt) : base(nt) { }

        public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] ArgTypes)
        {
            // Check that the function prefix is for the correct namespace
            if (this.LookupNamespace(prefix) == NamespaceUri)
            {
                // Look up the function and return the appropriate IXsltContextFunction implementation
                switch (name)
                {
                    case "CaseInsensitiveComparison":
                        return CaseInsensitiveComparison.Instance;
                }
            }
            return null;
        }

        public override IXsltContextVariable ResolveVariable(string prefix, string name)
        {
            return null;
        }

        public override int CompareDocument(string baseUri, string nextbaseUri)
        {
            return 0;
        }

        public override bool PreserveWhitespace(XPathNavigator node)
        {
            return false;
        }

        public override bool Whitespace
        {
            get { return true; }
        }

        // Class implementing the XSLT function for case-insensitive comparison
        class CaseInsensitiveComparison : IXsltContextFunction
        {
            private static XPathResultType[] _argTypes = new XPathResultType[] { XPathResultType.String };
            private static CaseInsensitiveComparison _instance = new CaseInsensitiveComparison();

            public static CaseInsensitiveComparison Instance
            {
                get { return _instance; }
            }

            #region IXsltContextFunction Members

            public XPathResultType[] ArgTypes { get { return _argTypes; } }
            public int Maxargs { get { return 1; } }
            public int Minargs { get { return 1; } }
            public XPathResultType ReturnType { get { return XPathResultType.Boolean; } }

            public object Invoke(XsltContext xsltContext, object[] args, XPathNavigator navigator)
            {
                // Compare the value of the current node (here the @name attribute)
                // with the string argument, ignoring case.
                // NOTE: You should add some error checking here.
                string text = args[0] as string;
                return string.Equals(navigator.Value, text, StringComparison.InvariantCultureIgnoreCase);
            }

            #endregion
        }
    }

You can then use the extension function in your XPath queries. Here is an example for this case:

    using System.Xml;
    using System.Xml.XPath;
    using HtmlAgilityPack;

    class Program
    {
        static string html = "<html><meta name=\"keywords\" content=\"HTML, CSS, XML\" /></html>";

        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);
            XPathNavigator nav = doc.CreateNavigator();

            // Create the custom context and add the namespace to the context
            XsltCustomContext ctx = new XsltCustomContext(new NameTable());
            ctx.AddNamespace("Extensions", XsltCustomContext.NamespaceUri);

            // Build the XPath query using the new function
            XPathExpression xpath = XPathExpression.Compile("//meta[@name[Extensions:CaseInsensitiveComparison('Keywords')]]");

            // Set the context for the XPath expression to the custom context containing the extensions
            xpath.SetContext(ctx);

            // SelectSingleNode returns an XPathNavigator (null if nothing matched);
            // HtmlNodeNavigator.CurrentNode gives the underlying HtmlNode
            XPathNavigator element = nav.SelectSingleNode(xpath);
            HtmlNode metaNode = element != null ? ((HtmlNodeNavigator)element).CurrentNode : null;
        }
    }

If you don't know in advance how the value is capitalized, I think you need to use translate(). Something like this:

 SelectSingleNode("//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='keywords']") 

It is a hack, but it is the only option in XPath 1.0 (apart from doing the reverse and converting everything to upper case).
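To check the translate() expression on its own, here is a minimal sketch that runs it through .NET's built-in XPath 1.0 engine using XmlDocument rather than HtmlAgilityPack. It assumes well-formed markup (XmlDocument, unlike HtmlDocument, will not parse sloppy HTML), and the names TranslateCaseDemo and FindKeywords are hypothetical. Note that translate() only folds the characters you list, here ASCII A to Z.

```csharp
using System;
using System.Xml;

public static class TranslateCaseDemo
{
    // XPath 1.0 has no lower-case() function, so translate() maps A-Z to a-z
    // on the attribute value before it is compared with 'keywords'.
    public const string LowerCaseQuery =
        "//meta[translate(@name,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')='keywords']";

    // Returns the content attribute of the first matching <meta>, or null if none matches.
    public static string FindKeywords(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup); // requires well-formed markup
        XmlNode meta = doc.SelectSingleNode(LowerCaseQuery);
        return meta?.Attributes?["content"]?.Value;
    }

    public static void Main()
    {
        // Every capitalization of "keywords" is matched by the same query
        foreach (string name in new[] { "keywords", "Keywords", "KEYWORDS" })
        {
            string markup = "<html><head><meta name=\"" + name + "\" content=\"HTML, CSS, XML\" /></head></html>";
            Console.WriteLine(name + " -> " + FindKeywords(markup));
        }
    }
}
```

The same query string can be passed unchanged to HtmlAgilityPack's SelectSingleNode, since it is plain XPath 1.0.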


Here is how I do it:

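    // Requires: using System.Web; using HtmlAgilityPack;
    // Note: this only matches the three spellings listed in the XPath.
    HtmlNodeCollection metaNodes = document.DocumentNode.SelectNodes(
        "//meta[@name='description' or @name='Description' or @name='DESCRIPTION']");

    // SelectNodes returns null (not an empty collection) when nothing matches;
    // GetAttributeValue falls back to "" if the content attribute is missing
    string metaDescription = metaNodes != null
        ? HttpUtility.HtmlDecode(metaNodes[0].GetAttributeValue("content", string.Empty))
        : string.Empty;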

Alternatively, you can use LINQ, where you can choose a case-insensitive string comparison:

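    // Requires: using System; using System.Linq; using HtmlAgilityPack;
    HtmlNode node = doc.DocumentNode.Descendants("meta")
        .Where(meta => meta.Attributes["name"] != null)
        .Where(meta => string.Equals(meta.Attributes["name"].Value, "keywords", StringComparison.OrdinalIgnoreCase))
        .Single(); // throws if there is not exactly one match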

The drawback is the ugly null check on the attribute, which is needed to avoid a NullReferenceException.
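The separate null check can probably be avoided with HtmlNode.GetAttributeValue(name, default), which returns the supplied default when the attribute is absent. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; MetaLookup and FindMeta are hypothetical names, and FirstOrDefault is used instead of Single so that a missing tag yields null rather than an exception:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class MetaLookup
{
    // GetAttributeValue returns "" for <meta> tags without a name attribute
    // (e.g. <meta charset="utf-8">), so no explicit null check is needed.
    public static HtmlNode FindMeta(HtmlDocument doc, string name)
    {
        return doc.DocumentNode
            .Descendants("meta")
            .FirstOrDefault(meta => string.Equals(
                meta.GetAttributeValue("name", string.Empty),
                name,
                StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><meta charset=\"utf-8\"><meta name=\"Keywords\" content=\"HTML, CSS, XML\"></head></html>");

        // Case-insensitive lookup; null-conditional in case no tag matches
        HtmlNode meta = FindMeta(doc, "keywords");
        Console.WriteLine(meta?.GetAttributeValue("content", string.Empty));
    }
}
```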


Source: https://habr.com/ru/post/907685/

