HTML Agility Pack Select Nodes

I am trying to use the HTML Agility Pack to scrape some data from a site. I am having trouble figuring out how to use SelectNodes inside a foreach loop and then export the data to a list or array.

Here is the code I'm working with so far.

    string result = string.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.amazon.com/gp/offer-listing/B002UYSHMM/");
    request.Method = "GET";
    using (var stream = request.GetResponse().GetResponseStream())
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        result = reader.ReadToEnd();
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load(new StringReader(result));
    HtmlNode root = doc.DocumentNode;

    // This works perfectly to get the title of the item:
    string itemdesc = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText;

    // This does not work at all in getting the alt attribute from the seller images:
    //HtmlNodeCollection sellers = doc.DocumentNode.SelectNodes("//id['bucketnew']/div/table/tbody/tr/td/ul/a/img/@alt");

    // This works fine for getting the prices:
    HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//span[@class='price']");

    // This is the code I am working on to collect each tr in the result. From each row I want
    // to add each span.price to a list and also add the alt attribute of the seller image to a list.
    // Once this works I will add an if statement for the case where the seller name is text
    // instead of an image.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='resultsset']/table/tbody[@class='result']/tr");
    List<string> sellers = new List<string>();
    List<string> prices = new List<string>();
    foreach (HtmlNode node in nodes)
    {
        HtmlNode seller = node.SelectSingleNode(".//img/@alt"); // I am not sure if this works
        sellers.Add(seller.SelectSingleNode("img").Attributes["alt"]); // This definitely does not work and will not compile.
    }

I have added comments in the code above showing what works, what doesn't, and what I want to do.

Any suggestions or reading material would be great! I searched forums and examples and did not find anything I could use.

1 answer

Your first problem, the commented-out SelectNodes call, does not work because "id" is not an element name; it is an attribute name. You already used the correct syntax in your other expressions to select an element by attribute value: //ElementName[@attributeName='value']. To match any element with a given attribute value, //*[@attributeName='value'] should also work, so your expression would start with //*[@id='bucketnew'].
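A minimal sketch of the corrected selector, assuming the HtmlAgilityPack NuGet package is referenced. The HTML string is a hypothetical stand-in for the real Amazon page, whose markup is not shown in the question. SellerAlts selects the img elements under the element whose id is bucketnew and returns their alt text; OriginalExpressionMatches shows that the //id['bucketnew'] form finds nothing, because it looks for elements named "id".

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathAttributeDemo
{
    // Hypothetical markup standing in for the seller block of the offer page.
    public const string SampleHtml =
        "<div id='bucketnew'><ul>" +
        "<li><a href='#'><img alt='SellerA' src='a.png'></a></li>" +
        "<li><a href='#'><img alt='SellerB' src='b.png'></a></li>" +
        "</ul></div>";

    // Returns the alt text of every img inside the element whose id attribute is 'bucketnew'.
    public static List<string> SellerAlts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // "id" is an attribute, so it is tested with @id in a predicate on an element step;
        // "*" matches any element name, and "//img[@alt]" finds descendant imgs that have an alt.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//*[@id='bucketnew']//img[@alt]");
        if (imgs == null) return result; // SelectNodes returns null when nothing matches
        foreach (HtmlNode img in imgs)
            result.Add(img.GetAttributeValue("alt", ""));
        return result;
    }

    // The original expression treats "id" as an element name, so it matches nothing here.
    public static bool OriginalExpressionMatches(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//id['bucketnew']");
        return nodes != null && nodes.Count > 0;
    }
}
```

Note that SelectNodes returns null rather than an empty collection when nothing matches (in the library's default configuration), which is why the sketch checks for null before looping.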

The string you pass to SelectNodes is an XPath expression. This link can help you.

The seller node you select in the loop is a descendant of the current row: the img element with the alt attribute. However, I think the syntax you want there is .//img[@alt], which selects the img element itself (one that has an alt attribute) rather than the attribute.
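The per-row approach can be sketched as follows, again assuming HtmlAgilityPack is referenced. The markup is hypothetical and only mirrors the resultsset / result / price class names from the question. Each query inside the loop starts with ".//" so it is evaluated relative to the current tr; an expression starting with "//" is absolute and would search the whole document on every iteration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowScraper
{
    // Hypothetical rows following the question's layout: div.resultsset > table > tbody.result > tr.
    public const string SampleHtml =
        "<div class='resultsset'><table><tbody class='result'>" +
        "<tr><td><span class='price'>$10.00</span></td><td><img alt='SellerA' src='a.png'></td></tr>" +
        "<tr><td><span class='price'>$12.50</span></td><td><img alt='SellerB' src='b.png'></td></tr>" +
        "</tbody></table></div>";

    // Collects one price and one seller alt text per result row.
    public static (List<string> Prices, List<string> Sellers) Scrape(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var prices = new List<string>();
        var sellers = new List<string>();

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//div[@class='resultsset']/table/tbody[@class='result']/tr");
        if (rows == null) return (prices, sellers); // no rows matched

        foreach (HtmlNode row in rows)
        {
            // The leading "." anchors each search at the current row.
            HtmlNode price = row.SelectSingleNode(".//span[@class='price']");
            HtmlNode img = row.SelectSingleNode(".//img[@alt]");
            if (price != null) prices.Add(price.InnerText.Trim());
            if (img != null) sellers.Add(img.GetAttributeValue("alt", ""));
        }
        return (prices, sellers);
    }
}
```

Keeping the price and seller lookups in the same loop means both lists are filled row by row, so index i in each list refers to the same offer (as long as every row has both).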

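Next, the line you say will not compile: check the error message; it will probably complain about the type of the argument passed to sellers.Add. sellers is a List<string>, but Attributes["alt"] returns an HtmlAttribute, not a string, so you need to add its Value (for example seller.Attributes["alt"].Value). Also, if seller already refers to the img element, calling seller.SelectSingleNode("img") looks for an img inside that img and returns null.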
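A sketch of the alt lookup for a single row, including the text fallback the question mentions for sellers shown as text instead of an image. It assumes HtmlAgilityPack is referenced; the td class "seller" used for the text case is a hypothetical name, since the real page markup is not shown.

```csharp
using HtmlAgilityPack;

public static class SellerName
{
    // Returns the seller name for one result row: the img alt text when a seller logo is present,
    // otherwise the trimmed text of a hypothetical td with class 'seller'.
    public static string FromRow(HtmlNode row)
    {
        HtmlNode img = row.SelectSingleNode(".//img[@alt]");
        if (img != null)
        {
            // Attributes["alt"] is an HtmlAttribute, not a string; .Value gives its text.
            // It cannot be null here because the XPath required an alt attribute.
            HtmlAttribute alt = img.Attributes["alt"];
            return alt.Value;
        }

        // Fallback when the seller name is plain text instead of an image.
        HtmlNode textCell = row.SelectSingleNode(".//td[@class='seller']");
        return textCell?.InnerText.Trim() ?? string.Empty;
    }
}
```

GetAttributeValue("alt", "") is an alternative to Attributes["alt"].Value that returns the supplied default instead of failing when the attribute is missing.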

Also, check the Html Agility Pack documentation for other syntax details.


Source: https://habr.com/ru/post/901390/

