Capture the rel, type and href of links in C#

Question

I have a string that should contain a list of items of the form <link rel="{0}" type="{1}" href="{2}">, where {0}, {1} and {2} are strings, and I basically want to extract them.

I want to do this as part of an HTML parsing problem, and I have heard that parsing HTML with regular expressions is bad.

I'm not even sure how to do this with regular expressions.

This is as far as I got:

string format = "<link rel=\".*\" type=\".*\" href=\".*\">";
Regex reg = new Regex(format);
MatchCollection matches = reg.Matches(input, 0);
foreach (Match match in matches)
{
    string rel = string.Empty;
    string type = string.Empty;
    string href = string.Empty;
    // not sure what to do here to get these values for each match
}

That was before my research suggested that using regular expressions might be completely the wrong approach.

How would you do this, either with the method I chose or with an HTML parser?

+3

c# parsing

James w Jun 18 '09 at 18:55

2 answers

For parsing HTML, use the HTML Agility Pack.

+1

Rony Jun 18 '09 at 18:59

Dan Herbert · Accepted Answer · 2009-06-18T19:12:46+0000

For parsing HTML, use the Html Agility Pack.

Once the markup is loaded, you can select the link elements with XPath.

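using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(pageMarkup);

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
string rel = string.Empty;

if (nodes != null && nodes[0].Attributes["rel"] != null)
{
    // Attributes["rel"] is an HtmlAttribute; its text is in .Value.
    rel = nodes[0].Attributes["rel"].Value;
}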
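
The snippet above reads only rel from the first link. To get rel, type and href for every link, as the question asks, something along these lines might work. It is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and a compiler with tuple support (C# 7 or later); the LinkExtractor class and ExtractLinks method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class LinkExtractor
{
    // Hypothetical helper: returns (rel, type, href) for every <link> element.
    public static List<(string Rel, string Type, string Href)> ExtractLinks(string pageMarkup)
    {
        var results = new List<(string Rel, string Type, string Href)>();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(pageMarkup);

        // SelectNodes returns null when no element matches the XPath.
        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the supplied default
            // when the attribute is missing.
            results.Add((
                node.GetAttributeValue("rel", string.Empty),
                node.GetAttributeValue("type", string.Empty),
                node.GetAttributeValue("href", string.Empty)));
        }

        return results;
    }
}
```

Calling LinkExtractor.ExtractLinks(pageMarkup) and iterating over the result gives one tuple per link, so the loop body from the question can read link.Rel, link.Type and link.Href directly. Missing attributes come back as empty strings rather than null, which avoids the per-attribute null checks.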
