Google Search Position Regex

I am trying to find a keyword's search position on Google using the regex below:

string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)"; 

But this does not work for URLs containing hyphens (-), such as:

www.example-xyz.com

Can someone help me fix this?

3 answers

Escape the hyphen with a backslash, and escape that backslash with another backslash, since it sits inside a C# string literal:

 string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)"; 
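As a quick check of the escaped pattern, here is a minimal sketch that runs both versions against one line of markup. The sample HTML and the names HyphenEscapeDemo and CaptureUrl are made up for illustration, and Google's real result markup may not match the `<h3 class="r">` shape the question assumes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenEscapeDemo
{
    // Pattern from the question: ".-?" is parsed as a range, so '-' itself is not in the class.
    public const string Original = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)";

    // Pattern from the answer: "\\-" in the C# literal becomes the regex "\-", a literal hyphen.
    public const string Escaped = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";

    // Returns the text captured by group 2 (the URL part), or null if nothing matches.
    public static string CaptureUrl(string pattern, string html)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample markup, not real Google output.
        string html = "<h3 class=\"r\"><a href=\"www.example-xyz.com/page\">Example</a></h3>";
        Console.WriteLine(CaptureUrl(Original, html)); // www.example
        Console.WriteLine(CaptureUrl(Escaped, html));  // www.example-xyz.com/page
    }
}
```

With the original pattern the capture stops at the first hyphen; with the escaped one it runs up to the closing quote.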

Since - denotes a range inside [], you need to escape it with a backslash.

 string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)"; 
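To see what the unescaped hyphen actually does, the following sketch lists the printable ASCII characters that a character class accepts. The names RangeDemo and Accepted are hypothetical helpers, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // Returns every printable ASCII character (0x20..0x7E) that the given character class matches.
    public static string Accepted(string charClass)
    {
        var regex = new Regex("^" + charClass + "$");
        return new string(Enumerable.Range(0x20, 0x7F - 0x20)
            .Select(i => (char)i)
            .Where(c => regex.IsMatch(c.ToString()))
            .ToArray());
    }

    public static void Main()
    {
        // Unescaped: a range from '.' (0x2E) to '?' (0x3F); '-' (0x2D) falls just outside it.
        Console.WriteLine(Accepted("[.-?]"));   // ./0123456789:;<=>?
        // Escaped: three literal characters.
        Console.WriteLine(Accepted("[.\\-?]")); // -.?
    }
}
```

So in the question's class, `.-?` quietly admits characters such as `<` and `>` while leaving out the hyphen itself.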

By the way, there are many questions on Stack Overflow about matching URLs with regex; search the [regex] and [url] tags if you want a more sophisticated regex.


Read a decent book on regular expressions, such as Jeffrey E. F. Friedl's Mastering Regular Expressions.

Not only will it show you that - denotes a range of characters inside a character class -

 [a-z] 

and therefore must be escaped -

 [a\-z] 

or put at the beginning -

 [-az] 

or at the end -

 [az-] 

when it is meant literally, but also that it is usually a mistake to parse such markup (a context-free language, in Chomsky's terms) with a single regular expression.
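The placements of the hyphen listed above can be compared with a short sketch. HyphenPlacementDemo and MatchesSingle are illustrative names, and the check assumes .NET's default regex options.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenPlacementDemo
{
    // True if the whole input is exactly one character accepted by the class.
    public static bool MatchesSingle(string charClass, string input)
    {
        return Regex.IsMatch(input, "^" + charClass + "$");
    }

    public static void Main()
    {
        // "[a-z]" is a range; the other three treat '-' as a literal character.
        foreach (string cls in new[] { "[a-z]", "[a\\-z]", "[-az]", "[az-]" })
        {
            Console.WriteLine("{0,-8} matches '-': {1}, matches 'm': {2}",
                cls, MatchesSingle(cls, "-"), MatchesSingle(cls, "m"));
        }
    }
}
```

Only the range form accepts 'm' and rejects '-'; the escaped, leading, and trailing forms accept '-' and reject 'm'.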

What you want instead is a markup parser (something like BeautifulSoup or lxml, but for C#) and the regular expression in RFC 3986, Appendix B, for handling the URI correctly.
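For reference, the Appendix B expression can be used from C# roughly as follows. This is only a sketch: the regex is the one given in RFC 3986, but the names UriSplitDemo and Authority and the sample URI are assumptions for illustration. Note that the expression splits a URI reference into components and does not validate it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriSplitDemo
{
    // Regular expression from RFC 3986, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    public static readonly Regex UriParts = new Regex(
        @"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");

    // Returns the authority component (the host part), or "" if the input has none.
    public static string Authority(string uri)
    {
        return UriParts.Match(uri).Groups[4].Value;
    }

    public static void Main()
    {
        // Hypothetical sample URI.
        Match m = UriParts.Match("http://www.example-xyz.com/search?q=regex#top");
        Console.WriteLine(m.Groups[2].Value); // http
        Console.WriteLine(m.Groups[4].Value); // www.example-xyz.com
        Console.WriteLine(m.Groups[5].Value); // /search
        Console.WriteLine(m.Groups[7].Value); // q=regex
        Console.WriteLine(m.Groups[9].Value); // top
    }
}
```

A bare www.example-xyz.com with no scheme or leading // is treated as a path by this expression, so the authority group comes back empty for it.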


Source: https://habr.com/ru/post/1387843/

