Regex to match any string except "www"? (Subdomain)

I was wondering if anyone out there could help me with a regex in C#. I think it's fairly simple, but I've been racking my brain and am not quite sure why I'm having such a hard time. :)

I have found a few examples, but I can't seem to adapt them to do what I need.

I just need to match ANY alphanumeric-plus-dashes subdomain string that is not "www", and only up to the ".".

Also, ideally, if someone were to type “www.subdomain.domain.com”, I would like www to be ignored if possible. If not, this is not a problem.

In other words, I would like to match:

  • (test).domain.com
  • (test2).domain.com
  • (wwwasdf).domain.com
  • (asdfwww).domain.com
  • (w).domain.com
  • (WWWWWW).domain.com
  • (asfd-12345-www-bananas).domain.com
  • www.(subdomain).domain.com

And I do not want to match:

  • (www).domain.com

It seems to me that this should be easy, but I have problems with the "do not match" part.

For what it's worth, this is for use in the IIS 7 URL Rewrite Module, to rewrite all non-www subdomains.

Thanks!

+6
7 answers

Is the rest of the domain name constant, i.e. .domain.com, as in your examples? If so, try this:

 \b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b) 

Explanation:

  • \w+(?:-\w+)* matches a generic domain-name component as you described (but a little more strictly).

  • (?=\.domain\.com\b) makes sure it is the first-level subdomain (i.e., the last component before the actual domain name).

  • \b(?!www\.) makes sure it is not www. (without the \b, the regex could skip the first w and match just ww).

In my tests, this regex matches exactly the parts you highlighted in your examples, and it does not match the www. in either of the last two examples.
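
As a rough check of the pattern above, here is a minimal sketch in Python rather than C#. The pattern only uses constructs (\b, lookahead, non-capturing groups) that both Python's re module and .NET support. The helper name first_subdomain is made up for illustration; the host names are the examples from the question.

```python
import re

# Pattern from the answer above, unchanged.
SUBDOMAIN = re.compile(r"\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)")

def first_subdomain(host):
    """Return the component directly before .domain.com, or None.

    Hypothetical helper, not part of the original answer.
    """
    m = SUBDOMAIN.search(host)
    return m.group(1) if m else None

for host in ["test.domain.com",
             "asfd-12345-www-bananas.domain.com",
             "www.subdomain.domain.com",
             "www.domain.com"]:
    print(host, "->", first_subdomain(host))
```

In this sketch the bare www.domain.com yields None, while www.subdomain.domain.com yields subdomain.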


EDIT: Here is another version that matches the entire name, capturing fragments in different groups:

 ^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$ 

In most cases, group $1 will contain an empty string because there is nothing before the subdomain name, but here is how it breaks down www.subdomain.domain.com:

 $1: "www."
 $2: "subdomain"
 $3: ".domain.com"
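
The group breakdown can be reproduced with a short Python sketch (Python instead of C#, on the assumption that the pattern behaves the same in both engines; split_host is a hypothetical helper name):

```python
import re

# Whole-name pattern from the EDIT above, unchanged.
FULL = re.compile(r"^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$")

def split_host(host):
    """Return (prefix, subdomain, domain), or None if the name is rejected.

    Hypothetical helper, not part of the original answer.
    """
    m = FULL.match(host)
    return m.groups() if m else None

print(split_host("www.subdomain.domain.com"))
print(split_host("test.domain.com"))
print(split_host("www.domain.com"))
```

Here the first group comes back as an empty string for test.domain.com, and www.domain.com is rejected outright.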
+8

 ^www\. 

Then invert the logic: if this pattern matches, the string does not meet your requirements.
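
A minimal sketch of that inverted check, written in Python for brevity (is_allowed is a made-up name):

```python
import re

def is_allowed(host):
    """True unless the host starts with "www." (hypothetical helper)."""
    # The pattern describes what we do NOT want, so a match means rejection.
    return re.match(r"^www\.", host) is None

print(is_allowed("test.domain.com"))     # accepted
print(is_allowed("wwwasdf.domain.com"))  # accepted: "www" is not followed by "."
print(is_allowed("www.domain.com"))      # rejected
```

Note that this simple form also rejects www.subdomain.domain.com, which the question would ideally accept.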

+2

Just replace the original with everything after the www., if present (Python-style pseudocode):

 s = re.sub(r"(www\.)?(.+)", r"\2", s) 

Or, if you just want to match those that are “wrong,” use this:

 (www\.([^.]+)\.([^.]+)) 

And if you need to match all the ones that are good, use this:

 (([^w]|w[^w]|ww[^w]|www[^.]|www\.([^.]+)\.([^.]+)\.).+) 
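
The last pattern avoids lookahead by listing every way a string can fail to start with "www.". A small Python sketch of how it might be applied follows. Anchoring at the start of the string (via re.match) is an assumption here, since the pattern itself has no ^; without that anchoring, a search could start in the middle of "www." and match anyway. The name is_good is hypothetical.

```python
import re

# "Good" pattern from the answer above, unchanged.
GOOD = re.compile(r"(([^w]|w[^w]|ww[^w]|www[^.]|www\.([^.]+)\.([^.]+)\.).+)")

def is_good(host):
    """True if the host is acceptable (hypothetical helper).

    re.match only tries the pattern at position 0, which stands in
    for the missing ^ anchor (an assumption, not part of the answer).
    """
    return GOOD.match(host) is not None

for host in ["test.domain.com", "w.domain.com", "wwwasdf.domain.com",
             "www.subdomain.domain.com", "www.domain.com"]:
    print(host, "->", is_good(host))
```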
+1

Just a thought:

 ^(?:www\.)?([^\.]+)\.([^\.]+)\. 

Where...

  • (?:www\.)? looks for an optional "www." at the start, without capturing it
  • ([^\.]+)\. looks for the subdomain (anything but a dot, at least once, up to a dot)
  • ([^\.]+)\. looks for the domain, ending in a dot (anything but a dot, at least once, up to a dot)

Note: this expression will not work with nested subdomains such as www.subsub.sub.domain.com
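
To see what the two groups hold, here is a short Python sketch (sub_and_domain is a made-up helper name):

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(?:www\.)?([^\.]+)\.([^\.]+)\.")

def sub_and_domain(host):
    """Return (subdomain, domain), or None if there is no match.

    Hypothetical helper, not part of the original answer.
    """
    m = PATTERN.match(host)
    return m.groups() if m else None

print(sub_and_domain("test.domain.com"))
print(sub_and_domain("www.subdomain.domain.com"))
print(sub_and_domain("www.subsub.sub.domain.com"))  # nested case from the note
print(sub_and_domain("www.domain.com"))
```

In the nested case the groups come out as subsub and sub, which illustrates the limitation noted above. Also worth knowing: because the www. prefix is optional, the engine can backtrack and skip it, so www.domain.com still matches with www captured as the subdomain.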

+1

This:

 ^(?:www\.)?([^.]*) 

matches exactly what you put in parentheses in your question. You will find the result in group 1. It has to be anchored to the start of the string. Use this:

 ^(?:www\.)?(.*) 

if you want everything in the URL except the "www.". One example you did not include in your test cases is "alpha.subdomain.domain.com". If you need to match everything except the "www." and the "domain.com" part of the string, use this:

 ^(?:www\.)?(.+)((?:\.(?:[^./\?]+)){2}) 

It handles all of your cases, and it also returns alpha.subdomain for my additional test case. As an encore, it puts ".domain.com" in group 2, and it will not match that part if there are directories or parameters in the URL.
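
A quick Python sketch of the two groups produced by that last pattern, using bare host names only (split_tail is a hypothetical helper name):

```python
import re

# Third pattern from this answer, unchanged.
PATTERN = re.compile(r"^(?:www\.)?(.+)((?:\.(?:[^./\?]+)){2})")

def split_tail(host):
    """Return (everything before the domain, the last two components), or None.

    Hypothetical helper, not part of the original answer.
    """
    m = PATTERN.match(host)
    return m.groups() if m else None

print(split_tail("test.domain.com"))
print(split_tail("www.subdomain.domain.com"))
print(split_tail("alpha.subdomain.domain.com"))
```

The greedy (.+) first swallows the whole name and then backtracks until exactly two dot-separated components are left for group 2.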

I tested all of these patterns.

Finally, for the sake of overkill, if you want to reject addresses that begin with "www.", you can use a negative lookbehind:

 ^....(?<!www\.).* 
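
The lookbehind variant can be tried out in Python as well, since the lookbehind here has a fixed width of four characters, which Python's re module accepts (accepted is a made-up name):

```python
import re

# Consume four characters, then assert that those four were not "www.".
NOT_WWW = re.compile(r"^....(?<!www\.).*")

def accepted(host):
    """True if the host does not begin with "www." (hypothetical helper)."""
    return NOT_WWW.match(host) is not None

print(accepted("test.domain.com"))
print(accepted("wwwasdf.domain.com"))
print(accepted("www.domain.com"))
```

One side effect of the four leading dots is that strings shorter than four characters never match at all.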
+1

This works:

 ^(?!www\.domain\.com)(?:[a-z\-\.]+\.domain\.com)$ 

Or with the backslashes escaped as needed for Java (or C#?) string literals:

 "^(?!www\\.domain\\.com)(?:[a-z\\-\\.]+\\.domain\\.com)$" 

There may be a more concise way (i.e. typing domain.com only once), but this works.
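
A small Python sketch of this pattern, assuming the character class is meant to be a-z plus dash and dot (matches_rule is a hypothetical helper name):

```python
import re

# Assumption: the character class is [a-z\-\.], i.e. lower-case letters,
# dashes and dots.
PATTERN = re.compile(r"^(?!www\.domain\.com)(?:[a-z\-\.]+\.domain\.com)$")

def matches_rule(host):
    """True if the whole host name is accepted (hypothetical helper)."""
    return PATTERN.match(host) is not None

for host in ["test.domain.com", "asfd-www-bananas.domain.com",
             "www.subdomain.domain.com", "www.domain.com", "test2.domain.com"]:
    print(host, "->", matches_rule(host))
```

Under that assumption the class contains no digits or upper-case letters, so test2.domain.com and WWWWWW.domain.com from the question would need a wider class such as [a-zA-Z0-9\-\.].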

+1

I thought I would share this.

 (\\.[A-Za-z]{2,3}){1,2}$ 

It strips suffixes such as ".com.au" or ".co.uk" from the end. You can then run an additional search to determine whether the URL contains a subdomain.

e.g.

subdomain1.sitea.com.au
subdomain2.siteb.co.uk
subdomain3.sitec.net.au

all become:

subdomain1.sitea
subdomain2.siteb
subdomain3.sitec
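
A minimal Python sketch of that two-step idea, assuming the character class is meant to cover ASCII letters (written here as [A-Za-z]). The helper names strip_suffix and has_subdomain are made up, and the "contains a dot" test is just one possible form of the follow-up search.

```python
import re

# One or two trailing components of 2-3 letters, e.g. ".com.au" or ".net".
TLD_SUFFIX = re.compile(r"(\.[A-Za-z]{2,3}){1,2}$")

def strip_suffix(host):
    """Remove the trailing TLD-like part (hypothetical helper)."""
    return TLD_SUFFIX.sub("", host)

def has_subdomain(host):
    """Assumed follow-up check: a dot left over means there is a subdomain."""
    return "." in strip_suffix(host)

for host in ["subdomain1.sitea.com.au", "subdomain2.siteb.co.uk",
             "subdomain3.sitec.net.au"]:
    print(host, "->", strip_suffix(host), has_subdomain(host))
```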

0

Source: https://habr.com/ru/post/895277/

