Get the second-level domain name from a URL
Is there any way to get the top-level domain name from a URL?
e.g. "https://images.google.com/blah" => "google"
I found this:
`var domain = new URL(pageUrl).hostname;` but it gives me "images.google.com" instead of "google".
The unit tests that I have are:
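- https://images.google.com => google
- https://www.google.com/blah => google
- https://www.google.co.uk/blah => google
- https://www.images.google.com/blah => google

You can do it like this:

```javascript
location.hostname.split('.').pop()
```

This returns the rightmost label of the host name, i.e. the top-level domain (e.g. "com").

EDIT: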
Looking at the change in your question: you will need a list of all TLDs to match and remove from the host name; then you can use `split('.').pop()`.
```javascript
// small example list of TLDs; the pattern is anchored to the end of the host name
var re = /\.(co\.uk|me|com|us)$/;
var secondLevelDomain = new URL('https://www.google.co.uk').hostname.replace(re, '').split('.').pop(); // "google"
```

This is the easiest solution apart from maintaining whitelists and blacklists of top-level domains:
1. Match the top-level domain if it has three or more characters (xxxx.yyy).
2. Match the top-level domain and the second-level domain if both have two characters or fewer (xxxxx.yy.zz).
3. Delete the match.
4. Return everything between the last period and the end of the line.
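The same steps can also be written without a regular expression, by working on the dot-separated labels of the host name. This is a minimal sketch, assuming the host name has already been extracted (for example with `new URL(url).hostname`); `secondLevelLabel` is a hypothetical name. Like this answer's regex, the fallback drops the last label whatever its length, so it is not limited to TLDs of three or more characters.

```javascript
// Hypothetical helper: the rules from the steps above, applied to the
// dot-separated labels of a host name instead of with a regular expression.
// Assumes `hostname` is only the host, e.g. from new URL(url).hostname.
function secondLevelLabel(hostname) {
  // Ignore trailing dots, as the answer's regex does with its trailing (\.*$).
  var labels = hostname.replace(/\.+$/, '').split('.');
  var n = labels.length;
  // The regex needs a period before both short labels, hence n >= 3.
  if (n >= 3 && labels[n - 1].length <= 2 && labels[n - 2].length <= 2) {
    // Two short labels at the end (e.g. "co.uk"): delete both.
    labels.splice(n - 2, 2);
  } else if (n >= 2) {
    // Otherwise delete only the last label (e.g. "com").
    labels.pop();
  }
  // Return what is now the rightmost label.
  return labels[labels.length - 1];
}

console.log(secondLevelLabel('www.google.co.uk'));  // google
console.log(secondLevelLabel('images.google.com')); // google
console.log(secondLevelLabel('xxx.yy.zz.'));        // xxx
```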
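I split this into two rules combined with the regex OR operator `|`:

- `(\.[^\.]*)(\.*$)`: from the last period to the end of the line, if the top-level domain has ≥ 3 characters.
- `(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)`: the top-level and second-level domains, if both have ≤ 2 characters.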
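To see which of the two alternatives fires for a given host, you can inspect the capture groups of the match: groups 1-3 belong to the first alternative and are `undefined` when the second one matched instead. A small sketch; `whichRule` is a hypothetical helper, and the pattern is the one from this answer written as a regex literal.

```javascript
// The combined pattern from this answer, written as a regex literal.
var re = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;

// Hypothetical helper: report what the pattern removes and which rule did it.
function whichRule(host) {
  var m = host.match(re);
  if (m === null) return null; // no period at all, e.g. "localhost"
  // Groups 1-3 only take part in the first alternative; when the second
  // alternative matches instead, JavaScript reports them as undefined.
  var rule = m[1] !== undefined ? 'two short labels' : 'last label';
  return { removed: m[0], rule: rule };
}

console.log(whichRule('www.google.co.uk')); // removes ".co.uk" via "two short labels"
console.log(whichRule('www.google.com'));   // removes ".com" via "last label"
```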
```javascript
// First alternative: two short labels at the end (e.g. ".co.uk");
// second alternative: the last label (e.g. ".com"). Trailing dots are absorbed by (\.*$).
var regex_var = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;

var unit_test = 'xxx.yy.zz.'.replace(regex_var, '').split('.').pop();
document.write("Returned user entered domain: " + unit_test + "\n");

var result = location.hostname.replace(regex_var, '').split('.').pop();
document.write("Current Domain: " + result);
```

What you want to extract from the URL is not the top-level domain (TLD). The TLD is the rightmost part, e.g. `.com`.
Having said that, I don't think there is an easy way to do this, because some URLs have two "common" parts, such as ".co.uk", and I suppose you don't want to extract ".co" in those cases. Perhaps you can use a list of the existing two-part "TLDs" to decide which part to extract.
```javascript
function getDomainName(hostname) {
  // Strip either a known generic/second-level name (optionally followed by a
  // 2-3 letter country code), or two 2-3 character labels (e.g. ".co.uk"),
  // or a single two-character TLD; then return the rightmost remaining label.
  var TLDs = /\.(com|net|org|biz|ltd|plc|edu|mil|asn|adm|adv|arq|art|bio|cng|cnt|ecn|eng|esp|etc|eti|fot|fst|g12|ind|inf|jor|lel|med|nom|ntr|odo|ppg|pro|psc|psi|rec|slg|tmp|tur|vet|zlg|asso|presse|k12|gov|muni|ernet|res|store|firm|arts|info|mobi|maori|iwi|travel|asia|web|tel)(\.[a-z]{2,3})?$|(\.[^\.]{2,3})(\.[^\.]{2,3})$|(\.[^\.]{2})$/;
  return hostname.replace(TLDs, '').split('.').pop();
}

/*** TEST ***/
var domains = [
  'domain.com',
  'subdomain.domain.com',
  'www.subdomain.domain.com',
  'www.subdomain.domain.info',
  'www.subdomain.domain.info.xx',
  'mail.subdomain.domain.co.uk',
  'mail.subdomain.domain.xxx.yy',
  'mail.subdomain.domain.xx.yyy',
  'mail.subdomain.domain.xx',
  'domain.xx'
];
var result = [];
for (var i = 0; i < domains.length; i++) {
  result.push(getDomainName(domains[i]));
}
alert(result.join(' | '));
// result: domain | domain | domain | domain | domain | domain | domain | domain | domain | domain
```
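The list-based idea from this answer can also be sketched without one large regex: keep a set of known suffixes (single- and two-part), find the longest one the host name ends with, and return the label just before it. The `KNOWN_SUFFIXES` set below is a deliberately tiny, hypothetical sample; for real use you would need a complete list such as the Public Suffix List. `registrableLabel` is likewise a hypothetical name.

```javascript
// Hypothetical, deliberately tiny suffix list; a real implementation would
// need a complete list (e.g. the Public Suffix List) instead of these entries.
var KNOWN_SUFFIXES = new Set(['com', 'net', 'org', 'de', 'uk', 'co.uk', 'org.uk']);

// Returns the label immediately to the left of the longest known suffix,
// or undefined if the host name does not end in a known suffix.
function registrableLabel(hostname) {
  var labels = hostname.toLowerCase().replace(/\.+$/, '').split('.');
  // Start with the longest candidate suffix (everything after label 0),
  // so "co.uk" is found before "uk".
  for (var i = 1; i < labels.length; i++) {
    var suffix = labels.slice(i).join('.');
    if (KNOWN_SUFFIXES.has(suffix)) {
      return labels[i - 1];
    }
  }
  return undefined;
}

// The unit tests from the question.
var tests = [
  ['https://images.google.com', 'google'],
  ['https://www.google.com/blah', 'google'],
  ['https://www.google.co.uk/blah', 'google'],
  ['https://www.images.google.com/blah', 'google']
];
tests.forEach(function (t) {
  var got = registrableLabel(new URL(t[0]).hostname);
  console.log(t[0] + ' => ' + got + (got === t[1] ? ' (ok)' : ' (FAIL)'));
});
```

Compared with a single regex, the set makes it easy to add or remove suffixes without touching the pattern, at the cost of having to maintain (or load) the list.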