How to determine the top-level domain of a URL object using Java?

Considering this:

URL u=new URL("someURL"); 

How can I determine the top-level domain of this URL?

4 answers

So you want the top-level domain part?

    // parameter urlString: a String
    // returns: a String representing the TLD of urlString, or null iff urlString is malformed
    private String getTldString(String urlString) {
        URL url = null;
        String tldString = null;
        try {
            url = new URL(urlString);
            String[] domainNameParts = url.getHost().split("\\.");
            tldString = domainNameParts[domainNameParts.length - 1];
        } catch (MalformedURLException e) {
            // malformed input: fall through and return null
        }
        return tldString;
    }

Test it out!

    @Test
    public void identifyLocale() {
        String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
        logger.debug("ukString TLD: {}", getTldString(ukString));

        String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
        logger.debug("deString TLD: {}", getTldString(deString));

        String ceShiString = "http://例子.测试";
        logger.debug("ceShiString TLD: {}", getTldString(ceShiString));

        String dokimeString = "http://παράδειγμα.δοκιμή";
        logger.debug("dokimeString TLD: {}", getTldString(dokimeString));

        String nullString = null;
        logger.debug("nullString TLD: {}", getTldString(nullString));

        String lolString = "lol, this is a malformed URL, amirite?!";
        logger.debug("lolString TLD: {}", getTldString(lolString));
    }

Output:

    ukString TLD: uk
    deString TLD: de
    ceShiString TLD: 测试
    dokimeString TLD: δοκιμή
    nullString TLD: null
    lolString TLD: null

According to the docs, the host portion of the URL conforms to RFC 2732. That would mean that just splitting the string you get from

  String host = u.getHost(); 

will not be enough. You will need to make sure that you handle RFC 2732 hosts (literal IPv6 addresses enclosed in square brackets) when parsing the host, or, if you can guarantee that all addresses are of the form server.com, you can search for the last "." in the string and take the TLD after it.
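As a rough illustration of the point above, here is a minimal sketch that refuses to report a TLD when the host is an IP literal. The class name HostTld and the method tldOf are made up for this example, and the IPv4 check is a simple pattern match, not a full validation.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class HostTld {

    // Hypothetical helper: returns the last dot-separated label of the host,
    // or null when the URL is malformed, has no host, or the host is an IP literal.
    static String tldOf(String urlString) {
        final String host;
        try {
            host = new URL(urlString).getHost();
        } catch (MalformedURLException e) {
            return null;
        }
        if (host == null || host.isEmpty()) {
            return null;
        }
        // RFC 2732: getHost() returns a literal IPv6 address in square brackets.
        if (host.startsWith("[")) {
            return null;
        }
        // Assumption: anything shaped like a dotted quad is treated as an IPv4 literal.
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return null;
        }
        // Take everything after the last dot; a host without dots is returned as-is.
        return host.substring(host.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        System.out.println(tldOf("http://www.amazon.co.uk/some/path"));
        System.out.println(tldOf("http://[::1]:8080/index.html"));
        System.out.println(tldOf("http://192.168.0.1/"));
    }
}
```

Returning null for IP literals is a design choice for this sketch; throwing an exception or returning an Optional would work just as well.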


Use URL#getHost() and, if necessary, follow it with String#split() on "\\.".

Update: if the host really is an IP address, you need to resolve it to a name yourself with InetAddress#getHostName().
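A minimal sketch of that reverse lookup, assuming the IP literal has already been extracted from the URL. The class name IpHostName and the method hostNameOf are hypothetical, and the result depends entirely on the resolver configuration of the machine running it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class IpHostName {

    // Hypothetical helper: tries to turn an IP literal into a host name.
    static String hostNameOf(String ipLiteral) {
        try {
            // For a literal IP address, getByName only checks the address format.
            InetAddress address = InetAddress.getByName(ipLiteral);
            // getHostName() attempts the reverse lookup; when no name is found
            // it falls back to the textual form of the IP address.
            return address.getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The printed name varies between machines, so no output is shown here.
        System.out.println(hostNameOf("127.0.0.1"));
    }
}
```

Since the lookup can fall back to the IP address itself, the result still needs to be checked before splitting it on dots.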


Guava provides a useful utility for this. This works as follows:

InternetDomainName.from("someurl.co.uk").publicSuffix() will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix() will get you de
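To show why co.uk comes back here instead of plain uk, the following sketch does a longest-suffix match against a tiny, hard-coded set. This is not Guava's implementation: the class SuffixSketch, the method publicSuffixOf, and the four-entry suffix set are all assumptions for illustration, whereas a real public suffix list has thousands of entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SuffixSketch {

    // Hypothetical, deliberately tiny suffix set; not a real public suffix list.
    private static final Set<String> SUFFIXES =
            new HashSet<>(Arrays.asList("uk", "co.uk", "de", "com"));

    // Returns the longest known suffix of the host, or null if none matches.
    static String publicSuffixOf(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            // Drop the leftmost label and try the shorter remainder.
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(publicSuffixOf("someurl.co.uk")); // co.uk
        System.out.println(publicSuffixOf("someurl.de"));    // de
    }
}
```

Because labels are stripped from the left, the first match is always the longest one, which is why someurl.co.uk yields co.uk and not uk.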


Source: https://habr.com/ru/post/1299450/

