Getting the domain of a URL with a regular expression

I am trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The URL can be given in any of the following formats:

  • https://www.facebook.com/someuser (www. is optional and should be ignored)
  • www.facebook.com/someuser (http:// is not required)
  • facebook.com/someuser
  • http://someuser.tumblr.com → this should return only tumblr.com

I wrote this regular expression:

/(?:\.|\/{2})(?:www\.)?([^\/]*)/i

But it does not work as I expect.
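
To see where it goes wrong, here is a quick check of that pattern (written without spaces inside the groups) against the formats listed above. The constant `DOMAIN_RE` and the helper `first_capture` are names made up for this sketch.

```ruby
# The pattern from the question, written without spaces inside the groups.
DOMAIN_RE = /(?:\.|\/{2})(?:www\.)?([^\/]*)/i

# Hypothetical helper: returns the first capture group, or nil if nothing matches.
def first_capture(url)
  url[DOMAIN_RE, 1]
end

[
  'https://www.facebook.com/someuser', # => "facebook.com" (as wanted)
  'www.facebook.com/someuser',         # => "facebook.com" (as wanted)
  'facebook.com/someuser',             # => "com" (the first "." is the one before "com")
  'http://someuser.tumblr.com'         # => "someuser.tumblr.com" (subdomain is kept)
].each { |url| puts "#{url} -> #{first_capture(url).inspect}" }
```

The pattern anchors on the first "." or "//" it finds, so it only works when that happens to sit right before the domain.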

I can do it in parts:

  • Remove http:// and https:// if present, with string.delete "/https?:\/\//i".
  • Remove www. using string.delete "/www\./i".
  • Get the domain with match and /(\w+\.\w+)+/i

But this will not work with subdomains. String for testing:

 https://www.facebook.com/username http://last.fm/user/username www.google.com facebook.com/username http://sub.tumblr.com/ sub.tumblr.com 

I need this to work with as little memory and string processing as possible.

Any ideas?

+6
6 answers

Why don't you just use the URI class?

 URI.parse( your_uri ).host 

And you're done.

One caveat: if there is no "http://" or "https://" at the beginning of the URL, you will have to add it, otherwise the parse method will not give you the host (it will be nil).
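
A minimal sketch of that caveat, assuming a plain "http://" prefix is acceptable for scheme-less input. The helper name `host_of` is made up for this example.

```ruby
require 'uri'

# Hypothetical helper: adds a scheme when none is present, then asks URI for the host.
def host_of(url)
  url = url.strip
  url = "http://#{url}" unless url =~ %r{\A[a-z][a-z0-9+.-]*://}i
  URI.parse(url).host
end

p URI.parse('www.google.com').host             # => nil (no scheme, so no host)
p host_of('www.google.com')                    # => "www.google.com"
p host_of('https://www.facebook.com/someuser') # => "www.facebook.com"
p host_of('http://sub.tumblr.com/')            # => "sub.tumblr.com"
```

Note that the host still includes "www." and any subdomain, so it has to be trimmed afterwards to get facebook.com or tumblr.com.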

+10

This works for me:

 /^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z]{2,})+[A-Z\/]/i

It always gives you only the domain part. Take a look at this: http://rubular.com/r/0hudnJSgVT

To use it, create a method like the one below. I put it in my helpers so that I can call it from the views.

 def website_url(website_url)
   if website_url[/^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z\/]{2,})$/i]
     website_id = $1
   end
   %Q{http://#{website_id}}
 end
+2

Does it need to be a regular expression? You can also do this:

 require 'uri'
 yourURL = URI.parse('https://www.facebook.com/username')
 print yourURL.host
+1

You can use this regex:

 /(\w+\.\w{2,6})(?:\/|$)/ 
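
As a rough check, this is how that pattern behaves on a few of the inputs from the question. The names `SIMPLE_DOMAIN_RE` and `simple_domain` are made up for this sketch, and `example.photography` is an illustrative input that is not from the question.

```ruby
SIMPLE_DOMAIN_RE = /(\w+\.\w{2,6})(?:\/|$)/

# Hypothetical helper: returns the captured "name.tld" pair, or nil if nothing matches.
def simple_domain(url)
  url[SIMPLE_DOMAIN_RE, 1]
end

p simple_domain('https://www.facebook.com/username') # => "facebook.com"
p simple_domain('www.google.com')                    # => "google.com"
p simple_domain('http://sub.tumblr.com/')            # => "tumblr.com"
p simple_domain('example.photography')               # => nil (last label is longer than 6 characters)
```

The `{2,6}` bound means that a final label longer than six characters is not matched at all.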
0

If you really want to use a regex, you can try something along these lines:

 test_string.scan(/\w+\.\w+(?=\/|\s|$)/) { |match| do_stuff_with(match) } 

This would not handle domain names with a two-part suffix, such as something.co.uk, but it would match everything in your test string.
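
For illustration, here is that scan run against the test string from the question. `TEST_STRING` and `scan_domains` are names chosen for this sketch, and the something.co.uk URL is a made-up input.

```ruby
TEST_STRING = 'https://www.facebook.com/username http://last.fm/user/username ' \
              'www.google.com facebook.com/username http://sub.tumblr.com/ sub.tumblr.com'

# Hypothetical wrapper around the scan shown above; returns every match as an array.
def scan_domains(text)
  text.scan(/\w+\.\w+(?=\/|\s|$)/)
end

p scan_domains(TEST_STRING)
# => ["facebook.com", "last.fm", "google.com", "facebook.com", "tumblr.com", "tumblr.com"]

p scan_domains('http://www.something.co.uk/page')
# => ["co.uk"]
```

The second call shows the limitation: only the last two labels are kept, whatever they are.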

0

I added a method to the String class (by reopening the class) for my purpose.

 # Needs 'uri' and ActiveSupport (blank? and present? come from there).
 # URI.encode was removed in Ruby 3.0, so this runs as written only on older Rubies.
 class String
   def to_dn
     return '' if self.blank?
     return self.split('@').last if self.match('@')
     link = self
     link = "http://#{link}" unless link.match(/^(http:\/\/|https:\/\/)/)
     link = URI.parse(URI.encode(link)).host.present? ? URI.parse(URI.encode(link)).host : link.strip
     domain_name = link.sub(/.*?www./,'')
     domain_name = domain_name.match(/[A-Z]+.[A-Z]{2,4}$/i).to_s if domain_name.split('.').length >= 2 && domain_name.match(/[A-Z]+.[A-Z]{2,4}$/i).present?
   end
 end

Example:

  1. "https://www.facebook.com/someuser".to_dn = "facebook.com"
  2. "www.facebook.com/someuser".to_dn = "facebook.com"
  3. "facebook.com/someuser".to_dn = "facebook.com"
  4. "http://someuser.tumblr.com".to_dn = "tumblr.com"
  5. "dc.ads.linkedin.com".to_dn = "linkedin.com"
  6. 'your_name@domain.com'.to_dn = "domain.com"

It also works for email addresses (which I need for my purpose). I hope this will be helpful to others. Correct me if you find something wrong :)

Note: this will not work for "www.domainname.co.in". I'm working on it :)
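
One possible way to cover cases like co.in, sketched under strong assumptions: `TWO_LABEL_SUFFIXES` is a tiny hand-written list invented for this example (a real solution would need a complete suffix list, such as the Public Suffix List), and `registrable_domain` is a hypothetical helper name, not part of the code above.

```ruby
require 'uri'

# Hypothetical and deliberately incomplete list of two-label suffixes.
TWO_LABEL_SUFFIXES = %w[co.uk co.in com.au].freeze

# Hypothetical helper: keeps the last two labels of the host,
# or the last three when the host ends in a known two-label suffix.
def registrable_domain(input)
  url = input.strip
  url = "http://#{url}" unless url =~ %r{\Ahttps?://}i
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.downcase.split('.')
  keep = TWO_LABEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

p registrable_domain('www.domainname.co.in')       # => "domainname.co.in"
p registrable_domain('http://someuser.tumblr.com') # => "tumblr.com"
p registrable_domain('dc.ads.linkedin.com')        # => "linkedin.com"
```

This sketch does not handle email addresses, and URI.parse can raise URI::InvalidURIError on malformed input.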

0

Source: https://habr.com/ru/post/893532/

