Regular Expression to Get the Domain of a URL

Question

I am trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The URL can be given in any of the following formats:

https://www.facebook.com/someuser (www. is optional and should be ignored)
www.facebook.com/someuser (http:// is not required)
facebook.com/someuser
http://someuser.tumblr.com → should return only tumblr.com

I wrote this regular expression:

/(?:\.|\/{2})(?:www\.)?([^\/]*)/i

But it does not work as I expect.

I can do it in parts:

Remove http:// and https:// if they are in the string, with string.gsub(/https?:\/\//i, "").
Remove www. using string.gsub(/www\./i, "").
Get the domain with a match against /(\w+\.\w+)+/i

But this will not work with subdomains. Test string:

 https://www.facebook.com/username
 http://last.fm/user/username
 www.google.com
 facebook.com/username
 http://sub.tumblr.com/
 sub.tumblr.com

I need this to work with as little memory and processing as possible.

Any ideas?

+6

string url ruby regex parsing

Fábio perez Jul 25 '11 at 22:28

6 answers

This works for me:

 /^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z]{2,})+[A-Z\/]/i

It will always give you only the domain part. Take a look at this: http://rubular.com/r/0hudnJSgVT

To use it, create a method like the one below. I put it in my helpers so that I can access it in the views.

 def website_url(website_url)
   # Capture everything after the optional scheme and "www." prefix.
   if website_url[/^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z\/]{2,})$/i]
     website_id = $1
   end
   %Q{http://#{ website_id }}
 end

+2

gabo Oct 11 '11 at 10:40

Does it need to be a regular expression? You can also do this:

 require 'uri'
 yourURL = URI.parse('https://www.facebook.com/username')
 print yourURL.host

+1

citizen conn Jul 25 '11 at 10:32

You can use this regex:

 /(\w+\.\w{2,6})(?:\/|$)/

0

Paulpro Jul 25 '11 at 10:38

If you really want to use a regex, you could try something along these lines:

 test_string.scan(/\w+\.\w+(?=\/|\s|$)/) { |match| do_stuff_with(match) }

This would not handle domain names with multi-part suffixes, such as something.co.uk, but it would match everything in your test string.
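
As a rough check, the sketch below runs that pattern over the test string from the question and over a .co.uk address. The variable names are only illustrative.

```ruby
# The test string from the question, joined with single spaces.
test_string = 'https://www.facebook.com/username http://last.fm/user/username ' \
              'www.google.com facebook.com/username http://sub.tumblr.com/ sub.tumblr.com'

# A "word.word" pair that is followed by a slash, whitespace, or end of line.
pattern = /\w+\.\w+(?=\/|\s|$)/

# Without a block, scan returns every match as an array.
domains = test_string.scan(pattern)
p domains

# With a two-part suffix, only the suffix itself is matched.
p 'http://something.co.uk/'.scan(pattern)
```

The first call yields one domain per URL with the subdomains dropped; the second yields only "co.uk", which is the limitation described above.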

0

pbaumann Jul 25 '11 at 23:27

I added a method to the String class, using Ruby's open classes, for my purposes.

 require 'uri'

 # Needs ActiveSupport for blank? and present?.
 # URI.encode was removed in Ruby 3.0, so this runs only on older Rubies.
 class String
   def to_dn
     return '' if self.blank?
     # For an email address, the domain is whatever follows the "@".
     return self.split('@').last if self.match('@')
     link = self
     link = "http://#{link}" unless link.match(/^(http:\/\/|https:\/\/)/)
     link = URI.parse(URI.encode(link)).host.present? ? URI.parse(URI.encode(link)).host : link.strip
     domain_name = link.sub(/.*?www./,'')
     # Keep only the last two labels, e.g. "linkedin.com".
     domain_name = domain_name.match(/[A-Z]+.[A-Z]{2,4}$/i).to_s if domain_name.split('.').length >= 2 && domain_name.match(/[A-Z]+.[A-Z]{2,4}$/i).present?
   end
 end

Examples:

  1. "https://www.facebook.com/someuser".to_dn # => "facebook.com"
  2. "www.facebook.com/someuser".to_dn # => "facebook.com"
  3. "facebook.com/someuser".to_dn # => "facebook.com"
  4. "http://someuser.tumblr.com".to_dn # => "tumblr.com"
  5. "dc.ads.linkedin.com".to_dn # => "linkedin.com"
  6. 'your_name@domain.com'.to_dn # => "domain.com"

It also works for email addresses (which I needed for my purposes). I hope this is helpful to others. Correct me if you find something wrong :)

Note: this will not work for "www.domainname.co.in". I'm working on it :)
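
One possible way to handle hosts like that is to check the last two labels against a list of known multi-part suffixes. The sketch below is only an illustration: `TWO_PART_SUFFIXES` and `registrable_domain` are hypothetical names, and the three-entry list is a stand-in for a complete one such as the Public Suffix List.

```ruby
# Illustrative only: a real list of multi-part suffixes is far longer.
TWO_PART_SUFFIXES = %w[co.in co.uk com.au].freeze

# Hypothetical helper: reduces a bare host name to its registrable domain.
def registrable_domain(host)
  labels = host.downcase.sub(/\Awww\./, '').split('.')
  # Keep three labels when the last two form a known multi-part suffix,
  # otherwise keep two.
  keep = TWO_PART_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

puts registrable_domain('www.domainname.co.in')  # domainname.co.in
puts registrable_domain('dc.ads.linkedin.com')   # linkedin.com
```

It expects a bare host name, so a scheme or path would have to be removed first.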

0

Lalit kumar Sep 12 '17 at 11:21

Maurício Linhares · Accepted Answer · 2011-07-25T22:35:01+0000

Why don't you just use the URI class?

 URI.parse( your_uri ).host

And you're done.

One caveat: if there is no "http://" or "https://" at the beginning of the URL, you will have to add it, or the parse method will not give you the host (it will be nil).
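
A minimal sketch of this approach, assuming each input is a single URL with no surrounding whitespace. The helper names `host_for` and `domain_for` are hypothetical. Keeping only the last two labels is a simplification that gives the wrong answer for suffixes such as co.uk.

```ruby
require 'uri'

# Hypothetical helper: returns the host of a URL, with or without a scheme.
def host_for(url)
  # URI.parse only fills in #host when a scheme is present, so add one if missing.
  url = "http://#{url}" unless url =~ %r{\Ahttps?://}i
  host = URI.parse(url).host
  # Drop a leading "www." if there is one.
  host && host.sub(/\Awww\./i, '')
end

# Hypothetical helper: keeps only the last two labels of the host.
def domain_for(url)
  host = host_for(url)
  host && host.split('.').last(2).join('.')
end

puts host_for('https://www.facebook.com/someuser')  # facebook.com
puts host_for('www.facebook.com/someuser')          # facebook.com
puts domain_for('http://someuser.tumblr.com')       # tumblr.com
```

Splitting the work into two helpers keeps the URI parsing separate from the decision about how many labels count as "the domain".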
