JavaScript: extract URLs from a string (incl. query string) and return an array

I know this has been asked a thousand times (apologies), but after searching SO, Google, etc. I have yet to find a conclusive answer.

Basically, I need a JS function that, when passed a string, identifies and extracts all URLs based on a regular expression and returns an array of everything found, e.g.:

    function findUrls(searchText) {
        var regex = ???; // the pattern I am looking for
        var result = searchText.match(regex);
        if (result) { return result; } else { return false; }
    }

The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties and pitfalls here (closing parentheses, etc.), so I have a feeling the process needs to be:

Split the string (searchText) into distinct sections that begin and end with either nothing, a space, or a carriage return on either side, resulting in distinct chunks of content, e.g. by doing a split.

For each chunk of content produced by the split, check whether it fits the logic for a URL of any construction, namely whether it contains a period immediately followed by text (the one constant rule for qualifying a potential URL).

The regular expression should then check that the period is immediately followed by text of a type valid for a TLD, directory structure and query string, and preceded by text of a type valid for a URL.
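The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the name findUrlCandidates, the candidate pattern and the punctuation-trimming rule are all assumptions made for the example. It returns an empty array rather than false when nothing is found, it ignores ports and user info, and it will happily produce false positives such as "e.g".

```javascript
// Hypothetical helper: split on whitespace, then keep fragments that look like URLs.
function findUrlCandidates(searchText) {
  // Optional scheme, then at least one "text.text" pair (a period immediately
  // followed by text), then an optional path, query string or fragment.
  var candidate = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:[\/?#]\S*)?$/i;
  // Leading/trailing punctuation that usually belongs to the sentence, not the URL.
  var edges = /^[("'<\[]+|[)"'>\].,;:!?]+$/g;

  return searchText
    .split(/\s+/)                                           // step 1: split into chunks
    .map(function (chunk) { return chunk.replace(edges, ""); })
    .filter(function (chunk) { return candidate.test(chunk); }); // steps 2 and 3
}

var demo = "See http://www.google.com, google.com and (www.google.com/search?q=a+b&c=d). Ask will.i.am!";
console.log(findUrlCandidates(demo));
// → ["http://www.google.com", "google.com", "www.google.com/search?q=a+b&c=d", "will.i.am"]
```

Trimming the edges before testing is one way to deal with the closing-parenthesis problem, at the cost of also trimming URLs that legitimately end in one of those characters.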

I know that false positives can occur; however, any returned values will be checked by a call to the URL itself, so this can be ignored. The other functions I have found also often fail to return a URL's query string, if present.

From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as valid!

e.g. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any derivative thereof with a query string, should all be returned.

Thank you so much, sorry again if this exists elsewhere on SO, but my searches did not return it.

+6
4 answers

I just use URI.js; it makes this easy.

    var source = "Hello www.example.com,\n"
        + "http://google.com is a search engine, like http://www.bing.com\n"
        + "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
        + "http://123.123.123.123/foo.html is IPv4 and "
        + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
        + "links can also be in parens (http://example.org) "
        + "or quotes »http://example.org«.";

    var result = URI.withinString(source, function(url) {
        return "<a>" + url + "</a>";
    });

    /* result is:
    Hello <a>www.example.com</a>,
    <a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
    <a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
    <a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
    links can also be in parens (<a>http://example.org</a>) or quotes »<a>http://example.org</a>«.
    */
+18

You can use the regex from URI.js:

    // gruber revised expression - http://rodneyrehm.de/t/url-regex.html
    var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

String#match and String#replace may help.
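As a rough illustration of how String#match and String#replace fit together with this pattern, here is a small sketch. The pattern is the Gruber expression from this answer with its character ranges written as [a-z]; the sample text and the variable names sampleText, foundUrls and linkedText are made up for the example.

```javascript
// Gruber's revised expression, as quoted in this answer.
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

var sampleText = "Visit http://example.org/a?b=c, or www.example.com.";

// String#match with the g flag returns every full match, or null if there is none.
var foundUrls = sampleText.match(uri_pattern) || [];
// → ["http://example.org/a?b=c", "www.example.com"]

// String#replace passes each full match to the callback as its first argument.
var linkedText = sampleText.replace(uri_pattern, function (url) {
  return "<a>" + url + "</a>";
});
// → "Visit <a>http://example.org/a?b=c</a>, or <a>www.example.com</a>."
```

Note that the trailing comma and full stop are left out of the matches, because the pattern refuses to end a URL on sentence punctuation.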

+12

Try this:

    var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

You can use this site to test regular expressions: http://gskinner.com/RegExr/
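A quick sketch of how this expression behaves, with its character ranges written as [a-z] and no stray spaces inside the classes. The sample strings and the names sample, hits and longTld are invented for the illustration.

```javascript
var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

var sample = "see google.com and http://www.bing.com/search?q=x now";
var hits = sample.match(expression) || [];
// → ["google.com", "http://www.bing.com/search?q=x"]

// The {2,4} quantifier means a TLD longer than four letters is not matched.
var longTld = "example.museum".match(expression);
// → null
```

If longer TLDs matter, widening the {2,4} quantifier is one option, at the price of more false positives.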

+1

The following regex extracts the URLs from a string (including the query string) and returns an array:

    var url = "asdasdla hakjsdh aaskjdh https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8 askndajk nakjsdn aksjdnakjsdnkjsn";
    // Try the double-colon form (http:://) first, then fall back to the normal http:// form.
    var matches = url.match(/\bhttps?::\/\/\S+/gi) || url.match(/\bhttps?:\/\/\S+/gi);

Output:

 ["https://www.google.com/search?q=format+to+6+digir&…s=chrome..69i57.5983j1j1&sourceid=chrome&ie=UTF-8"] 

Note: this handles both http:// with a single colon and http::// with a double colon in the string, and likewise for https, so it is safe to use. :)
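One caveat with the `||` form: as soon as the first match() finds any double-colon URL, the second pattern is never tried, so single-colon URLs in the same string are skipped. A possible single-pass variant is sketched below. The function name findHttpUrls and the trailing-punctuation trimming are assumptions added for the example, not part of the answer.

```javascript
// Hypothetical helper: find http(s) URLs written with one or two colons.
function findHttpUrls(text) {
  // "::?" accepts one or two colons, so mixed input is handled in one pass.
  var matches = text.match(/\bhttps?::?\/\/\S+/gi) || [];
  // \S+ also swallows trailing sentence punctuation, so trim it off.
  return matches.map(function (url) {
    return url.replace(/[.,;:!?)]+$/, "");
  });
}

console.log(findHttpUrls("a http://one.example/x?y=1&z=2, b https:://two.example. c"));
// → ["http://one.example/x?y=1&z=2", "https:://two.example"]
```

The query string is kept intact because \S+ runs until the next whitespace character.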

0

Source: https://habr.com/ru/post/919037/

