PHP parse_url() behaves strangely

Question

I am trying to get the host from a URL using parse_url, but for some inputs I get empty results. Here is my function:

function clean_url($urls){
    $good_url=array();
    for ($i=0;$i<count($urls);$i++){
        $url=parse_url($urls[$i]);
        //$temp_string=str_replace("http://", "", $urls[$i]);
        //$temp_string=str_replace("https://", "", $urls[$i]);
        //$temp_string=substr($temp_string, 0,stripos($temp_string,"/"));
        array_push($good_url, $url['host']);
    }
    return $good_url;
}

Input array:

 Array (
     [0] => https://en.wikipedia.org/wiki/Data
     [1] => data.gov.ua/
     [2] => e-data.gov.ua/
     [3] => e-data.gov.ua/transaction
     [4] => https://api.jquery.com/data/
     [5] => https://api.jquery.com/jquery.data/
     [6] => searchdatamanagement.techtarget.com/definition/data
     [7] => www.businessdictionary.com/definition/data.html
     [8] => https://data.world/
     [9] => https://en.oxforddictionaries.com/definition/data
 )

Resulting array, with empty entries:

 Array (
     [0] => en.wikipedia.org
     [1] =>
     [2] =>
     [3] =>
     [4] => api.jquery.com
     [5] => api.jquery.com
     [6] =>
     [7] =>
     [8] => data<
     [9] => en.oxforddictionaries.com
 )

php parsing parse-url

Djos Dec 23 '16 at 20:13

6 answers

Ben plummer · Answer 1 · 2016-12-23T20:26:08+0000

Some of the entries in $urls have no scheme, which causes parse_url to treat the host as part of the path.

For example, parsing the URL data.gov.ua/ returns data.gov.ua/ as the path. Adding a scheme such as https, so that the URL becomes https://data.gov.ua/, lets parse_url recognize data.gov.ua as the host.
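A minimal sketch of that idea, assuming https is an acceptable default. The helper name host_with_default_scheme and the regular expression used to detect an existing scheme are illustrative assumptions, not part of the original answer:

```php
<?php
// Hypothetical helper: prepend a default scheme when the URL has none,
// so that parse_url() can recognize the host.
function host_with_default_scheme(string $url, string $scheme = 'https'): ?string
{
    // Leave the URL untouched if it already starts with "scheme://" or "//".
    if (!preg_match('~^(?:[a-z][a-z0-9+.-]*:)?//~i', $url)) {
        $url = $scheme . '://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() gives null when the component is absent and false on
    // seriously malformed input; both are reported as null here.
    return is_string($host) ? $host : null;
}

var_dump(host_with_default_scheme('data.gov.ua/'));
var_dump(host_with_default_scheme('https://en.wikipedia.org/wiki/Data'));
```

Passing PHP_URL_HOST as the second argument makes parse_url return only the host, which avoids indexing into an array that may not contain a 'host' key.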

Georges O. · Answer 2 · 2016-12-23T20:29:04+0000

I ran your script and got a PHP notice:

Notice: Undefined index: host

So $url['host'] does not exist for some inputs. If I var_dump the result of parse_url in each case, I get:

 array (size=3)
   'scheme' => string 'https' (length=5)
   'host' => string 'en.wikipedia.org' (length=16)
   'path' => string '/wiki/Data' (length=10)
 array (size=1)
   'path' => string 'data.gov.ua/' (length=12)
 ( ! ) Notice: Undefined index: host
 array (size=1)
   'path' => string 'e-data.gov.ua/' (length=14)
 ( ! ) Notice: Undefined index: host

As you can see, the URL is interpreted as a path.

Outputs:

$urls[] = 'data.gov.ua/'; Error. Invalid URL
$urls[] = '//data.gov.ua/'; Valid.
$urls[] = 'http://data.gov.ua/'; Valid.

Tip: prefix the URL with // if you do not know whether it is http or https.

By the way, you can simplify your code:
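As a rough sketch, the // prefix can be applied to a whole list before parsing. The check for an existing "//" below is a deliberate simplification and an assumption of this example: it would misfire on a scheme-less URL whose path happens to contain a double slash.

```php
<?php
// A few entries taken from the question's input array.
$urls = [
    'https://en.wikipedia.org/wiki/Data',
    'data.gov.ua/',
    'e-data.gov.ua/transaction',
    'www.businessdictionary.com/definition/data.html',
];

$hosts = array_map(function ($url) {
    // strpos() returns false when "//" does not occur anywhere in the string.
    if (strpos($url, '//') === false) {
        $url = '//' . $url; // make the URL protocol-relative
    }
    // Ask parse_url() for the host component only.
    return parse_url($url, PHP_URL_HOST);
}, $urls);

print_r($hosts);
```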

 function clean_url(array $urls) {
     $good_url = [];
     foreach ($urls as $url) {
         // Add a check on the start of the URL here.
         $parse = parse_url($url);
         if (isset($parse['host'])) {
             array_push($good_url, $parse['host']);
         } else {
             $good_url[] = 'Invalid Url'; // for example, or trigger an error
         }
     }
     return $good_url;
 }

See the documentation for foreach and isset.

Barmar · Answer 3 · 2016-12-23T20:31:19+0000

Common URL format:

 scheme://hostname:port/path?query#fragment

Each part of the URL is optional, and the parser uses the delimiters between the parts to determine which of them have been provided or omitted.

The hostname is the part of the URL that follows the // prefix. Many of your URLs omit this prefix, so they have no hostname.

For example, parse_url('data.gov.ua/') returns:

 Array ( [path] => data.gov.ua/ )

To get what you want, it must be parse_url('//data.gov.ua/') :

 Array ( [host] => data.gov.ua [path] => / )

This often confuses programmers because browsers are very forgiving about incomplete URLs typed into the location field: they use heuristics to decide whether something is a hostname or a path. APIs like parse_url() are stricter.
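The effect of the delimiters can be checked with a short script. This is only an illustration: the port, query string and fragment in the first URL are made up for the example.

```php
<?php
// A URL that uses every delimiter from the general format.
$full = parse_url('https://data.gov.ua:8080/transaction?page=2#top');

// No "//" prefix: nothing marks where a host would start.
$noPrefix = parse_url('data.gov.ua/');

// A "//" prefix without a scheme: the host is recognized.
$relative = parse_url('//data.gov.ua/');

// Show which components parse_url() detected in each case.
print_r(array_keys($full));
print_r(array_keys($noPrefix));
print_r(array_keys($relative));
```

Only the components that are actually present appear as keys, which is why code reading $url['host'] should first check that the key exists.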

Manic depression · Answer 4 · 2017-02-07T08:54:39+0000

I wrote this simple function, which returns the host (for display as a name) and the full URL (for hrefs):

 public static function parseUrl($target_url) {
     $url = "";
     $url_full = "";
     if (!empty($target_url)) {
         $parser = @parse_url($target_url);
         if (!empty($parser['host'])) {
             $url = $parser['host'];
             if (!empty($parser['scheme'])) {
                 $url_full = $parser['scheme'] . "://" . $parser['host'];
             } else {
                 $url_full = "//" . $parser['host'];
             }
         } else {
             if (!empty($parser['path'])) {
                 return self::parseUrl("//".$parser['path']);
             }
         }
     }
     return array('url' => $url, 'url_full' => $url_full);
 }

It works well with the example input:

 Array ( [url] => en.wikipedia.org [url_full] => https://en.wikipedia.org )
 Array ( [url] => data.gov.ua [url_full] => //data.gov.ua )
 Array ( [url] => e-data.gov.ua [url_full] => //e-data.gov.ua )
 Array ( [url] => e-data.gov.ua [url_full] => //e-data.gov.ua )
 Array ( [url] => api.jquery.com [url_full] => https://api.jquery.com )
 Array ( [url] => api.jquery.com [url_full] => https://api.jquery.com )
 Array ( [url] => searchdatamanagement.techtarget.com [url_full] => //searchdatamanagement.techtarget.com )
 Array ( [url] => www.businessdictionary.com [url_full] => //www.businessdictionary.com )
 Array ( [url] => data.world [url_full] => https://data.world )
 Array ( [url] => en.oxforddictionaries.com [url_full] => https://en.oxforddictionaries.com )

So you can use:

 <a href="{$url['url_full']}" target="_blank">{$url['url']}</a>

user7669738 · Answer 5 · 2016-12-23T20:45:43+0000

Some time ago, I developed a solution to a similar problem.
I made some changes to my original code to fit your requirements.
It works, but it is not very elegant.

 function clean_url($urls) {
     $good_url=array();
     for ($i=0;$i<count($urls);$i++){
         $domain=$urls[$i];
         $domain = str_replace("www.","",$domain);
         $domain = str_replace("https://","",$domain);
         $domain = str_replace("http://","",$domain);
         $domain=explode("/", $domain);
         array_push($good_url, $domain[0]);
     }
     return $good_url;
 }
 $urls=array(
     "0" => "https://en.wikipedia.org/wiki/Data",
     "1" => "data.gov.ua/",
     "2" => "e-data.gov.ua/",
     "3" => "e-data.gov.ua/transaction",
     "4" => "https://api.jquery.com/data/",
     "5" => "https://api.jquery.com/jquery.data/",
     "6" => "searchdatamanagement.techtarget.com/definition/data",
     "7" => "www.businessdictionary.com/definition/data.html",
     "8" => "https://data.world/",
     "9" => "https://en.oxforddictionaries.com/definition/data");
 echo "<pre>";
 print_r(clean_url($urls));
 echo "</pre>";

Yours faithfully,

Djos · Answer 6 · 2016-12-27T17:44:40+0000

The problem was the missing http scheme. I added http:// to all the URLs and now it works.
