PHP parse_url() behaves unexpectedly

I am trying to get the host from a URL using parse_url, but for some inputs I get empty results. Here is my function:

    function clean_url($urls){
        $good_url=array();
        for ($i=0;$i<count($urls);$i++){
            $url=parse_url($urls[$i]);
            //$temp_string=str_replace("http://", "", $urls[$i]);
            //$temp_string=str_replace("https://", "", $urls[$i]);
            //$temp_string=substr($temp_string, 0,stripos($temp_string,"/"));
            array_push($good_url, $url['host']);
        }
        return $good_url;
    }

Input array:

    Array
    (
        [0] => https://en.wikipedia.org/wiki/Data
        [1] => data.gov.ua/
        [2] => e-data.gov.ua/
        [3] => e-data.gov.ua/transaction
        [4] => https://api.jquery.com/data/
        [5] => https://api.jquery.com/jquery.data/
        [6] => searchdatamanagement.techtarget.com/definition/data
        [7] => www.businessdictionary.com/definition/data.html
        [8] => https://data.world/
        [9] => https://en.oxforddictionaries.com/definition/data
    )

Resulting array, with empty entries:

    Array
    (
        [0] => en.wikipedia.org
        [1] =>
        [2] =>
        [3] =>
        [4] => api.jquery.com
        [5] => api.jquery.com
        [6] =>
        [7] =>
        [8] => data.world
        [9] => en.oxforddictionaries.com
    )
+6
6 answers

Some of the entries in $urls have no scheme, which causes parse_url to treat the host as part of the path.

For example, parsing the URL data.gov.ua/ returns data.gov.ua/ as the path. Adding a scheme such as https, so that the URL becomes https://data.gov.ua/, lets parse_url recognize data.gov.ua as the host.
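The fix described above can be sketched as a small helper that prepends a scheme only when one is missing. The name with_scheme and the default of https are assumptions made for illustration, not part of the original answer, and the check is deliberately simple: it only looks for a leading "scheme://" or "//", so inputs such as mailto: addresses are not handled.

```php
<?php
// Hypothetical helper (not from the original answer): prepend a default
// scheme when the URL starts with neither "scheme://" nor "//".
function with_scheme(string $url, string $default = 'https'): string
{
    // Already has a scheme or is protocol-relative: leave it untouched.
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*:)?//~i', $url)) {
        return $url;
    }
    return $default . '://' . $url;
}

// PHP_URL_HOST makes parse_url return only the host component.
$host = parse_url(with_scheme('data.gov.ua/'), PHP_URL_HOST);
var_dump($host); // "data.gov.ua"
```

URLs that already carry a scheme, such as https://en.wikipedia.org/wiki/Data, pass through unchanged, so the helper can be applied to every entry in the input array.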

+5

I ran your script and PHP raised a notice:

Notice: Undefined index: host

So $url['host'] does not exist. If I var_dump the result of parse_url for each input, I get:

    array (size=3)
      'scheme' => string 'https' (length=5)
      'host' => string 'en.wikipedia.org' (length=16)
      'path' => string '/wiki/Data' (length=10)

    array (size=1)
      'path' => string 'data.gov.ua/' (length=12)
    ( ! ) Notice: Undefined index: host

    array (size=1)
      'path' => string 'e-data.gov.ua/' (length=14)
    ( ! ) Notice: Undefined index: host

As you can see, the URL is interpreted as a path.

Results:

  • $urls[] = 'data.gov.ua/'; Error. Invalid URL
  • $urls[] = '//data.gov.ua/'; Valid.
  • $urls[] = 'http://data.gov.ua/'; Valid.

Tip: prefix the URL with // if you do not know whether it is http or https.

By the way, you can simplify your code:

    function clean_url(array $urls) {
        $good_url = [];
        foreach ($urls as $url) {
            // Add a check on the start of the URL here if needed.
            $parse = parse_url($url);
            // Read the host from the parsed array, not from the original string.
            if (isset($parse['host'])) {
                $good_url[] = $parse['host'];
            } else {
                $good_url[] = 'Invalid Url'; // for example, or trigger an error
            }
        }
        return $good_url;
    }

See foreach and isset

0

Common URL format:

 scheme://hostname:port/path?query#fragment 

Each part of the URL is optional, and the delimiters between the parts determine which ones have been provided and which omitted.

The hostname is the part of the URL after the // prefix. Many of your URLs omit this prefix, so they have no hostname.

For example, parse_url('data.gov.ua/') returns:

    Array
    (
        [path] => data.gov.ua/
    )

To get what you want, it must be parse_url('//data.gov.ua/') :

    Array
    (
        [host] => data.gov.ua
        [path] => /
    )

This often confuses programmers because browsers are very forgiving about incomplete URLs typed into the location field: they use heuristics to decide whether something is a hostname or a path. APIs like parse_url() are stricter.
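The difference is easy to check directly. The sketch below passes the same host to parse_url in three forms and collects the host component of each; the variable names are chosen here for illustration only.

```php
<?php
// The same host written three ways; only the first lacks the "//" delimiter.
$candidates = ['data.gov.ua/', '//data.gov.ua/', 'https://data.gov.ua/'];

$hosts = [];
foreach ($candidates as $candidate) {
    // With PHP_URL_HOST, parse_url returns the host string,
    // or null when the URL has no host component.
    $hosts[$candidate] = parse_url($candidate, PHP_URL_HOST);
}

var_dump($hosts);
// 'data.gov.ua/' maps to NULL; the other two map to "data.gov.ua".
```

Requesting a single component this way also avoids the "Undefined index: host" notice, since a missing host comes back as null rather than as an absent array key.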

0

I wrote this simple function, which gives me the host (for the link text) and the full URL (for the href):

    public static function parseUrl($target_url)
    {
        $url = "";
        $url_full = "";
        if (!empty($target_url)) {
            $parser = @parse_url($target_url);
            if (!empty($parser['host'])) {
                $url = $parser['host'];
                if (!empty($parser['scheme'])) {
                    $url_full = $parser['scheme'] . "://" . $parser['host'];
                } else {
                    $url_full = "//" . $parser['host'];
                }
            } else {
                if (!empty($parser['path'])) {
                    return self::parseUrl("//".$parser['path']);
                }
            }
        }
        return array('url' => $url, 'url_full' => $url_full);
    }

which works well with the example input:

    Array ( [url] => en.wikipedia.org [url_full] => https://en.wikipedia.org )
    Array ( [url] => data.gov.ua [url_full] => //data.gov.ua )
    Array ( [url] => e-data.gov.ua [url_full] => //e-data.gov.ua )
    Array ( [url] => e-data.gov.ua [url_full] => //e-data.gov.ua )
    Array ( [url] => api.jquery.com [url_full] => https://api.jquery.com )
    Array ( [url] => api.jquery.com [url_full] => https://api.jquery.com )
    Array ( [url] => searchdatamanagement.techtarget.com [url_full] => //searchdatamanagement.techtarget.com )
    Array ( [url] => www.businessdictionary.com [url_full] => //www.businessdictionary.com )
    Array ( [url] => data.world [url_full] => https://data.world )
    Array ( [url] => en.oxforddictionaries.com [url_full] => https://en.oxforddictionaries.com )

So you can use:

 <a href="{$url['url_full']}" target="_blank">{$url['url']}</a> 
0


Some time ago, I developed a solution to a similar problem.
I made some changes to my original code to fit your specification.
It is functional, but not very elegant.

    function clean_url($urls) {
        $good_url=array();
        for ($i=0;$i<count($urls);$i++){
            $domain=$urls[$i];
            $domain = str_replace("www.","",$domain);
            $domain = str_replace("https://","",$domain);
            $domain = str_replace("http://","",$domain);
            $domain=explode("/", $domain);
            array_push($good_url, $domain[0]);
        }
        return $good_url;
    }

    $urls=array(
        "0" => "https://en.wikipedia.org/wiki/Data",
        "1" => "data.gov.ua/",
        "2" => "e-data.gov.ua/",
        "3" => "e-data.gov.ua/transaction",
        "4" => "https://api.jquery.com/data/",
        "5" => "https://api.jquery.com/jquery.data/",
        "6" => "searchdatamanagement.techtarget.com/definition/data",
        "7" => "www.businessdictionary.com/definition/data.html",
        "8" => "https://data.world/",
        "9" => "https://en.oxforddictionaries.com/definition/data");

    echo "<pre>";
    print_r(clean_url($urls));
    echo "</pre>";

Yours faithfully,

-1

The problem was the missing http scheme. I added http:// to all the URLs and now it works.

-1

Source: https://habr.com/ru/post/1013481/

