Parsing and simplifying URLs in PHP

I am parsing links found on web pages, and I am looking for a way to convert URLs like this:

http://www.site.com/./eng/.././disclaimer/index.htm

into the equivalent, correct form:

http://www.site.com/disclaimer/index.htm

mainly to avoid duplicates.

Thanks.

2 answers

Something like this:

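function simplify($path) {
   $r = array();
   foreach (explode('/', $path) as $p) {
      if ($p === '..') {
         // ".." removes the previous segment
         array_pop($r);
      } elseif ($p !== '.' && $p !== '') {
         // skip "." and empty segments, keep everything else
         $r[] = $p;
      }
   }
   $r = implode('/', $r);
   // restore the leading slash of an absolute path
   if ($path !== '' && $path[0] === '/') $r = "/$r";
   return $r;
}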

And this is how you use it:

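$u = parse_url($dirtyUrl);
// parse_url() omits 'path' for URLs such as http://www.site.com
$u['path'] = simplify(isset($u['path']) ? $u['path'] : '/');
// note: this rebuild keeps only scheme, host and path
$clean_url = "{$u['scheme']}://{$u['host']}{$u['path']}";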
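
The usage above rebuilds the URL from scheme, host and path only, so a port or query string would be lost. Below is a minimal sketch of one way to keep them, assuming PHP 7.1+ and absolute http(s)-style URLs. The names remove_dot_segments() and normalize_url() are hypothetical helpers, not part of PHP or of the answer above. Lowercasing the scheme and host, dropping the fragment and dropping any user:password part are assumptions made here for deduplication and may not suit every crawler.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments of an absolute path.
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            array_pop($out);            // step back one directory
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;              // ordinary segment
        }
    }
    $clean = '/' . implode('/', $out);
    // Keep a trailing slash if the input pointed at a directory.
    $last = end($segments);
    if ($out && in_array($last, ['', '.', '..'], true)) {
        $clean .= '/';
    }
    return $clean;
}

// Hypothetical helper: returns a normalized URL, or null if the input
// is not an absolute URL with a scheme and a host.
function normalize_url(string $url): ?string
{
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return null;
    }
    // Assumption: scheme and host are compared case-insensitively.
    $result = strtolower($u['scheme']) . '://' . strtolower($u['host']);
    if (isset($u['port'])) {
        $result .= ':' . $u['port'];
    }
    $result .= remove_dot_segments($u['path'] ?? '/');
    if (isset($u['query'])) {
        $result .= '?' . $u['query'];
    }
    // Assumption: the fragment (#...) is dropped on purpose.
    return $result;
}

echo normalize_url('http://www.site.com/./eng/.././disclaimer/index.htm'), "\n";
// prints http://www.site.com/disclaimer/index.htm
```

Returning null for relative links is a deliberate choice in this sketch: a relative link would first have to be resolved against the URL of the page it was found on.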

What makes you think that these two URLs are equivalent?

If you can answer that question in detail, use a regular expression or a parser that applies exactly the rules you know make two pages equivalent.


Source: https://habr.com/ru/post/1731851/

