I am not sure I understood your question correctly, but if you want to process every text sequence enclosed between src=" and ", the following pattern could do it:
~(\ssrc=")([^"]+)(")~
It has three capture groups, the second of which contains the data you are interested in. The first and the third are useful for rebuilding the entire match.
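To make the three groups concrete, here is a minimal sketch that runs the pattern once with preg_match and dumps the matches. The sample input string is made up for illustration and is not part of the question:

```php
<?php
// Same pattern as above; $sample is a hypothetical input for illustration.
$pattern = '~(\ssrc=")([^"]+)(")~';
$sample  = '<img src="/cat.png">';

preg_match($pattern, $sample, $matches);

// $matches[0] is the whole match; 1 to 3 are the capture groups:
// 1 = the leading whitespace plus src=", 2 = the URL, 3 = the closing quote.
var_dump($matches);
```

Only group 2 needs to change; groups 1 and 3 are put back unchanged around it.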
Now you can replace all occurrences with a callback function that rewrites the URLs. I created a simple string with all 6 cases that you have:
$site = <<<BUFFER
1. src="//www.stackoverflow.com/cat.png"
2. src="http://www.stackoverflow.com/cat.png"
3. src="https://www.stackoverflow.com/cat.png"
4. src="somedirectory/cat.png"
5. src="/cat.png"
6. src="cat.png"
BUFFER;
Let's ignore for a moment that there are no surrounding HTML tags. You are not parsing HTML anyway, I'm sure, because you asked for a regular expression rather than an HTML parser.
So, to replace each of the links, you can start lightly by simply highlighting them in the string. In the following example, the match in the middle (the URL) is wrapped in markers so that it is clearly visible:
$pattern = '~(\ssrc=")([^"]+)(")~';

echo preg_replace_callback($pattern, function ($matches) {
    return $matches[1] . ">>>" . $matches[2] . "<<<" . $matches[3];
}, $site);
The output for the given example:
1. src=">>>//www.stackoverflow.com/cat.png<<<"
2. src=">>>http://www.stackoverflow.com/cat.png<<<"
3. src=">>>https://www.stackoverflow.com/cat.png<<<"
4. src=">>>somedirectory/cat.png<<<"
5. src=">>>/cat.png<<<"
6. src=">>>cat.png<<<"
Since the way the string is replaced is going to change, that part can be extracted so that it is easier to swap out:
$callback = function ($method) {
    return function ($matches) use ($method) {
        return $matches[1] . $method($matches[2]) . $matches[3];
    };
};
This function creates the replace callback from the replacement method that you pass in as a parameter.
Such a replacement function could be:
$highlight = function($string) { return ">>>$string<<<"; };
And it is called as follows:
$pattern = '~(\ssrc=")([^"]+)(")~';
echo preg_replace_callback($pattern, $callback($highlight), $site);
The output remains the same; this was just to illustrate how the extraction works:
1. src=">>>//www.stackoverflow.com/cat.png<<<"
2. src=">>>http://www.stackoverflow.com/cat.png<<<"
3. src=">>>https://www.stackoverflow.com/cat.png<<<"
4. src=">>>somedirectory/cat.png<<<"
5. src=">>>/cat.png<<<"
6. src=">>>cat.png<<<"
The advantage of this is that the replacement function only has to deal with the matched URL as a single string, and not with the regular expression's matches array and its different groups.
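As a small sketch of that advantage, any callable that takes one string and returns one string can be plugged in. Here PHP's built-in strtoupper is used purely for illustration, and the $input string is made up:

```php
<?php
// The callback factory from above, repeated so the snippet runs on its own.
$callback = function ($method) {
    return function ($matches) use ($method) {
        return $matches[1] . $method($matches[2]) . $matches[3];
    };
};

$pattern = '~(\ssrc=")([^"]+)(")~';
$input   = '1. src="cat.png" 2. src="/dog.png"'; // hypothetical sample input

// strtoupper only ever sees the URL (group 2), never the matches array.
$result = preg_replace_callback($pattern, $callback('strtoupper'), $input);
echo $result, PHP_EOL;
```

The surrounding src=" and closing quote are untouched because the factory puts groups 1 and 3 back itself.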
Now to the second half of your question: how to replace this with specific URL handling, such as removing the file name. This can be done by parsing the URL itself and removing the file name (basename) from the path component. Thanks to the extraction, you can put this into a simple function:
$removeFilename = function ($url) {
    $url  = new Net_URL2($url);
    // The last path segment is the file name.
    $base = basename($path = $url->getPath());
    // Cut the file name off the end of the path.
    $url->setPath(substr($path, 0, -strlen($base)));
    return $url;
};
This code uses the PEAR Net_URL2 URL component (also available through Packagist and GitHub; your OS may also provide its own package). It can easily parse and modify URLs, so it is nice to have for this job.
So now the replacement is done with the new file name removal function:
$pattern = '~(\ssrc=")([^"]+)(")~';
echo preg_replace_callback($pattern, $callback($removeFilename), $site);
And then the result:
1. src="//www.stackoverflow.com/"
2. src="http://www.stackoverflow.com/"
3. src="https://www.stackoverflow.com/"
4. src="somedirectory/"
5. src="/"
6. src=""
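If Net_URL2 is not at hand, a rough stand-in can be sketched with plain string functions. This is not what the code above uses; $removeFilenamePlain is a hypothetical name, and it assumes the URL has a path and no query string or fragment containing a slash (for example, it would mishandle "http://example.com"):

```php
<?php
// Hypothetical alternative to $removeFilename without Net_URL2.
// Assumption: everything after the last "/" is the file name.
$removeFilenamePlain = function (string $url): string {
    $slash = strrpos($url, '/');
    // No slash at all means the whole value is the file name.
    return $slash === false ? '' : substr($url, 0, $slash + 1);
};

$samples = ['//www.stackoverflow.com/cat.png', 'somedirectory/cat.png', '/cat.png', 'cat.png'];
foreach ($samples as $url) {
    echo $url, ' => "', $removeFilenamePlain($url), '"', PHP_EOL;
}
```

It can be passed to $callback in the same way as $removeFilename, but a real URL parser remains the more robust choice.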
Please note that this is only an example. It shows how you can do this with regular expressions. However, you can do the same with an HTML parser. Suppose this is the actual HTML fragment:
1. <img src="//www.stackoverflow.com/cat.png"/>
2. <img src="http://www.stackoverflow.com/cat.png"/>
3. <img src="https://www.stackoverflow.com/cat.png"/>
4. <img src="somedirectory/cat.png"/>
5. <img src="/cat.png"/>
6. <img src="cat.png"/>
Then process the "src" attribute of every <img> element with the filter function created above:
$doc   = new DOMDocument();
$saved = libxml_use_internal_errors(true);
$doc->loadHTML($site, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors($saved);

// Select the src attribute nodes of all img elements.
$srcs = (new DOMXPath($doc))->query('//img/@src') ?: [];
foreach ($srcs as $src) {
    // Net_URL2 objects convert to their URL string.
    $src->nodeValue = (string) $removeFilename($src->nodeValue);
}

echo $doc->saveHTML();
The result will again be:
1. <img src="//www.stackoverflow.com/">
2. <img src="http://www.stackoverflow.com/">
3. <img src="https://www.stackoverflow.com/">
4. <img src="somedirectory/">
5. <img src="/">
6. <img src="">
A different parsing method is used here, but the replacement is still the same. This is just to offer two different approaches that also partially overlap.