Why does my regex remove spaces?

$str = "& &svnips   Â ∴ ≈ osidnviosd & sopinsdo";   
$regex = "/&[^\w;]/";
echo preg_replace($regex, "&amp;", $str);

I am trying to replace all non-encoded ampersands with encoded ones.
The problem is that it removes the space between & and sopinsdo.

Any idea why?

+3
4 answers

Why use a regex? Why not use htmlspecialchars()?

echo htmlspecialchars($str, ENT_NOQUOTES, 'UTF-8', false);

Note the fourth parameter (double_encode), set to false here: it tells htmlspecialchars() not to re-encode existing entities. With this call, every < becomes &lt;, every > becomes &gt;, and every & that is not already part of an existing entity becomes &amp;.
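
To illustrate what the fourth parameter changes, here is a minimal sketch comparing the default behaviour with double_encode set to false. The sample string is made up for this example and is not from the question.

```php
<?php
// Hypothetical sample input: one bare ampersand, one existing entity, one tag.
$sample = 'Fish & chips &amp; <b>peas</b>';

// Default (double_encode = true): the existing &amp; is encoded a second time.
$default = htmlspecialchars($sample, ENT_NOQUOTES, 'UTF-8');

// double_encode = false: existing entities are left alone.
$once = htmlspecialchars($sample, ENT_NOQUOTES, 'UTF-8', false);

echo $default, "\n"; // Fish &amp; chips &amp;amp; &lt;b&gt;peas&lt;/b&gt;
echo $once, "\n";    // Fish &amp; chips &amp; &lt;b&gt;peas&lt;/b&gt;
```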

But, if you must use a regex, you can do:

$regex = '/&([^\w;])/';
echo preg_replace($regex, '&amp;\1', $str);
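
A short sketch of how the capture group behaves, assuming a shortened, made-up input string. It also shows one case this pattern does not cover: an ampersand at the very end of the string.

```php
<?php
// Hypothetical sample input.
$sample = 'osidnviosd & sopinsdo';

// Group 1 captures the character after "&"; \1 writes it back after &amp;.
$fixed = preg_replace('/&([^\w;])/', '&amp;\1', $sample);
echo $fixed, "\n"; // osidnviosd &amp; sopinsdo

// A trailing "&" has no following character, so the pattern does not match it.
$trailing = preg_replace('/&([^\w;])/', '&amp;\1', 'a &');
echo $trailing, "\n"; // a &
```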


+2

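Your pattern matches 2 characters ("&" followed by one character that is NOT ; or \w) and replaces both of them with &amp;

So the character after the & (here, the space) is replaced along with it.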
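
To see which characters the pattern from the question actually consumes, one option is to print its matches. This is a minimal sketch on a shortened, made-up input string.

```php
<?php
// Hypothetical shortened input.
$sample = 'osidnviosd & sopinsdo';

// The asker's pattern: "&" followed by ONE character that is neither \w nor ";".
preg_match_all('/&[^\w;]/', $sample, $matches);
var_dump($matches[0]); // one match, two characters long: "& "

// Both matched characters are replaced, so the space disappears.
$result = preg_replace('/&[^\w;]/', '&amp;', $sample);
echo $result, "\n"; // osidnviosd &amp;sopinsdo
```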

+2

This regex does what you are looking for.

preg_replace('/&(?!\w+;)/', '&amp;', $text);

So, for a few simple test cases, you can get properly escaped HTML:

'& sopinsdo'          -> '&amp; sopinsdo'
'&amp; sopinsdo'      -> '&amp; sopinsdo'
'sopinsdo & foo; bar' -> 'sopinsdo &amp; foo; bar'
'sopinsdo &foo bar'   -> 'sopinsdo &amp;foo bar'
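
The cases above can be checked with a small script. The function name encode_bare_ampersands is a hypothetical helper introduced only for this sketch.

```php
<?php
// Hypothetical helper name, used only for this example.
function encode_bare_ampersands(string $text): string
{
    // Replace "&" unless it starts something shaped like a named entity (\w+;).
    return preg_replace('/&(?!\w+;)/', '&amp;', $text);
}

$cases = [
    '& sopinsdo',
    '&amp; sopinsdo',
    'sopinsdo & foo; bar',
    'sopinsdo &foo bar',
];

foreach ($cases as $case) {
    echo $case, ' -> ', encode_bare_ampersands($case), "\n";
}
```

Note that # is not matched by \w, so a numeric entity such as &#8756; would still be re-encoded by this pattern.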
+1

If you do not want the space between & and sopinsdo removed, just add one to the replacement string:

echo preg_replace($regex, "&amp; ", $str);
0

Source: https://habr.com/ru/post/1759525/