preg_match_all() behaves differently on different servers

The code below works fine under XAMPP on my PC, but not on my recently purchased VPS, where it breaks.

preg_match_all( "/$regex/siU" , $string , $matches , PREG_SET_ORDER ); 

It is supposed to simply extract links and headings from HTML.

Earlier today I ran into a similar problem with another regular expression. The code worked fine on the local server, but produced a "Connection Was Reset" error on the VPS. The problem was caused by commented-out HTML (with PHP code inside it), which was being removed with the code below to optimize the output. The connection reset is now resolved, but the HTML still has comments in the browser's page source.

 $string = preg_replace( '/<!--(.|\s)*?-->/' , '' , $string ); 

So the problem is clear: these regular expressions are not working properly on the VPS. But I do not know the solution.

Can anyone help me resolve this issue?

Solved:

Thanks to https://stackoverflow.com/a/4148773

+4
5 answers

PCRE is known to sometimes have problems with text longer than about 200 lines. Developers of Drupal and GeSHi have been affected by this issue in the past.

If you can split the text into small chunks (for example, 100 lines each) and run the regular expression on each chunk, that may help.
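The chunking idea could be sketched roughly as follows. The helper name preg_replace_chunked and the sample text are hypothetical, not part of the original question. Note the limitation: a match that spans a chunk boundary (for example, a comment that starts on line 100 and ends on line 101) will be missed, so this is a workaround rather than a real fix.

```php
<?php
// Hypothetical helper: run preg_replace() on $text in chunks of $linesPerChunk lines.
function preg_replace_chunked(string $pattern, string $replacement, string $text, int $linesPerChunk = 100): string
{
    $lines = explode("\n", $text);
    $out = [];

    foreach (array_chunk($lines, $linesPerChunk) as $chunk) {
        $piece = implode("\n", $chunk);
        $result = preg_replace($pattern, $replacement, $piece);

        // preg_replace() returns null when PCRE fails (e.g. a limit was hit).
        if ($result === null) {
            throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
        }
        $out[] = $result;
    }

    // Re-join with the same separator used for splitting.
    return implode("\n", $out);
}

// Made-up sample input, small enough to fit in a single chunk.
$text  = "a <!-- x -->\nb\n<!-- y\nz -->\nc";
$clean = preg_replace_chunked('/<!--.*?-->/s', '', $text, 100);
```

Throwing on a null result makes a PCRE failure visible instead of silently losing the text.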

+2

Let me stop you there for a second. Parsing HTML with regular expressions is a bad idea, unless it is a very isolated problem in a malformed document. You will want to use a proper parser; for instance, here is an example that strips HTML comments:

    $html = <<<EOM
    <html>
    <body>
    <div id="test">
    <!-- comment here -->
    </div>
    </body>
    </html>
    EOM;

    $d = new DOMDocument;
    $d->loadHTML($html);

    $x = new DOMXPath($d);

    foreach ($x->query('//comment()') as $node) {
        $node->parentNode->removeChild($node);
    }

    echo $d->saveHTML();

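The same parser-based approach can be applied to the original goal of collecting links and headings. This is only a sketch: the sample HTML and the shape of the $links and $headings arrays are assumptions, since the question does not show its input or its $regex.

```php
<?php
// Hypothetical sample input; the original question does not show its HTML.
$html = '<html><body><h1>Title</h1><p><a href="/a">First</a></p>'
      . '<h2>Sub</h2><a href="/b">Second</a></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every <a> element that has an href attribute.
$links = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = ['href' => $a->getAttribute('href'), 'text' => trim($a->textContent)];
}

// Headings h1 to h6, returned in document order.
$headings = [];
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $h) {
    $headings[] = ['tag' => $h->nodeName, 'text' => trim($h->textContent)];
}
```

Because the parser does the tokenizing, this does not depend on PCRE limits that may differ between servers.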
+1

So, the root problem is that the code that should remove the HTML comments does not work? That is probably because the regular expression that should match the comments uses (.|\s)* to get around the fact that . does not match newlines. This is almost guaranteed to cause problems, as this answer explains.

The correct way to match against anything, including newlines, is to use the s modifier. For instance:

 '/<!--.*?-->/s' 

This turns on single-line mode (also known as DOTALL mode), which lets . match newlines. (The author of that other question had to use [\S\s] instead, because JavaScript had no equivalent of single-line / DOTALL mode at the time; the s flag was only added in ES2018.)
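Applied to the replacement from the question, that might look like the sketch below. It also checks the return value, because preg_replace() returns null when PCRE gives up, and preg_last_error() then reports why. One plausible reason for different behaviour between servers is a different pcre.backtrack_limit or pcre.recursion_limit in php.ini, but that is an assumption here, not something the question confirms. The sample $string is made up.

```php
<?php
// Made-up sample input with a comment that spans two lines.
$string = "<p>keep</p><!-- multi\nline comment --><p>also keep</p>";

// The s modifier lets . match newlines, so no (.|\s) alternation is needed.
$result = preg_replace('/<!--.*?-->/s', '', $string);

if ($result === null) {
    // preg_replace() returns null on failure; preg_last_error() gives the code,
    // e.g. PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR.
    error_log('PCRE error code: ' . preg_last_error());
    $result = $string; // fall back to the unmodified input
}
```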

+1

The problem seems to be that you misunderstand what HTML comments do. According to your comment below the question, the problem is that the HTML comments were not removed, which caused PHP to work with the wrong parameters.

However, HTML comments have no effect on which PHP code is or is not executed, only on what the browser displays (and runs, in the case of JavaScript). Your PHP code runs before the output is sent to the browser.

If you want to comment out PHP code, you need to wrap the block in /* */ or start each line with // .
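A small sketch of the difference, using a hypothetical side_effect() function and counter that are not from the question: the call inside the HTML comment still runs on the server, while the one inside the PHP comment does not.

```php
<?php
$calls = 0;

// Hypothetical function that records each time it is executed.
function side_effect(): string
{
    global $calls;
    $calls++;
    return 'ran';
}

ob_start(); // capture the output so it can be inspected afterwards
?>
<!-- <?php echo side_effect(); ?> -->
<?php /* echo side_effect(); */ ?>
<?php
$output = ob_get_clean();
// $calls is now 1: only the call inside the HTML comment was executed,
// and its result is part of the captured output.
```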

0

Try the following:

 $string = preg_replace( '/.*<!--(.|\s)*?-->.*/' , '' , $string ); 

Some regex implementations execute your regex as if it were anchored, like this: /^<!--(.|\s)*?-->$/ . So your expression may behave differently on different servers.

-1

Source: https://habr.com/ru/post/1244457/

