Preg_replace all links in file_get_contents not containing words

Question

I am reading a page into a variable, and I would like to disable all links that do not contain the word "remedy" in the address. The code I have still captures all the links, including those that do contain "remedy". What am I doing wrong?

$page = preg_replace('~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i', '<font color="#808080">$1</font>', $page);

- Solution -

 $page = preg_replace('~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i', '<font color="#808080">$2</font>', $page);

php regex preg-replace negative-lookahead

user2001487 May 12, '13 at 21:25

2 answers

I would probably use this:

 <a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>

The most interesting part:

 "(?:(?!remedy)[^"])*"

Each time [^"] is about to consume another character, the lookahead first confirms that it is not the first character of the word remedy. Using [^"] instead of . prevents the match from looking at anything beyond the closing quote. I also took the liberty of replacing your other .*? with negated character classes. They serve the same purpose, keeping the match "corralled" in the area where you want it. It is also more efficient and more robust.

Of course, I am assuming that the content of the <a> element is plain text with no nested elements. In fact, that is just one of many simplifying assumptions I have made. You cannot match HTML with regular expressions without them.
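For illustration, here is a minimal sketch of the pattern above plugged into preg_replace. The sample $page string, its URLs and its link text are made up for this example and are not from the question. The replacement uses $1 because this pattern has only one capturing group.

```php
<?php
// Hypothetical sample input; the URLs and link text are invented for illustration.
$page = '<a href="/remedy/list.html">Remedies</a> '
      . '<a href="/about.html" class="nav">About</a> '
      . '<a href="remedy.html">Start</a>';

// Pattern from the answer: the lookahead is checked before every
// character of the href value is consumed, and [^"] cannot run past
// the closing quote.
$pattern = '~<a href="(?:(?!remedy)[^"])*"[^>]*>([^<]*)</a>~i';

// Only one capturing group here, so the link text is $1.
$result = preg_replace($pattern, '<font color="#808080">$1</font>', $page);

echo $result, PHP_EOL;
```

With this input, only the /about.html link is turned into grey text. Both links whose address contains remedy are left untouched, including the one whose address starts with it.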

Alan Moore May 12, '13 at 23:27

Matmarbon · Accepted Answer · 2013-05-12T21:50:56+0000

Try ~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i

As for what you are doing wrong: a regex always matches if it possibly can, and '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i' can match every URL, even one containing remedy. You did not specify that remedy must not appear anywhere in the attribute. You only specified that there is some position, after any (possibly empty) run of characters (.*?), that is not followed by remedy, and such a position exists in every URL. Even when the value starts with remedy (<a href="remedy"), the first .*? can consume the r so that the lookahead is tested one character later. I hope that makes sense.
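The difference between the two patterns can be checked with a small sketch. The three sample links and the array keys are hypothetical, chosen only to cover an address with remedy in the middle, one that starts with remedy, and one without it.

```php
<?php
// Hypothetical sample links, invented for illustration.
$links = [
    'inside' => '<a href="/remedy/list.html">Remedies</a>', // remedy in the middle
    'start'  => '<a href="remedy.html">Start</a>',          // remedy at the very start
    'other'  => '<a href="/about.html">About</a>',          // no remedy at all
];

$patterns = [
    // Pattern from the question: a single lookahead at one freely chosen position.
    'question' => '~<a href=".*?(?!remedy).*?".*?>(.*?)</a>~i',
    // Pattern from this answer: the lookahead runs AFTER each consumed character.
    'accepted' => '~<a href="(.(?!remedy))*?".*?>(.*?)</a>~i',
];

// Record, for every pattern and link, whether the link would be replaced.
$matches = [];
foreach ($patterns as $name => $pattern) {
    foreach ($links as $kind => $link) {
        $matches[$name][$kind] = preg_match($pattern, $link) === 1;
    }
}

var_export($matches);
```

In this sketch the question's pattern matches all three links. The pattern above skips the link with remedy in the middle, but it still matches the one whose address starts with remedy, because the lookahead is only tested after a character has been consumed. If such addresses can occur, moving the lookahead in front of the dot, as in ((?!remedy).)*?, is one way to cover that case.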
