Google Analytics Regular Expression - Alternative to Negative Lookahead

In its filters, Google Analytics no longer allows negative lookahead. This makes it very difficult to create a custom report that includes only the links I want.

Here is a regular expression that uses negative lookahead and would work if lookahead were allowed:

test.com(\/\??index\_(.*)\.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.) 

It matches:

 test.com
 test.com/
 test.com/index_fb2.php
 test.com/index_fb2.php?ref=23
 test.com/index_fb2.php?ref=23&e=35
 test.com/?ref=23
 test.com/?ref=23&e=35

and does not match (as intended):

 test.com/ambassadors
 test.com/admin/?signup=true
 test.com/randomtext/

I would like to learn how to adapt my regex so that it still produces the same matches, but without using negative lookahead.

Thanks!

2 answers

Google Analytics doesn't seem to support single-line and multi-line modes, which makes sense to me. URLs cannot contain newlines, so it doesn't matter whether the dot matches them, and you never need ^ and $ to match anywhere except at the beginning and end of the entire string.

This means that (?!.) in your regular expression is exactly equivalent to $ , which matches only at the very end of the string (like \z , in flavors that support it). Since this is the only lookahead in your regular expression, you should never have had this problem; you should have been using $ all along.
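The equivalence can be checked on the sample URLs from the question. The sketch below uses Python's re module as a stand-in, which is an assumption: Google Analytics has its own regex engine, so this only illustrates the behaviour on newline-free input. The names BODY, with_lookahead and with_dollar are made up for the example.

```python
import re

# Body of the question's regex, without its final assertion.
BODY = r"test.com(\/\??index\_(.*)\.php\??(.*)|\/\?(.*)|\/|)+(\s)*"

with_lookahead = re.compile(BODY + r"(?!.)")  # original ending
with_dollar = re.compile(BODY + r"$")         # proposed replacement

urls = [
    "test.com",
    "test.com/",
    "test.com/index_fb2.php",
    "test.com/index_fb2.php?ref=23&e=35",
    "test.com/?ref=23",
    "test.com/ambassadors",
    "test.com/admin/?signup=true",
    "test.com/randomtext/",
]

# For each URL, record whether each variant finds a match.
results = [
    (with_lookahead.search(u) is not None, with_dollar.search(u) is not None)
    for u in urls
]

# True when both endings agree on every URL.
same = all(a == b for a, b in results)
print(same)
```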

However, your regex has other problems, mainly due to over-reliance on (.*) . For example, it matches these strings:

 test.com/?^#(%)!*%supercalifragilisticexpialidocious
 test.com/index_ecky-ecky-ecky-ecky-PTANG!-vroop-boing_rowr.php (ni! shh!)

...which I am sure you do not want. :P
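To see the problem concretely, the question's regex can be run against those two strings. This is a sketch in Python's re module (an assumption, since Google Analytics uses its own engine); the variable names are illustrative only.

```python
import re

# The regex from the question, unchanged.
original = re.compile(
    r"test.com(\/\??index\_(.*)\.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.)"
)

junk = [
    "test.com/?^#(%)!*%supercalifragilisticexpialidocious",
    "test.com/index_ecky-ecky-ecky-ecky-PTANG!-vroop-boing_rowr.php (ni! shh!)",
]

# Each (.*) happily swallows whatever follows, so both strings match.
junk_matched = [original.search(s) is not None for s in junk]
print(junk_matched)
```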

Try this regex:

 test\.com(?:/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?)?\s*$ 

or, spaced out for readability:

 test\.com (?: / (?:index_\w+\.php)? (?: \?ref=\d+ (?: &e=\d+ )? )? )? \s*$ 
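The spaced-out form only works in an engine that ignores whitespace inside the pattern (free-spacing mode). Whether Google Analytics offers such a mode is not something this thread establishes, so the sketch below uses Python's re.VERBOSE purely as an illustration that the two spellings behave the same. The names compact, spaced and urls are made up for the example.

```python
import re

compact = re.compile(
    r"test\.com(?:/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?)?\s*$"
)

# Same pattern in free-spacing mode: whitespace and # comments are ignored.
spaced = re.compile(
    r"""
    test\.com
    (?: /
        (?: index_\w+\.php )?            # optional index_*.php file name
        (?: \?ref=\d+ (?: &e=\d+ )? )?   # optional ref (and e) parameters
    )?
    \s*$
    """,
    re.VERBOSE,
)

urls = [
    "test.com",
    "test.com/",
    "test.com/index_fb2.php",
    "test.com/index_fb2.php?ref=23",
    "test.com/index_fb2.php?ref=23&e=35",
    "test.com/?ref=23",
    "test.com/?ref=23&e=35",
    "test.com/ambassadors",
    "test.com/admin/?signup=true",
    "test.com/randomtext/",
]

# True when both spellings agree on every URL.
agree = all(
    (compact.search(u) is None) == (spaced.search(u) is None) for u in urls
)
matched = [u for u in urls if spaced.search(u)]
print(agree, len(matched))
```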

For purposes of illustration, I have made many simplifying assumptions about (for example) which parameters may be present, what order they appear in, and what their values may be. I also wonder whether you really need to match the domain ( test.com ). I have no experience with Google Analytics, but shouldn't the match start (and be anchored) immediately after the domain? And do you really need to allow for whitespace at the end? It seems to me that the regex should be more like this:

 ^/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?$ 
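A minimal sketch of how this path-only pattern behaves, assuming the filter is applied to the request path alone (for example "/index_fb2.php?ref=23"), which the answer itself is unsure about. Python's re module stands in for the Google Analytics engine, and path_matches is a hypothetical helper.

```python
import re

path_only = re.compile(r"^/(?:index_\w+\.php)?(?:\?ref=\d+(?:&e=\d+)?)?$")

def path_matches(path: str) -> bool:
    """Return True if the whole path fits the pattern."""
    return path_only.search(path) is not None

wanted = [
    "/",
    "/index_fb2.php",
    "/index_fb2.php?ref=23",
    "/index_fb2.php?ref=23&e=35",
    "/?ref=23",
    "/?ref=23&e=35",
]
unwanted = ["/ambassadors", "/admin/?signup=true", "/randomtext/"]

print(all(path_matches(p) for p in wanted))
print(any(path_matches(p) for p in unwanted))
```

Because of the simplifying assumptions, a path such as "/?e=35" (an e parameter without ref) is rejected by this version.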

Firstly, I think your regex needs some correction. Let's look at what you have:

 test.com(\/\??index_.*.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.) 

The case where the optional ? appears before index... is already covered by the second alternative, so that ? can be dropped:

 test.com(\/index_.*.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.) 

Now, you probably want the first (.*) to be allowed only if there is a literal ? before it. Otherwise, you will match test.com/index_fb2.phpanystringhereandyouprobablydon'twantthat . So move the corresponding optional marker:

 test.com(\/index_.*.php(\?(.*))?|\/\?(.*)|\/|)+(\s)*(?!.) 

Now, .* consumes any character, and as many as possible. In addition, the . before php consumes any character. This means that you allow both test.com/index_fb2php and test.com/index_fb2.html?someparam=php . Let's make it a literal . and only allow characters other than ? in the file name:

 test.com(\/index_[^?]*\.php(\?(.*))?|\/\?(.*)|\/|)+(\s)*(?!.) 
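The effect of the last two tightening steps can be checked directly. The sketch below uses Python's re module as a stand-in for the Google Analytics engine (an assumption); step_a, step_b, step_c and hits are names invented for the example, and the first test URL is a shortened version of the one above.

```python
import re

# After dropping the optional ? in front of index_.
step_a = r"test.com(\/index_.*.php\??(.*)|\/\?(.*)|\/|)+(\s)*(?!.)"
# Trailing text is now allowed only after a literal ?.
step_b = r"test.com(\/index_.*.php(\?(.*))?|\/\?(.*)|\/|)+(\s)*(?!.)"
# Literal dot before php, and no ? inside the file name.
step_c = r"test.com(\/index_[^?]*\.php(\?(.*))?|\/\?(.*)|\/|)+(\s)*(?!.)"

def hits(pattern: str, url: str) -> bool:
    return re.search(pattern, url) is not None

checks = {
    "a_trailing_text": hits(step_a, "test.com/index_fb2.phpanystring"),
    "b_trailing_text": hits(step_b, "test.com/index_fb2.phpanystring"),
    "b_missing_dot": hits(step_b, "test.com/index_fb2php"),
    "c_missing_dot": hits(step_c, "test.com/index_fb2php"),
    "b_php_in_query": hits(step_b, "test.com/index_fb2.html?someparam=php"),
    "c_php_in_query": hits(step_c, "test.com/index_fb2.html?someparam=php"),
    "c_good_url": hits(step_c, "test.com/index_fb2.php?ref=23"),
}
for name, value in checks.items():
    print(name, value)
```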

Now the first, second and third alternatives can be collapsed into one if we also make the file name optional:

 test.com(\/(index_[^?]*\.php)?(\?(.*))?|)+(\s)*(?!.) 

Finally, the + can be removed, since the (.*) inside can already take care of all possible repetitions. Also, (something|) is the same as (something)? :

 test.com(\/(index_[^?]*\.php)?(\?(.*))?)?(\s)*(?!.) 

Judging by your input examples, this seems closer to what you actually want to match.
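As a sanity check, the simplified pattern can be run against the sample URLs from the question. This is a sketch using Python's re module (an assumption, not the Google Analytics engine); simplified, should_match and should_not_match are illustrative names.

```python
import re

simplified = re.compile(
    r"test.com(\/(index_[^?]*\.php)?(\?(.*))?)?(\s)*(?!.)"
)

should_match = [
    "test.com",
    "test.com/",
    "test.com/index_fb2.php",
    "test.com/index_fb2.php?ref=23",
    "test.com/index_fb2.php?ref=23&e=35",
    "test.com/?ref=23",
    "test.com/?ref=23&e=35",
]
should_not_match = [
    "test.com/ambassadors",
    "test.com/admin/?signup=true",
    "test.com/randomtext/",
]

kept = all(simplified.search(u) for u in should_match)
rejected = not any(simplified.search(u) for u in should_not_match)
print(kept, rejected)
```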

Now, to answer your question: what (?!.) does depends on whether you use singleline mode or not. If you do, it asserts that you have reached the end of the string. In this case, you can simply replace it with \Z , which always matches the end of the string. If you do not, it asserts that you have reached the end of a line. In this case, you can use $ , but you also need to use multiline mode, so that $ matches at line endings too.

So, if you use singleline mode (which probably means that you have only one URL per string), use this:

 test.com(\/(index_[^?]*\.php)?(\?(.*))?)?(\s)*\Z 

If you are not using singleline mode (which probably means you can have multiple URLs, each on its own line), you should also use multiline mode and this kind of anchor instead:

 test.com(\/(index_[^?]*\.php)?(\?(.*))?)?(\s)*$ 
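The mode dependence is easiest to see on input that contains a newline. The sketch below uses Python's re module, where singleline mode is called re.DOTALL; this is an illustration only, since other flavors treat \Z and $ slightly differently around a trailing newline, and the thread does not establish which modes Google Analytics exposes. The pattern is shortened to test\.com/ to keep the example small.

```python
import re

text = "test.com/\ntest.com/ambassadors"  # two URLs, one per line

# Default mode: "." does not match a newline, so (?!.) succeeds at the
# end of every line.
default_hits = re.findall(r"test\.com/(?!.)", text)

# Singleline mode: "." also matches newlines, so (?!.) succeeds only at
# the very end of the string.
dotall_hits = re.findall(r"test\.com/(?!.)", text, re.DOTALL)

# Lookahead-free replacements for each case.
multiline_dollar = re.findall(r"test\.com/$", text, re.MULTILINE)
end_of_string = re.findall(r"test\.com/\Z", text)

print(default_hits == multiline_dollar)
print(dotall_hits == end_of_string)
```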

Source: https://habr.com/ru/post/1445768/

