What does the regular expression $content =~ s/\n-- \n.*?$//s actually replace?

Question

I am working through some Perl code in Request Tracker (RT) 4.0 and have run into a problem where request messages are being cut off. I'm new to Perl. I have done some work with regular expressions, but I'm having trouble with this one, even after quite a bit of reading.

I narrowed my problem down to this line of code:

$content =~ s/\n-- \n.*?$//s

I don't quite understand what it is doing and would like a more detailed explanation.

I understand that s/// matches the pattern \n-- \n.*?$ and replaces it with nothing.

What I don't understand is what .*?$ does. Here is my basic understanding:

. - any character except \n
* - 0 or more of the previous character
? - 0 or 1 of the previous character
$ - the end of the line

Then, as I understand it, the trailing s flag makes . match newlines as well.
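To see what the trailing s flag changes in this particular substitution, here is a small sketch that runs it with and without /s. The sample string is invented for illustration.

```perl
use strict;
use warnings;

my $content = "Body text\n-- \nSome Customer\nsecond line\n";

# Without /s, "." does not match "\n", so .*? can only reach the end of the
# "Some Customer" line. $ (without /m) cannot match there because that "\n"
# is not the last one in the string, so the whole match fails.
(my $without_s = $content) =~ s/\n-- \n.*?$//;

# With /s, "." also matches "\n", so .*? can extend across lines until $
# matches just before the final newline of the string.
(my $with_s = $content) =~ s/\n-- \n.*?$//s;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```

Without /s the string comes back unchanged; with /s everything from the delimiter up to (but not including) the final newline is removed.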

So, roughly speaking, we replace any text starting with \n-- \n. This line of code causes some behavior I find dubious, and I would like someone to explain what is happening here.

Can someone explain what this line is doing? Is it just deleting all the text after the first \n-- \n, or is something else going on?

Longer part / the real problem (you don't need to read this to answer the question)

My specific problem is that it cuts off quoted content that comes after the signature.

So if email A from the customer says:

 What's happening with order ABCD?
 -- 
 Some Customer

The staff response says (note that the customer's signature is dropped):

 Shipping today
 > What's happening with order ABCD?

The customer responds:

 I didn't get it, it was never sent!!!
 -- 
 Some Customer
 > Shipping today
 >> What's happening with order ABCD?

When we reply, their message is clipped at the --, which kills the entire context:

 It shipped today, tracking number 12345
 > I didn't get it, it was never sent!!!

This leads to extra work, such as explaining which order we are talking about.
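A minimal sketch that reproduces the problem, assuming the customer's reply arrives as plain text with > quote markers. The message text is modeled on the example above.

```perl
use strict;
use warnings;

# Customer reply: the email client put the quoted thread AFTER the "-- " line.
my $content = join "\n",
    "I didn't get it, it was never sent!!!",
    "-- ",
    "Some Customer",
    "> Shipping today",
    ">> What's happening with order ABCD?",
    "";    # trailing empty element gives the message a final newline

# The line quoted in the question: delete from the first "\n-- \n"
# up to the end of the message (keeping only a final newline).
(my $stripped = $content) =~ s/\n-- \n.*?$//s;

print $stripped;    # only the first line survives; the quoted context is gone
```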

regex perl rt

candyman Aug 7 '13 at 19:56

3 answers

When ? follows a quantifier (?, *, + or {m,n}), it changes the greediness of that quantifier [1]. Normally these quantifiers match as many characters as possible, but with ? they match as few as possible.

 use feature 'say';
 say "Greedy: ",     "abc1234" =~ /\w(.*)\d/;
 say "Non-greedy: ", "abc1234" =~ /\w(.*?)\d/;

Output:

 Greedy: bc123
 Non-greedy: bc
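The same ? suffix works on the other quantifiers listed above. A small sketch with an invented input string, capturing what each greedy/non-greedy pair matches at the start of "aaaa":

```perl
use strict;
use warnings;

my $s = "aaaa";

# Each pair: the greedy quantifier first, then its non-greedy form.
my ($plus)       = $s =~ /(a+)/;        # "aaaa": as many as possible
my ($plus_lazy)  = $s =~ /(a+?)/;       # "a":    at least one, as few as possible
my ($opt)        = $s =~ /(a?)/;        # "a":    prefers one
my ($opt_lazy)   = $s =~ /(a??)/;       # "":     prefers zero
my ($range)      = $s =~ /(a{2,3})/;    # "aaa":  up to the maximum of 3
my ($range_lazy) = $s =~ /(a{2,3}?)/;   # "aa":   stops at the minimum of 2

print join(", ", map { "'$_'" } $plus, $plus_lazy, $opt, $opt_lazy, $range, $range_lazy), "\n";
```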

Since $ can match in two places (before a trailing newline, or at the very end of the string), this has the following effect:

 use 5.014;  # for say and the /r flag
 $_ = "abc\n-- \ndef\n";
 say "Greedy: <<" . s/\n-- \n.*$//sr . ">>";
 say "Non-greedy: <<" . s/\n-- \n.*?$//sr . ">>";

Output:

 Greedy: <<abc>>
 Non-greedy: <<abc
 >>

This ensures that the newline ending the last line is not deleted. The following are simpler equivalents:

 s/\n-- \n.*/\n/s
 s/(?<=\n)-- \n.*//s    # Slow
 s/\n\K-- \n.*//s       # Requires 5.10
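To compare the original substitution with the simpler forms above, here is a short sketch that applies all four to the same invented string. \K needs Perl 5.10 or later, and the /r flag used here (return the modified copy instead of editing in place) needs 5.14.

```perl
use strict;
use warnings;
use 5.014;    # for the /r flag

my $text = "abc\n-- \ndef\n";

my %result = (
    original   => $text =~ s/\n-- \n.*?$//sr,
    newline    => $text =~ s/\n-- \n.*/\n/sr,
    lookbehind => $text =~ s/(?<=\n)-- \n.*//sr,
    keep       => $text =~ s/\n\K-- \n.*//sr,
);

# For this input, all four leave "abc\n": the text before the delimiter
# plus the newline that ended that line.
for my $name (sort keys %result) {
    printf "%-10s <<%s>>\n", $name, $result{$name};
}
```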

Note that it deletes starting from the first -- line:

 $ perl -E'say "abc\n-- \ndef\n-- \nghi\n" =~ s/\n-- \n.*?$//sr'
 abc

If you want the removal to start from the last one instead, you need to replace .* with something guaranteed not to match -- :

 $ perl -E'say "abc\n-- \ndef\n-- \nghi\n" =~ s/\n-- \n(?:(?!-- \n).)*?$//sr'
 abc
 -- 
 def
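If the goal is only to cut at the last delimiter, a regex is not strictly required. A sketch using Perl's built-in rindex and substr instead (a swapped-in technique, not part of the answer above; the sub name strip_from_last_sig is hypothetical). For messages that end in a newline it gives the same result as the negative-lookahead substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything from the LAST "\n-- \n" onward,
# keeping the newline that ends the line before the delimiter.
sub strip_from_last_sig {
    my ($content) = @_;
    my $pos = rindex $content, "\n-- \n";
    return $content if $pos < 0;            # no delimiter: leave untouched
    return substr($content, 0, $pos + 1);   # keep up to and including that "\n"
}

print strip_from_last_sig("abc\n-- \ndef\n-- \nghi\n");
```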

Notes:

[1] It also has the same meaning if it follows another quantifier modifier (for example, /.*+?/).

ikegami Aug 7 '13 at 20:54

There is a good CPAN module that will help you understand regular expressions in the future: YAPE::Regex::Explain

You can find an online version of it here: http://rick.measham.id.au/paste/explain.pl

Running your regular expression through the website returns the following:

 NODE                     EXPLANATION
 --------------------------------------------------------------------------------
   \n                       '\n' (newline)
 --------------------------------------------------------------------------------
   --                       '-- '
 --------------------------------------------------------------------------------
   \n                       '\n' (newline)
 --------------------------------------------------------------------------------
   .*?                      any character except \n (0 or more times
                            (matching the least amount possible))
 --------------------------------------------------------------------------------
   $                        before an optional \n, and the end of the
                            string

According to the docs, “There is no support for regular expression syntax added after Perl version 5.6, especially any constructs added in 5.10,” but in practice you can still use it to help understand most of the regular expressions you come across.

dms Aug 7 '13 at 21:25

Moritz Bunkus · Accepted Answer · 2013-08-07T20:01:15+0000

You are almost right: it removes everything from the last occurrence of "\n-- \n" to the end. The reason it does not remove everything from the first occurrence is the non-greedy operator ?: it tells the regex engine to match the shortest possible form of the preceding pattern (.*).

What this means: in an email message, the signature is usually separated from the message body by this exact pattern: a line consisting of exactly two dashes followed by a single trailing space. So what the regular expression does is remove everything from the signature delimiter to the end.
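The signature convention described above can be checked line by line. A minimal sketch (the sample message is invented) that splits a message at the first line consisting exactly of two dashes and a space, using the /m flag so that ^ matches at the start of any line:

```perl
use strict;
use warnings;

my $message = "Thanks for the update.\n-- \nSome Customer\n";

# /m lets ^ match after any newline, so only a line that is exactly "-- "
# counts. A limit of 2 keeps any later "-- " lines inside the signature part.
my ($body, $signature) = split /^-- \n/m, $message, 2;

print "Body:\n$body";
print "Signature:\n", ($signature // "(none)\n");
```

A line such as "--" without the trailing space is not treated as a delimiter by this pattern.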

Now, what your customer (or their email client) does is add the quoted reply after the signature separator. This is very unusual: the quoted reply should come before the signature separator. I don't know of any email client that does this on purpose, but, alas, there are many programs that simply get email wrong (between encoding problems, quoting, and SMTP quirks, there is an incredible number of ways to make mistakes), so I'm not surprised to learn that such clients really exist.

Another possibility is that the customer does this deliberately, for example by signing their name after the --. However, I suspect this is not done by hand, because people rarely type a trailing space after the two dashes before the line break.