What does this regular expression mean?

/\ATo\:\s+(.*)/ 

Also, how do you work this out for yourself? What approach do you take?

+4
6 answers

Start at the left and look for any escaped characters (e.g. \A ). Apart from metacharacters such as + , . , * and ( ) , the rest are ordinary characters. \A means start of input, so To: must match at the very beginning of the input. I think the : is escaped for no reason. \s is a character class for all whitespace (tabs, spaces, newlines), and the + that follows it means there must be one or more whitespace characters. After that, the rest of the line is captured in a group (delimited by ( ) ).

If the input were

 To: progo@home 

the capture group would contain "progo@home".
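This is easy to check by running the pattern. The sketch below assumes Python's re module, which may not be the flavor the question uses, but \A , \s and capture groups behave the same way for this input:

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r"\ATo\:\s+(.*)")

match = pattern.search("To: progo@home")
captured = match.group(1) if match else None
print(repr(captured))  # 'progo@home'
```

The whitespace after the colon is consumed by \s+ , so it does not end up in the group.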

0

In multi-line regular expressions, \A matches the start of the string (and \Z the end of the string, while ^ / $ match the start / end of the string or the start / end of a line). In single-line variants, you just use ^ and $ for the start and end of the string / line, since there is no distinction.

To is a literal, and \: is an escaped : .

\s means whitespace, and + means one or more of the preceding "characters" (whitespace in this case).

() is a capture group, that is, everything matched inside it is stored in a "register" that you can use later. So it is the meat to be extracted.

.* just means any non-newline character . , zero or more times * .
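The difference between \A and ^ only shows up in multi-line mode. A minimal sketch, assuming Python's re module and a made-up two-line input where To: is not on the first line:

```python
import re

# Hypothetical input: the To: header is on the second line.
text = "Subject: hi\nTo: paxdiablo"

# With MULTILINE, ^ matches at the start of every line.
caret = re.search(r"^To:\s+(.*)", text, re.MULTILINE)

# \A matches only at the very start of the string, whatever the flags.
anchor = re.search(r"\ATo:\s+(.*)", text, re.MULTILINE)

print(caret.group(1) if caret else None)  # paxdiablo
print(anchor)  # None
```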

So what this regular expression will do is process a string like:

 To: paxdiablo
 Re: you are so cool!

and return the text paxdiablo .
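Assuming the To: and Re: parts sit on separate lines, the capture stops at the line break because . does not match a newline by default. A small sketch in Python (an assumed flavor; the input string is illustrative):

```python
import re

# Hypothetical two-line input: each header on its own line.
text = "To: paxdiablo\nRe: you are so cool!"

match = re.match(r"\ATo\:\s+(.*)", text)
name = match.group(1) if match else None
print(name)  # paxdiablo
```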

As for learning how to work this out yourself, the Perl regex tutorial (a) is a good start, and then practice, practice, practice :-)


(a) In fact, you did not say which flavor of regular expressions you are using, but most modern ones are very similar to Perl's. If you can find a tutorial for your particular flavor, that will obviously be better.

+4

\A is a zero-width assertion and means "Match only at the beginning of the string."

The regex reads: on a line beginning with "To:" followed by one or more whitespace characters ( \s ), capture the rest of the line ( (.*) ).

+2

First, you need to know what the various character classes and quantifiers are. The backslash sequences in your regex are of the first kind: \s is a character class, and \A is strictly an anchor rather than a class. Quantifiers are things like + . There are several references on the Internet, for example this one.

Using this, we can see what happens going from left to right:

  • \A matches the beginning of the string.
  • To matches the literal text "To"
  • \: escapes ":", so it loses any special meaning and becomes "just a colon"
  • \s matches whitespace (space, tab, etc.)
  • + means match the previous class one or more times, so \s+ means one or more whitespace characters
  • () is a capture group; everything matched within the parens is saved for later use.
  • . means "any character"
  • * is like + , but zero or more times, so .* means any number of any characters

Taken together, the regex matches a string starting with "To:", then at least one whitespace character, then everything after that, which it saves. So with the string "To: JaneKealum" you can extract "JaneKealum".
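The same left-to-right reading can be written into the pattern itself. This is only a sketch, assuming Python's re module and its re.VERBOSE flag, which ignores whitespace and # comments inside the pattern:

```python
import re

pattern = re.compile(
    r"""
    \A      # start of the string
    To      # the literal text To
    \:      # a literal colon
    \s+     # one or more whitespace characters
    (.*)    # capture group: any characters, zero or more
    """,
    re.VERBOSE,
)

match = pattern.match("To: JaneKealum")
extracted = match.group(1) if match else None
print(extracted)  # JaneKealum
```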

+1

It matches To: at the beginning of the input, followed by at least one whitespace character, followed by any number of characters captured in a group.

0

The leading and trailing / characters delimit the regular expression.

A \ inside an expression means treat the next character specially, or treat it as a literal if it normally has a special meaning.

\A means match only at the beginning of the string.

To means match the literal "To".

\: means match the literal ':'. A colon is normally a literal and has no special meaning, so the escape changes nothing.

\s means match a whitespace character.

+ means match as many as possible, but at least one, of whatever it follows, so \s+ means match one or more whitespace characters.

( and ) define a group of characters that will be captured and returned by the expression evaluator.

And finally . matches any character, and * means match as many as possible, possibly zero. Therefore (.*) will capture all characters to the end of the input line.

So the pattern will match a string that starts with "To:" and capture all characters from the first subsequent non-whitespace character onward.

The only way to understand these things is to go through them one bit at a time and check the meaning of each component.
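To confirm that the backslash before the colon makes no difference, the two spellings can be compared directly. A sketch assuming Python's re module, where a backslash before punctuation simply yields the literal character; the input string is made up:

```python
import re

subject = "To: someone"

escaped = re.match(r"\ATo\:\s+(.*)", subject)  # colon escaped
plain = re.match(r"\ATo:\s+(.*)", subject)     # colon left as-is

same = (
    escaped is not None
    and plain is not None
    and escaped.group(1) == plain.group(1)
)
print(same)  # True
```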
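The effect of \s+ on leading whitespace is easy to see with a padded input. A sketch assuming Python's re module; the sample strings are hypothetical:

```python
import re

pattern = re.compile(r"\ATo\:\s+(.*)")

# \s+ swallows every whitespace character after the colon,
# so the group starts at the first non-whitespace character.
padded = pattern.match("To:   \t  Jane Doe  ")
rest = padded.group(1) if padded else None
print(repr(rest))  # 'Jane Doe  '

# With no whitespace after the colon, \s+ cannot match at all.
no_space = pattern.match("To:Jane")
print(no_space)  # None
```

Trailing whitespace stays in the group, since only the whitespace before the captured text is consumed by \s+ .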

0

Source: https://habr.com/ru/post/1368917/

