Finding a match between optional tokens?

For strings:

  • text::handle: e@ma.il ::text
  • text::chat_identifier:chat0123456789&text

I have a current regex:

m/(handle:|chat_identifier:)(.+?)(:{2}|&)/

I'm currently using $2 to get the desired value (e@ma.il in the first line and chat0123456789 in the second).

Is there a better / faster / easier way to solve this problem?

+4
4 answers

Whether this is "better" or not depends on the context, but you could take this approach: split the line on ":" and take the fourth element of the resulting list. This is probably more readable than a regular expression, and more robust if the third field can be something other than "handle" or "chat_identifier".
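A minimal sketch of that split approach, for illustration only. The helper name fourth_field is hypothetical, and the two cleanup steps are assumptions of mine: with the second sample string the fourth field still ends in "&text", and with the first it carries surrounding spaces.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on ":" and clean up the fourth field.
sub fourth_field {
    my ($line) = @_;
    my @fields = split /:/, $line;   # e.g. "text", "", "handle", " e@ma.il ", "", "text"
    my $value  = $fields[3];         # fourth element, zero-based index 3
    return unless defined $value;
    $value =~ s/&.*//s;              # assumption: drop a trailing "&text" terminator
    $value =~ s/^\s+|\s+$//g;        # assumption: surrounding spaces are not wanted
    return $value;
}

print fourth_field($_), "\n"
    for 'text::handle: e@ma.il ::text',
        'text::chat_identifier:chat0123456789&text';
```

For the two sample strings this prints e@ma.il and chat0123456789. Without the cleanup steps the second value would come back as chat0123456789&text, so a plain split on ":" alone is not quite enough for the "&" form.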

I expect the speed to be very similar for any of these approaches, and probably fast enough with almost any implementation in Perl. I would want evidence that speed is critical for this step before worrying about it...
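If speed does turn out to matter, one way to check is the core Benchmark module. This is only a sketch: the two candidates and the sample line are my own picks for illustration, and the numbers will depend on your data and Perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line taken from the question.
my $line = 'text::chat_identifier:chat0123456789&text';

# Illustrative candidates: a regex capture and a split on runs of ":" or "&".
my %candidates = (
    regex => sub { my ($v) = $line =~ /(?:handle|chat_identifier):([^:&]+)/; $v },
    split => sub { (split /[:&]+/, $line)[2] },
);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```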

+4

For a regex solution, this is a bit simpler and requires no backtracking:

 m/(handle|chat_identifier):([^:&]+)/ 

Note the small difference: yours allows single colons within the value, mine doesn't (it stops at the first colon). If that is not a problem, you can use my version. Or, as I mentioned in the comment, split on : and use the fourth element of the result.

The equivalent version, which stops only with double colons, is this:

 m/(handle|chat_identifier):((?:(?!::|&).)+)/ 

Not as pretty, but it still avoids backtracking (the lookahead might slow things down, though... you will need to profile this if speed is at all important).
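To make the difference between the two patterns concrete, here is a small sketch. The helper value_with and the input with a single colon inside the value are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# The two patterns from this answer.
my $simple   = qr/(handle|chat_identifier):([^:&]+)/;
my $tempered = qr/(handle|chat_identifier):((?:(?!::|&).)+)/;

# Hypothetical helper: return the second capture group, or undef on no match.
sub value_with {
    my ($re, $line) = @_;
    return $line =~ $re ? $2 : undef;
}

# Made-up input whose value contains a single colon.
my $line = 'text::handle:a:b::text';

print "simple:   ", value_with($simple,   $line), "\n";
print "tempered: ", value_with($tempered, $line), "\n";
```

The first pattern stops at the first colon and yields a, while the second only stops at :: or & and yields a:b. On the two sample strings from the question both patterns capture the same value.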

+2

It looks like you have good solutions already here. The splitting method seems the simplest. But depending on your requirements, you can also use a more general regular expression that breaks the string into its main parts. It will work for other data types and property names than in your examples.

  ([^:]+)::([^:]+):([^:&]+)(?:::|&)\1 

Capture groups are as follows:

  • Group 1: data type (the keyword "text" in your examples).
  • Group 2: property name (the keywords "handle" and "chat_identifier" in your examples).
  • Group 3: property value.
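A short sketch of how the three groups might be pulled out in Perl. The function name parse_line and the whitespace trimming are my own assumptions, not part of the pattern above.

```perl
use strict;
use warnings;

# The general pattern from this answer; \1 requires the data type to repeat at the end.
my $re = qr/([^:]+)::([^:]+):([^:&]+)(?:::|&)\1/;

# Hypothetical helper: returns (type, name, value), or an empty list on no match.
sub parse_line {
    my ($line) = @_;
    my ($type, $name, $value) = $line =~ $re or return;
    $value =~ s/^\s+|\s+$//g;    # assumption: surrounding spaces are not wanted
    return ($type, $name, $value);
}

for my $line ('text::handle: e@ma.il ::text',
              'text::chat_identifier:chat0123456789&text') {
    my ($type, $name, $value) = parse_line($line) or next;
    print "$type / $name / $value\n";
}
```

For the two sample strings this prints text / handle / e@ma.il and text / chat_identifier / chat0123456789.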
+1

If the required values are always in the same position and reliably delimited by : and &, then perhaps the following will work for you:

 use Modern::Perl;

 say +( split /[:&]+/ )[2] for <DATA>;

 __DATA__
 text::handle: e@ma.il ::text
 text::chat_identifier:chat0123456789&text

Output:

  e@ma.il 
 chat0123456789
+1

Source: https://habr.com/ru/post/1447620/

