Regex to find all possible occurrences of text starting and ending with ~

I would like to find all possible occurrences of text enclosed between two ~ characters.

For example, for the text ~*_abc~xyz~ ~123~ I want the following substrings to be matched:

  • ~*_abc~
  • ~xyz~
  • ~123~

Please note that the enclosed text can consist of letters or digits.

I tried the regex ~[\w]+?~, but it doesn't give me ~xyz~. I want the closing ~ to be considered again as the start of the next match. However, I don't want a bare ~~ as a possible match.

2 answers

Use a capturing group inside a positive lookahead, as explained in the following description of overlapping matches:

Sometimes you need several matches within the same word. For example, suppose you want to extract ABCD, BCD, CD and D from a string such as ABCD. You can do this with this single regex:

(?=(\w+))

At the first position in the string (before the A), the engine starts the first match attempt. The lookahead asserts that what immediately follows the current position is one or more word characters, and captures these characters in group 1. The lookahead succeeds, and so does the match attempt. Since the pattern did not consume any actual characters (the lookahead only looks), the engine returns a zero-width match (the empty string). It also returns what was captured by group 1: ABCD.

The engine then moves to the next position in the string and starts the next match attempt. Again, the lookahead asserts that what immediately follows this position is one or more word characters, and captures these characters in group 1. The match succeeds, and group 1 contains BCD.

The engine moves to the next position in the string, and the process repeats for CD, then D.
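The walkthrough above can be checked with a short Python sketch. It assumes Python's standard re module, where findall returns the contents of group 1 when the pattern contains exactly one capturing group; the variable name suffixes is only illustrative.

```python
import re

# The lookahead consumes nothing, so the engine retries at every position
# and group 1 records the word characters that follow that position.
suffixes = re.findall(r'(?=(\w+))', 'ABCD')

print(suffixes)  # ['ABCD', 'BCD', 'CD', 'D']
```

No match is reported at the end of the string, because \w+ needs at least one character after the current position.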

So use

 (?=(~[^\s~]+~)) 

See the regex demo.

The pattern (?=(~[^\s~]+~)) checks each position in the string and looks for a ~ followed by one or more characters other than whitespace and ~, followed by another ~. Since the lookahead consumes no characters, the engine advances only one position after each check rather than past the captured text, so overlapping substrings are retrieved.

Python demo:

 import re

 p = re.compile(r'(?=(~[^\s~]+~))')
 test_str = " ~*_abc~xyz~ ~123~"
 print(p.findall(test_str))
 # => ['~*_abc~', '~xyz~', '~123~']
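To see why the lookahead matters, here is a small comparison sketch. It runs the same inner pattern once as an ordinary consuming regex and once wrapped in a lookahead; the names text, consuming and overlapping are illustrative and not part of the original answer.

```python
import re

text = "~*_abc~xyz~ ~123~"

# Consuming version: the closing ~ of one match is used up,
# so it cannot also open the next match.
consuming = re.findall(r'~[^\s~]+~', text)

# Lookahead version: nothing is consumed, so a ~ can close one
# match and open the next. start(1) is the offset of the capture.
overlapping = [(m.start(1), m.group(1))
               for m in re.finditer(r'(?=(~[^\s~]+~))', text)]

print(consuming)    # ['~*_abc~', '~123~']
print(overlapping)  # [(0, '~*_abc~'), (6, '~xyz~'), (12, '~123~')]
```

Using finditer instead of findall is a choice made here only to expose the match offsets; findall is enough if only the substrings are needed.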

Try [^~\s]*

This pattern matches any run of characters other than ~ and whitespace (denoted by \s).

I tested it and it works on your string; here is a demo.
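As a rough sketch of what this pattern returns in Python (the linked demo is not available here, so this is an assumption about how it would be used): because * also allows empty matches, the empty strings are filtered out, and the matched pieces do not include the surrounding ~ characters. The names inner and wrapped are illustrative.

```python
import re

text = "~*_abc~xyz~ ~123~"

# [^~\s]* can match the empty string, so drop the empty results.
inner = [m for m in re.findall(r'[^~\s]*', text) if m]
print(inner)  # ['*_abc', 'xyz', '123']

# The delimiters are not part of the match; add them back if needed.
wrapped = ['~' + s + '~' for s in inner]
print(wrapped)  # ['~*_abc~', '~xyz~', '~123~']
```

Note that this approach would also pick up text that is not enclosed in ~ at all, such as a bare word elsewhere in the string.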


Source: https://habr.com/ru/post/1246145/

