Regex (C#) - how to match variable names starting with a colon

I need to distinguish between variable names and non-variable names in some expressions that I am trying to parse. Variable names begin with a colon, can contain (but not begin with) digits, and can contain underscores. Thus, valid variable names are:

:x :_x :x2 :alpha_x // etc 

Then I need to pick out the other words in an expression that do not start with a colon. So in the following expression:

 :result = median(:x,:y,:z) 

The variables are :result, :x, :y and :z, while the only other, non-variable word is median.

My regex for matching variable names (this works):

 :[a-zA-Z_]{1}[a-zA-Z0-9_]* 

But I can't figure out how to match the non-variable words. My regex for this is:

 (?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*) 

The problem is that the match only excludes the first character after the colon: in :alpha, for example, it still matches lpha.

2 answers

The regular expression (?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*) still matches parts of variable names because (?<!:) only asserts that there is no : immediately to the left of the current position; the identifier is then matched without checking for a word boundary. So, in :alpha, lpha matches because l is preceded by a character other than : (namely a).

Therefore, the problem can be solved by adding a word boundary before [a-zA-Z_]:

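 // Requires: using System.Linq; using System.Text.RegularExpressions;
 // s is the expression text, e.g. ":result = median(:x,:y,:z)"
 var words = Regex.Matches(s, @"(?<!:)\b[a-zA-Z_]\w*", RegexOptions.ECMAScript)
     .Cast<Match>()
     .Select(x => x.Value)
     .ToList();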

Note that you do not need to wrap the entire pattern in a capturing group.

Pattern details

  • (?<!:) - a negative lookbehind: make sure that there is no : immediately to the left of the current position
  • \b - a word boundary: make sure that there is no letter, digit or _ immediately to the left of the current position
  • [a-zA-Z_] - matches an ASCII letter or _
  • \w* - zero or more ASCII letters, digits or _ (\w must be used with the ECMAScript option to match only ASCII letters and digits; the option also makes \b an ASCII-only word boundary)
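
To make the difference concrete, here is a minimal sketch that runs both the pattern from the question and the pattern from this answer against the sample expression. The class name WordFinder, the helper Find and the constant names are made up for illustration; only the two patterns and the Regex.Matches call come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class WordFinder
{
    // Pattern from the question: only forbids a ':' directly before the match start.
    public const string Original = @"(?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*)";

    // Pattern from this answer: additionally requires a word boundary at the match start.
    public const string Fixed = @"(?<!:)\b[a-zA-Z_]\w*";

    // Returns every match of the pattern in the input, in order of appearance.
    public static List<string> Find(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.ECMAScript)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    public static void Main()
    {
        const string expr = ":result = median(:x,:y,:z)";

        // The original pattern starts matching one character into ":result".
        Console.WriteLine(string.Join(", ", Find(expr, Original))); // esult, median

        // With \b, a match can no longer start in the middle of a word.
        Console.WriteLine(string.Join(", ", Find(expr, Fixed)));    // median
    }
}
```

The single-letter variables :x, :y and :z never show up in either result because their only letter is directly preceded by the colon; the problem only becomes visible with names of two or more characters, such as :result.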

The following pattern seems to work:

 (?<=[^A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]* 

The lookbehind (?<=[^A-Za-z0-9_:]) asserts that the preceding character is neither a character allowed in a variable name nor a colon. This marks the beginning of a non-variable word.
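
One thing worth checking with this pattern is that a positive lookbehind needs an actual preceding character, so a non-variable word at the very start of the input is not matched. The sketch below shows this, together with a possible variation that uses a negative lookbehind instead. The variation is an assumption for illustration and not part of the answer, as are the names LookbehindDemo, Positive, Negative and Find.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class LookbehindDemo
{
    // Pattern from this answer: the lookbehind must consume a real preceding character.
    public const string Positive = @"(?<=[^A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]*";

    // Assumed variation: a negative lookbehind also succeeds at the start of the input.
    public const string Negative = @"(?<![A-Za-z0-9_:])[a-zA-Z_][a-zA-Z0-9_]*";

    // Returns every match of the pattern in the input, in order of appearance.
    public static string[] Find(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // The word is preceded by a space, so the answer's pattern finds it.
        Console.WriteLine(string.Join(", ", Find(":result = median(:x,:y,:z)", Positive))); // median

        // The word is at index 0, so there is nothing for the lookbehind to match.
        Console.WriteLine(Find("median(:x,:y,:z)", Positive).Length);                       // 0

        // The negative lookbehind only requires that no name character or ':' precedes.
        Console.WriteLine(string.Join(", ", Find("median(:x,:y,:z)", Negative)));           // median
    }
}
```

If the expressions being parsed always start with a variable, as in the example from the question, the pattern from the answer behaves the same as the variation.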


Source: https://habr.com/ru/post/1275772/

