The regex is *this close*, but one character type is still missing

I believe this problem has been resolved; for details, see Edit 2.

This is (I thought) a fairly simple regex problem, and I suspect that someone more experienced with regular expressions can solve it quite easily.

I have the following expression:

(?<token>((?<!(\.\d*))'[^']*'(?=[ ,])|(?<!(\.\d*|'))[-+]?\d*\.?\d+(?!(\.|'))))

The following is a test line:

34, 12., 'test', 106, 53, 'noon' ,'lunch' ,0.5,6, 8, .87 ,'foo', 'bar', 1253 ,'baz'.3, 1.2.3, .3'foo', 124`, 12.

The purpose of the regular expression is really simple - to parse a list whose elements are either strings enclosed in single quotes or numbers. Neither kind of element has any special prefix or suffix. A comma or space is a sufficient separator between tokens. More formally, we can say:

<token-string> :== <token-string>,<token> | <token>
<token>        :== <quoted-string> | <number>
<quoted-string>:== 'a-zA-Z0-9' 
<number>       :== (floating point number)

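The expression works about 99% of the time, with one exception: numbers that end in a decimal point (e.g., 12.) - the trailing point makes them fail - so only "1" is matched out of "12.", because of the lookahead.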

, , , . , , , , - , . .

, "baz'.3, 1.2.3 .3'foo" . , 34, 106, 53 .., .

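Edit 2: The problem is solved. Adding \d to the final negative lookahead (so that it rejects only a point followed by a digit) lets the digits of "12." match. The revised expression: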

(?<token>((?<!(\.\d*))'[^']*'(?=[ ,])|(?<!(\.\d*|'))[-+]?\d*\.?\d+(?!(\.\d|'))))
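To see what the revised expression above actually captures, a minimal C# sketch can run it over a line with `Regex.Matches`. The class and method names (`RevisedPatternDemo`, `Tokens`) are made up for illustration. One thing to watch, assuming the .NET regex engine: the number branch still ends in `\d+`, so for `12.` the captured token is `12`, without the trailing point.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class RevisedPatternDemo
{
    // The revised expression, unchanged.
    public const string Pattern =
        @"(?<token>((?<!(\.\d*))'[^']*'(?=[ ,])|(?<!(\.\d*|'))[-+]?\d*\.?\d+(?!(\.\d|'))))";

    // Returns the value of the "token" group for every match, in order.
    public static List<string> Tokens(string line)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(line, Pattern))
            result.Add(m.Groups["token"].Value);
        return result;
    }
}
```

Calling `RevisedPatternDemo.Tokens("34, 12., 'test', 106")` should give `34`, `12`, `'test'`, `106`. Note that a quoted string at the very end of the line is not matched, because the lookahead `(?=[ ,])` requires a space or comma after it.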

You can use Capture Collections for this.

 #  @"(?s)(?:(?:(?:^|,))\s*(?<token>(?:'(?:[^'\\]*(?:\\.[^'\\]*)*)'|(?=[^eE]*\d)[+-]?\d*\.?\d*(?:[eE][+-]?\d+)?))\s*(?:(?=,)|$)|.)+"

 (?s)                          # Dot-All
 (?:
      (?:                           # Consumed leading comma
           (?: ^ | , )
      )
      \s*                           # Trim leading wsp
      (?<token>                     # (1), Capture Collection 'token'
           (?:
                '                             # Single quoted string data
                (?:
                     [^'\\]* 
                     (?: \\ . [^'\\]* )*
                )
                '
             |                              # OR
                                              # Numeric form ( with bonus exponent )
                (?= [^eE]* \d )               # Lookahead must be a digit (and before exponent)
                [+-]? \d* \.? \d*             # Consume correct numeric form 
                (?: [eE] [+-]? \d+ )?         # Consume correct exponent form
           )
      )
      \s*                           # Trim trailing wsp 
      (?:                           # lookahead trailing comma
           (?= , )
        |  $ 
      )
   |  
      .                             # This character does not conform to token spec, just consume it
 )+
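The commented layout above is close to what .NET accepts directly: with `RegexOptions.IgnorePatternWhitespace`, unescaped whitespace and `#` comments inside a pattern are ignored. A minimal sketch, assuming that option and using a made-up class name `TokenCaptures`, keeps the pattern readable in source and collects the `token` captures from the single overall match:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper, not part of the original answer.
public static class TokenCaptures
{
    // Same expression as above, written in free-spacing form.
    private static readonly Regex Rx = new Regex(@"
        (?s)                               # dot matches newline too
        (?:
            (?: ^ | , )                    # start of input or a consumed comma
            \s*                            # trim leading whitespace
            (?<token>
                ' [^'\\]* (?: \\ . [^'\\]* )* '     # single-quoted string
              |
                (?= [^eE]* \d )            # a digit must come before any exponent
                [+-]? \d* \.? \d*          # numeric body
                (?: [eE] [+-]? \d+ )?      # optional exponent
            )
            \s*                            # trim trailing whitespace
            (?: (?= , ) | $ )              # next comma (not consumed) or end
          |
            .                              # anything else: skip one character
        )+",
        RegexOptions.IgnorePatternWhitespace);

    // One match spans the whole input; each valid token is one capture.
    public static List<string> Parse(string line)
    {
        var tokens = new List<string>();
        Match m = Rx.Match(line);
        if (m.Success)
            foreach (Capture c in m.Groups["token"].Captures)
                tokens.Add(c.Value);
        return tokens;
    }
}
```

For example, `TokenCaptures.Parse("34, 12., 'test', 1.2.3, .87 ,'baz'.3, 12.")` should return `34`, `12.`, `'test'`, `.87`, `12.`; the malformed items are skipped one character at a time by the final `.` alternative.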

C#:

 using System;
 using System.Text.RegularExpressions;

 string strAll = @"34, 12., 'test', 106, 53, 'noon' ,'lunch' ,0.5,6, 8, .87 ,'foo', 'bar', 1253 ,'baz'.3, 1.2.3, .3'foo', 124`, 12.";
 string Allpattern = @"(?s)(?:(?:(?:^|,))\s*(?<token>(?:'(?:[^'\\]*(?:\\.[^'\\]*)*)'|(?=[^eE]*\d)[+-]?\d*\.?\d*(?:[eE][+-]?\d+)?))\s*(?:(?=,)|$)|.)+";
 // A single match spans the whole line; each valid token is one capture of the "token" group.
 Match Allmatch = Regex.Match(strAll, Allpattern);
 if (Allmatch.Success)
 {
     Console.WriteLine("Tokens:");
     // Walk the capture collection in the order the tokens were matched.
     for (int ctr = 0; ctr < Allmatch.Groups["token"].Captures.Count; ctr++)
         Console.WriteLine(Allmatch.Groups["token"].Captures[ctr].Value);
 }

 Tokens:
 34
 12.
 'test'
 106
 53
 'noon'
 'lunch'
 0.5
 6
 8
 .87
 'foo'
 'bar'
 1253
 12.

How about this?

(?:^|,| )('\w+'|\d*\.?\d+)(?:,| |$)
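One thing worth checking with this pattern, sketched here in C# under the assumption that it is applied with `Regex.Matches`: the trailing `(?:,| |$)` consumes the separator, so a token that follows immediately after a single comma has no leading separator left to match and can be skipped. Turning the trailing group into a lookahead avoids that. `SeparatorDemo`, `Consuming`, and `Lookahead` are names invented for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical comparison, not part of the original answer.
public static class SeparatorDemo
{
    // Pattern as posted: the trailing separator is consumed by the match.
    public const string Consuming = @"(?:^|,| )('\w+'|\d*\.?\d+)(?:,| |$)";

    // Variant (an assumption, not from the answer): the trailing separator
    // is only asserted, so it can still serve as the next token's leading one.
    public const string Lookahead = @"(?:^|,| )('\w+'|\d*\.?\d+)(?=,| |$)";

    // Collects capture group 1 of every match, in order.
    public static List<string> Tokens(string line, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(line, pattern))
            result.Add(m.Groups[1].Value);
        return result;
    }
}
```

For the input `0.5,6, 8`, `SeparatorDemo.Tokens(line, SeparatorDemo.Consuming)` returns `0.5` and `8`, while the lookahead variant returns `0.5`, `6`, and `8`.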

Edit 1.

To also match numbers that end in a decimal point (e.g., 12.), change \d+ to \d*.

, debuggex, # , .


Source: https://habr.com/ru/post/1532931/

