RegEx to capture repeating fields?

I have a buffer that I am trying to parse using regular expressions.

Here is an example buffer:

DATA#ALPHAONE;BETATWO.CHARLIETHREE! 

Format: The buffer always starts with the header "DATA#". After that, it has one or more text fields, each terminated by a semicolon, period, or exclamation point.

My regex pattern (in C#) so far:

    string singleFieldPattern = "(?'Field'.*?)(?'Separator'[;.!])";
    string fullBufferPattern = "(?'Header'DATA#)(" + singleFieldPattern + ")+";

The problem occurs when I try to print out the data that matched:

    Regex response = new Regex(fullBufferPattern);
    string example = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
    Debug.WriteLine("RegEx Matches?: {0}", response.IsMatch(example));
    foreach (Match m in response.Matches(example))
    {
        foreach (string s in new string[] { "Header", "Field", "Separator" })
        {
            Debug.WriteLine("{0} : {1}", s, m.Groups[s]);
        }
    }

The only output is:

    RegEx Matches?: True
    Header : DATA#
    Field : CHARLIETHREE
    Separator : !

I expected the following output:

    RegEx Matches?: True
    Header : DATA#
    Field : ALPHAONE
    Separator : ;
    Field : BETATWO
    Separator : .
    Field : CHARLIETHREE
    Separator : !

My expression did not capture the earlier ALPHAONE and BETATWO fields (and their separators ; and .) as I expected. It only captured the last field (CHARLIETHREE).

How can I get all the parts that match singleFieldPattern?


I simplified the data format above for the question, but since some people asked for real data, here is something much closer to the actual data:

(Note: the values in [] are single non-printable bytes, and the spaces were added for clarity.)

Example:

 [SYN] % SYSNAMScanner[ACK]; BAUDRATE57600[ACK]; CTRLMODEXON[ACK]; 

Translation:
System Name (SYSNAM) - "Scanner"
Baud Rate - 57600
Flow Control - XON

4 answers

I tried this in VB (because that is what I have open), but consider iterating over the Captures of the group:

    For Each m As Capture In response.Match(example).Groups("Field").Captures
        Debug.WriteLine(m.Value)
    Next

gives me

    ALPHAONE
    BETATWO
    CHARLIETHREE
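
Since the question is in C#, here is a rough C# sketch of the same idea. The pattern is the one from the question; the class name `CaptureDemo` and the helper `AllCaptures` are hypothetical names made up for this illustration. It relies on the fact that `Group.Value` only exposes the last capture of a repeated group, while `Group.Captures` keeps every one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Pattern from the question: a header followed by one or more field/separator pairs.
    const string FullBufferPattern =
        "(?'Header'DATA#)((?'Field'.*?)(?'Separator'[;.!]))+";

    // Hypothetical helper: returns every value the named group captured, in order.
    public static List<string> AllCaptures(string input, string groupName)
    {
        var values = new List<string>();
        Match m = Regex.Match(input, FullBufferPattern);
        if (!m.Success) return values;

        // Group.Value would give only the last capture; Captures holds all of them.
        foreach (Capture c in m.Groups[groupName].Captures)
            values.Add(c.Value);
        return values;
    }

    static void Main()
    {
        string example = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
        Console.WriteLine(string.Join(" ", AllCaptures(example, "Field")));
        Console.WriteLine(string.Join(" ", AllCaptures(example, "Separator")));
    }
}
```

Calling `AllCaptures(example, "Field")` should return the three field values in the order they appear in the buffer, and `"Separator"` the three separators.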

If you do not mind LINQ, you can do this:

    string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
    var fullBufferPattern = @"(?<header>DATA#)(?<fields>.+)[;.!]";
    var fieldPattern = @"(?<field>[^;.!]+)[;.!]?";
    var fields = Regex.Matches(data, fullBufferPattern)
        .OfType<Match>()
        .SelectMany(m => Regex.Matches(m.Groups["fields"].Value, fieldPattern)
            .OfType<Match>())
        .Select(m => m.Groups["field"].Value)
        .ToArray();

The fields variable will contain:

    ALPHAONE
    BETATWO
    CHARLIETHREE

Edit: To reproduce your Debug output, use:

    string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
    var fullBufferPattern = @"(?<header>DATA#)(?<fields>([^;.!]+[;.!])+)";
    var fieldPattern = @"(?<field>[^;.!]+)(?<separator>[;.!])";
    var groups = Regex.Matches(data, fullBufferPattern)
        .OfType<Match>()
        .Select(m => new
        {
            Header = m.Groups["header"],
            Fields = Regex.Matches(m.Groups["fields"].Value, fieldPattern)
                .OfType<Match>()
                .Select(f => new
                {
                    Field = f.Groups["field"],
                    Separator = f.Groups["separator"]
                })
        });
    foreach (var element in groups)
    {
        Debug.WriteLine("Header : {0}", element.Header);
        foreach (var field in element.Fields)
        {
            Debug.WriteLine("Field : {0}", field.Field);
            Debug.WriteLine("Separator : {0}", field.Separator);
        }
    }

Output:

    Header : DATA#
    Field : ALPHAONE
    Separator : ;
    Field : BETATWO
    Separator : .
    Field : CHARLIETHREE
    Separator : !

This bit of LINQ pairs up the fields and separators captured by your regular expression:

    var ms = response.Matches(example);
    foreach (Match m in ms)
    {
        string header = m.Groups["Header"].Value;
        Debug.WriteLine("Header : " + header);
        var pairs = m.Groups["Field"].Captures.Cast<Capture>().Zip(
            m.Groups["Separator"].Captures.Cast<Capture>(),
            (f, s) => new { Field = f.Value, Separator = s.Value });
        foreach (var pair in pairs)
        {
            Debug.WriteLine(pair.ToString());
        }
    }

It outputs:

    Header : DATA#
    { Field = ALPHAONE, Separator = ; }
    { Field = BETATWO, Separator = . }
    { Field = CHARLIETHREE, Separator = ! }

So you want to get every value that matches either the header pattern or the single-field pattern?

 "(?'Header'^DATA#)|(?'Field'.*?)(?'Separator'[;.!])" 

That should work, though I am not sure it fits the data you are actually parsing.
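
As a rough illustration of how the alternation pattern above could be consumed: each call to `Regex.Matches` yields either a header match or one field/separator match, so the loop checks which named group succeeded. The class name `AlternationDemo` and the helper `Describe` are hypothetical names invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // Pattern from the answer above: a match is either the header or one field/separator pair.
    const string Pattern = "(?'Header'^DATA#)|(?'Field'.*?)(?'Separator'[;.!])";

    // Hypothetical helper: renders each match as "Name : value" lines.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            if (m.Groups["Header"].Success)
            {
                // The first alternative matched, so only Header is populated.
                lines.Add("Header : " + m.Groups["Header"].Value);
            }
            else
            {
                // The second alternative matched one field and its separator.
                lines.Add("Field : " + m.Groups["Field"].Value);
                lines.Add("Separator : " + m.Groups["Separator"].Value);
            }
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Describe("DATA#ALPHAONE;BETATWO.CHARLIETHREE!"))
            Console.WriteLine(line);
    }
}
```

One trade-off worth noting: because the header is only one branch of the alternation, this pattern does not verify that the buffer actually starts with DATA#. A string with no header still produces field matches, so a separate check may be needed if the header is mandatory.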


Source: https://habr.com/ru/post/1491155/

