Parsing a FIX protocol in regex?

Question

I need to parse log files containing FIX protocol messages.

Each row contains header information (timestamp, logging level, endpoint), and then the FIX payload.

I used a regex to parse the header information into named groups. For instance:

(?P<datetime>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}.\d{6}) (?P<process_id>\d{4}/\d{1,2})\s*(?P<logging_level>\w*)\s*(?P<endpoint>\w*)\s*

Then I come to the FIX payload itself (^A is the separator between each tag), for example:

 8=FIX.4.2^A9=61^A35=A...^A11=blahblah...

I need to extract certain tags from it (for example, "A" from 35= or "blahblah" from 11=) and ignore everything else. Basically, I need to ignore anything before "35=A", anything between that and "11=blahblah", anything after that, and so on.

I know there are libraries that can parse each tag (http://source.kentyde.com/fixlib/overview), but I was hoping for a simple regex approach if possible, since I really only need a few tags.

Is there a good way in regex to retrieve the tags I need?

Cheers, Victor

+6

python regex fix

victorhooi Nov 21 '11 at 5:35

3 answers

There is no need to split on "\x01" and then apply a regex again. If you only need tags 34, 49 and 56 (MsgSeqNum, SenderCompID and TargetCompID), you can use a single regular expression:

 dict(re.findall("(?:^|\x01)(34|49|56)=(.*?)\x01", raw_msg))

A simple regular expression like this works as long as you know your sender never embeds data that could contain the delimiter itself. In particular:

No raw data fields, i.e. length/data pairs such as RawDataLength/RawData (95/96) or XmlDataLen/XmlData (212/213)
No encoded fields for Unicode strings, such as EncodedTextLen/EncodedText (354/355)

Handling these cases requires a lot of additional parsing. I use my own Python parser, and even the fixlib code you referenced above gets these cases wrong. But if your data does not contain any of these exceptions, the regex above should return a good dict of your desired fields.
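To give an idea of what that additional parsing involves, here is a minimal sketch of a length-aware tokenizer. It is not the parser mentioned above; the function name `iter_fields`, the `LENGTH_TAGS` table (limited to the three pairs listed here) and the sample message are all assumptions made for illustration. It works on bytes because the length fields count bytes.

```python
SOH = b"\x01"

# Length tag -> data tag. Only the pairs named above are listed; a real
# parser would need the complete set from the FIX data dictionary.
LENGTH_TAGS = {b"95": b"96", b"212": b"213", b"354": b"355"}


def iter_fields(raw):
    """Yield (tag, value) pairs, honouring length-prefixed data fields.

    Raises ValueError on input that is missing "=" or a terminating SOH.
    """
    pos = 0
    pending = None  # (data_tag, byte_length) announced by the previous field
    while pos < len(raw):
        eq = raw.index(b"=", pos)
        tag = raw[pos:eq]
        start = eq + 1
        if pending is not None and tag == pending[0]:
            # Take exactly the announced number of bytes, even if they
            # contain SOH or "=" characters.
            end = start + pending[1]
            if raw[end:end + 1] != SOH:
                raise ValueError("data field does not end at declared length")
        else:
            end = raw.index(SOH, start)
        value = raw[start:end]
        pending = None
        if tag in LENGTH_TAGS:
            pending = (LENGTH_TAGS[tag], int(value))
        yield tag, value
        pos = end + 1


# Hypothetical message whose RawData (96) contains an embedded SOH.
sample = b"35=B\x0195=5\x0196=a\x01b=c\x0158=hi\x01"
fields = list(iter_fields(sample))
```

With the simple regex, the embedded `b=c` would be picked up as if it were a separate field; here it stays inside the value of tag 96.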

Edit: I left the above regular expression as it is, but it should be revised so that the final matching element is (?=\x01). An explanation can be found in @tropleee's answer.
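A short illustration of why the lookahead matters, using a made-up message (the tag values are hypothetical). When the pattern consumes the trailing \x01, that delimiter is no longer available as the leading delimiter of the next match, so a wanted tag that directly follows another wanted tag is skipped:

```python
import re

SOH = "\x01"

# Hypothetical sample message; the values are invented for illustration.
raw_msg = SOH.join(
    ["8=FIX.4.2", "9=61", "35=A", "34=1", "49=SENDER", "56=TARGET", "10=000"]
) + SOH

# Original pattern: the trailing delimiter is consumed by each match.
consuming = re.compile("(?:^|\x01)(34|49|56)=(.*?)\x01")
# Revised pattern: the trailing delimiter is only asserted, not consumed.
lookahead = re.compile("(?:^|\x01)(34|49|56)=(.*?)(?=\x01)")

consumed_result = dict(consuming.findall(raw_msg))
lookahead_result = dict(lookahead.findall(raw_msg))

print(consumed_result)   # -> {'34': '1', '56': 'TARGET'}  (49 is missed)
print(lookahead_result)  # -> {'34': '1', '49': 'SENDER', '56': 'TARGET'}
```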

+9

Phil cooper Jan 17 '12 at 21:34

^A is actually \x{01}; that is just how it appears in vim. In Perl, I did this with a split on hex 1 and then a split on "=": after the second split, element [0] of the array is the tag and element [1] is the value.
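Since the question is tagged python, the same two-step split might look roughly like this. The function name `split_fields` is made up for the example, and it assumes values never contain the \x01 delimiter:

```python
SOH = "\x01"  # displayed as ^A in vim


def split_fields(payload):
    """Return the payload as an ordered list of (tag, value) pairs."""
    pairs = []
    for field in payload.split(SOH):
        if not field:
            continue  # skip the empty piece after the trailing delimiter
        # Split on the first "=" only, so values may contain "=".
        tag, _, value = field.partition("=")
        pairs.append((tag, value))
    return pairs
```

Keeping a list rather than a dict preserves field order and repeated tags, which matters if the message has repeating groups.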

+1

Ka Dec 29 '11 at 19:05

dlamblin · Accepted Answer · 2011-11-21T05:37:45+0000

Use a regular expression tool like Expresso or RegexBuddy.
Why not split on ^A and then match ([^=]+)=(.*) against each field, putting the results into a hash? You could also filter with a switch whose default case skips the tags you are not interested in and which falls through for all the tags you are interested in.
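A rough Python sketch of that idea, assuming the payload has already been separated from the log header. The names `extract_tags` and `wanted` are invented for the example, and a set membership test stands in for the switch:

```python
import re

SOH = "\x01"
# Tag is everything up to the first "="; the value is the rest of the field.
FIELD = re.compile(r"([^=]+)=(.*)", re.DOTALL)


def extract_tags(payload, wanted):
    """Return a dict holding only the tags listed in `wanted`."""
    found = {}
    for field in payload.split(SOH):
        match = FIELD.fullmatch(field)
        if match is None:
            continue  # empty or malformed piece, e.g. after the last ^A
        tag, value = match.groups()
        if tag in wanted:  # plays the role of the switch
            found[tag] = value
    return found


# Hypothetical payload modelled on the example in the question.
payload = "8=FIX.4.2\x019=61\x0135=A\x0111=blahblah\x01"
result = extract_tags(payload, {"35", "11"})
print(result)  # -> {'35': 'A', '11': 'blahblah'}
```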
