Parsing a FIX Message with a Regular Expression

I really liked the second answer to "Parsing FIX protocol in regex?", so I tried it.

Here is my code.

    import re

    new_order_finder1 = re.compile("(?:^|\x01)(11|15|55)=(.*?)\x01")
    new_order_finder2 = re.compile("(?:^|\x01)(15|55)=(.*?)\x01")
    new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01")

    if __name__ == "__main__":
        line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x0149=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x0111=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x0144=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x0160=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
        fields = dict(re.findall(new_order_finder1, line))
        print(fields)
        fields2 = dict(re.findall(new_order_finder2, line))
        print(fields2)
        fields3 = dict(re.findall(new_order_finder3, line))
        print(fields3)

Here is the output:

    {'11': 'N09080243', '55': 'AAPL.O'}
    {'55': 'AAPL.O', '15': 'USD'}
    {'35': 'D', '38': '2100', '11': 'N09080243', '54': '1'}

It looks like some of the fields are not being matched by the regular expression: 15 is missing from the first result, and both 15 and 55 are missing from the third.

What is the problem?

2 answers

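The final \x01 in the pattern consumes the delimiter that the next field needs in order to match. The regex engine resumes its search for the next match immediately after the end of the previous match, so a character that has already been consumed cannot be part of a later match.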

Once you understand that, the fix is easy: replace the final \x01 with the lookahead (?=\x01).

    import re

    new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

    if __name__ == "__main__":
        line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01" \
               "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01" \
               "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01" \
               "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01" \
               "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
        fields3 = dict(re.findall(new_order_finder3, line))
        print(fields3)

The problem is the \x01 at the end of the pattern. It consumes the \x01 delimiter, so the pattern always fails on a key-value pair that immediately follows one just matched, because neither alternative of (?:^|\x01) can match there.

Taking this substring of your input as an example, here is how new_order_finder3 matches it:

    \x0154=1\x0155=AAPL.O\x01
    ------------ X

As you can see, after the pattern matches the key-value pair 54=1, it also consumes the following \x01, so the adjacent key-value pair 55=AAPL.O can never be matched.
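To see the consumption directly, here is a small sketch that prints the span of each match on a shortened sample. The names `sample`, `consuming`, and `lookahead` are made up for this illustration, and the patterns are cut down to tags 54 and 55 only.

```python
import re

SOH = "\x01"  # FIX field delimiter
# Shortened sample: two adjacent fields, each surrounded by delimiters.
sample = SOH + "54=1" + SOH + "55=AAPL.O" + SOH

# Same shape as the pattern in the question: the trailing \x01 is consumed.
consuming = re.compile("(?:^|\x01)(54|55)=(.*?)\x01")
# Same pattern, but the trailing \x01 is only checked, not consumed.
lookahead = re.compile("(?:^|\x01)(54|55)=(.*?)(?=\x01)")

consuming_spans = [(m.group(1), m.span()) for m in consuming.finditer(sample)]
lookahead_spans = [(m.group(1), m.span()) for m in lookahead.finditer(sample)]

print(consuming_spans)  # [('54', (0, 6))]
print(lookahead_spans)  # [('54', (0, 5)), ('55', (5, 15))]
```

With the consuming pattern the first match ends at index 6, past the delimiter at index 5, so the search for 55 starts with no \x01 in front of it. With the lookahead the first match ends at index 5 and the second match can begin on that same delimiter.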

There are several ways to solve this problem. One solution is to put the trailing \x01 inside a lookahead assertion, so that we can check that \x01 ends the key-value pair without consuming it:

 new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)") 

Now the output contains all the expected fields:

 {'11': 'N09080243', '38': '2100', '15': 'USD', '55': 'AAPL.O', '54': '1', '35': 'D'} 
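Another possible approach, not taken from the answers here, is to skip the regular expression and split the message on the delimiter. The sketch below assumes that each tag appears at most once (a repeated tag would overwrite the earlier value, just as with dict(re.findall(...))). The helper name `extract_fields` is hypothetical, and the message is a shortened excerpt of the one in the question.

```python
SOH = "\x01"  # FIX field delimiter


def extract_fields(message, wanted_tags):
    """Return {tag: value} for the wanted tags in an SOH-delimited message."""
    fields = {}
    for pair in message.split(SOH):
        # Split on the first "=" only, so values may themselves contain "=".
        tag, sep, value = pair.partition("=")
        if sep and tag in wanted_tags:
            fields[tag] = value
    return fields


# Shortened excerpt of the message from the question.
line = "35=D\x0111=N09080243\x0115=USD\x0138=2100\x0154=1\x0155=AAPL.O\x01"
fields = extract_fields(line, {"11", "15", "35", "38", "54", "55"})
print(fields)
```

Since every field is split on the delimiter independently, adjacent fields cannot interfere with each other. A log prefix such as the timestamp in the question simply becomes part of the first piece, whose tag will not be in the wanted set.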

Source: https://habr.com/ru/post/901973/

