I am looking for different (better) ways to parse structured text data in PHP and get that data into a PHP object graph. I have seen many PHP parsers for various text file formats, but nearly all of them turn out to be a fragile chain of regular expressions. There must be a better way!
In this particular case I want to parse MT940 files (bank account transactions), but I have run into the same problem with other file formats. I invariably end up with a long chain of regular expressions that becomes hard to maintain, especially when several variants of a format have to be supported. MT940 has exactly this problem: it is not a strictly defined format, and almost every bank uses a slightly different dialect.
So, how do you develop parsers that are more robust and that can be extended to handle different dialects?
Here is an example MT940 statement, taken from this question:
{1:F01AHHBCH110XXX0000000000}{2:I940X N2}{3:{108:XBS/091502}}{4:
:20:XBS/091202/0001
:25:5887/507004-50
:28C:140/1
:60F:C0914CHF7789,
:61:0912021202D36,80NTRFNONREF//0887-1202-29-941 04392579-0 LUTHY + xxx, ZUR
:86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232 2579-0
:62F:C091202CHF52,2
:64:C091302CHF52,2
-}
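To make the structure of the sample concrete: block 4 is essentially a sequence of tagged fields (`:20:`, `:25:`, `:28C:` and so on). The sketch below only illustrates that first tokenizing step and is not a full parser. It assumes that every tag consists of two digits plus an optional uppercase letter and is preceded by whitespace or starts the block; `mt940Fields` is a hypothetical helper name, not a function from any existing library.

```php
<?php
// Hypothetical helper (not from any library): splits block 4 of an MT940
// statement into tag/value pairs. Assumes tags look like ":NN:" or ":NNX:".
function mt940Fields(string $statement): array
{
    // Isolate block 4: everything between "{4:" and the closing "-}".
    $start = strpos($statement, '{4:');
    $end   = strrpos($statement, '-}');
    if ($start === false || $end === false || $end < $start) {
        throw new InvalidArgumentException('No block 4 found');
    }
    $body = substr($statement, $start + 3, $end - $start - 3);

    // One pattern for the tag boundaries. With PREG_SPLIT_DELIM_CAPTURE the
    // result is [text before first tag, tag, value, tag, value, ...].
    $parts = preg_split('/(?:^|\s+):(\d{2}[A-Z]?):/', $body, -1, PREG_SPLIT_DELIM_CAPTURE);

    $fields = [];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $fields[] = ['tag' => $parts[$i], 'value' => trim($parts[$i + 1])];
    }
    return $fields;
}

// The sample statement from above, as a single string.
$sample = '{1:F01AHHBCH110XXX0000000000}{2:I940X N2}{3:{108:XBS/091502}}{4: '
    . ':20:XBS/091202/0001 :25:5887/507004-50 :28C:140/1 :60F:C0914CHF7789, '
    . ':61:0912021202D36,80NTRFNONREF//0887-1202-29-941 04392579-0 LUTHY + xxx, ZUR '
    . ':86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232 2579-0 '
    . ':62F:C091202CHF52,2 :64:C091302CHF52,2 -}';

$fields = mt940Fields($sample);

// Print one line per field: the tag, then its raw value.
foreach ($fields as $field) {
    echo $field['tag'], ' => ', $field['value'], PHP_EOL;
}
```

Splitting into fields like this is the easy part. The part that differs between banks is how each value (for example the `:61:` and `:86:` lines) is interpreted, and that is where I would like something more maintainable than one regular expression per dialect.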