EDIFACT parsing is actually not that complicated. Simply split at the syntax characters: first at ' to get the segments, then at + to get the data elements of each segment, and finally at : to get the individual components. Of course, you need to take care of escaped separator characters (by default, ? is the release character that escapes the character following it). The characters used here are only the defaults; they can be changed at the start of the message by an optional UNA segment. In fact, the Wikipedia article on EDIFACT gives a pretty good (but brief) introduction to this. The format is also documented in detail on the UN's UNECE website (yes, it's a lot, and it's hard to read).
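The splitting described above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete parser: the names `Separators`, `read_una`, `split_escaped`, `unescape` and `parse` are made up for this example, and it assumes the message is already a decoded string and that whitespace between segments can be ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    """Syntax characters; the defaults apply when no UNA segment is present."""
    component: str = ":"
    element: str = "+"
    release: str = "?"  # escapes the character that follows it
    segment: str = "'"


def read_una(message: str) -> tuple[Separators, str]:
    """Return the separators in effect and the message without its UNA header."""
    if message.startswith("UNA") and len(message) >= 9:
        # Six characters follow "UNA": component separator, element separator,
        # decimal mark, release character, reserved, segment terminator.
        s = message[3:9]
        seps = Separators(component=s[0], element=s[1], release=s[3], segment=s[5])
        return seps, message[9:]
    return Separators(), message


def split_escaped(text: str, sep: str, release: str) -> list[str]:
    """Split at sep, skipping separators escaped by the release character.

    Escape sequences are kept intact so that deeper levels still see them.
    """
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            buf.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return parts


def unescape(text: str, release: str) -> str:
    """Drop release characters, keeping the characters they protect."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def parse(message: str) -> list[list[list[str]]]:
    """Return segments as lists of data elements, each a list of components."""
    seps, body = read_una(message)
    segments = []
    for raw in split_escaped(body, seps.segment, seps.release):
        raw = raw.lstrip()  # tolerate spaces or line breaks between segments
        if not raw:
            continue
        segments.append([
            [unescape(c, seps.release)
             for c in split_escaped(el, seps.component, seps.release)]
            for el in split_escaped(raw, seps.element, seps.release)
        ])
    return segments


parsed = parse("ERC+A7V:1:AMD' IFT+3+NO MORE FLIGHTS'")
```

In this sketch the first data element of each segment is the segment tag, so `parsed[0][0][0]` is `"ERC"`. Unescaping happens only at the component level, after all three splits, so an escaped separator is never mistaken for a real one.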
The hard part is getting the information out of the parsed message and into your application (and checking it for correctness, let alone producing good error messages). If you really plan to write a parser generator from scratch for everything in any language, then: no, there is no easy way to do this. There is none for other flexible data representations either. This is a difficult task and always will be.
But here is an idea: if you have that much tooling for XML (or any other "modern technology", as you call it...), it should be a relatively simple task to write a program that converts EDIFACT messages into some unified XML-EDIFACT format (which is pretty awful and will most likely drive you mad). You could convert each EDIFACT segment into an XML tag, perhaps like this:
ERC+A7V:1:AMD'
IFT+3+NO MORE FLIGHTS'
In XML:
<segment qualifier="ERC">
  <element>
    <component>A7V</component>
    <component>1</component>
    <component>AMD</component>
  </element>
</segment>
<segment qualifier="IFT">
  <element>
    <component>3</component>
  </element>
  <element>
    <component>NO MORE FLIGHTS</component>
  </element>
</segment>
You can then bring the full power of your XML tools and libraries to bear on validation and evaluation.
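A conversion into the generic form can be sketched with Python's standard `xml.etree.ElementTree`. Take it as an illustration under a few assumptions: the input is already split into segments, data elements and components (nested lists, first data element being the segment tag), the function name `segments_to_xml` is made up, and the `<message>` root element is my own addition because an XML document needs a single root.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(segments):
    """Build the generic segment/element/component tree from parsed segments."""
    root = ET.Element("message")  # hypothetical wrapper element
    for seg in segments:
        tag, *elements = seg
        seg_node = ET.SubElement(root, "segment", qualifier=tag[0])
        for components in elements:
            el_node = ET.SubElement(seg_node, "element")
            for value in components:
                ET.SubElement(el_node, "component").text = value
    return root


# Pre-parsed form of: ERC+A7V:1:AMD' IFT+3+NO MORE FLIGHTS'
segments = [
    [["ERC"], ["A7V", "1", "AMD"]],
    [["IFT"], ["3"], ["NO MORE FLIGHTS"]],
]

root = segments_to_xml(segments)
ET.indent(root)  # pretty-printing helper, available from Python 3.9
print(ET.tostring(root, encoding="unicode"))
```

From here the tree can be handed to whatever XPath, XSLT or schema tooling you already use.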
You could also make the tags more specific, for example:
<segment_ERC>
  <element>
    <component>A7V</component>
    <component>1</component>
    <component>AMD</component>
  </element>
</segment_ERC>
<segment_IFT>
  <element>
    <component>3</component>
  </element>
  <element>
    <component>NO MORE FLIGHTS</component>
  </element>
</segment_IFT>
Segment-specific tags like these make validation by XSD easier. In this conversion you can of course get as specific as you need, but sooner or later you will reach the point where you have to put information about the structure of the message being parsed into the converter (since you don't know which segments are nested inside other segments that group them: not just UNG, UNH and the like, but also segment groups that you don't see directly).
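To illustrate what "putting structure information into the converter" might look like, here is a deliberately simplified sketch. The `GROUPS` table is hypothetical and not taken from any real EDIFACT directory; in practice it would come from the directory or implementation guide for your message type. The function name `nest_groups` is made up, and the sketch handles only one level of nesting.

```python
# Hypothetical structure table: trigger segment tag -> tags allowed in its group.
GROUPS = {
    "LIN": {"QTY", "PRI"},
}


def nest_groups(segments, groups):
    """Wrap each trigger segment and its following members in a group record.

    The flat segment list carries no group markers, so the grouping can only
    be recovered from the external structure table.
    """
    result = []
    current = None  # member list of the group that is currently open, if any
    allowed = set()
    for seg in segments:
        tag = seg[0][0]
        if tag in groups:
            # A trigger segment always opens a new group.
            current = [seg]
            allowed = groups[tag]
            result.append({"group": tag, "segments": current})
        elif current is not None and tag in allowed:
            current.append(seg)
        else:
            # Any other segment closes the open group.
            current = None
            result.append(seg)
    return result


flat = [
    [["BGM"], ["220"]],
    [["LIN"], ["1"]],
    [["QTY"], ["21", "5"]],
    [["PRI"], ["AAA", "10"]],
    [["LIN"], ["2"]],
    [["QTY"], ["21", "3"]],
    [["UNS"], ["S"]],
]

nested = nest_groups(flat, GROUPS)
```

Real messages have nested and repeating groups with cardinalities, so a real converter needs a richer table than this, but the principle is the same: the grouping comes from the message definition, not from the data.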
Either way, you will need to create specific evaluation programs / schemas / whatevers for the messages you receive, in accordance with the EDIFACT directories, which you should have received as documentation.