SVG parsing and data type

Question

I am writing an SVG parser, mainly as an exercise to learn how to use Parsec. I am currently using the following data type to represent my SVG file:

data SVG = Element String [Attribute] [SVG] | SelfClosingTag [Attribute] | Body String | Comment String | XMLDecl String

This works pretty well, but I'm not sure about the Element String [Attribute] [SVG] part of my data type. Since SVG has only a limited number of tags, I was thinking of using a type to represent the tag of an SVG element instead of a String. Something like this:

data SVG = Element TagName [Attribute] [SVG] | ...
data TagName = A | AltGlyph | AltGlyphDef ... | View | Vkern

Is that a good idea? What are the benefits of this, if any? Is there a more elegant solution?

+5

types parsing haskell

Elie gnrd Jan 18 '16 at 17:37

2 answers

To answer your question:

You could go either way, depending on what you plan to do with the parse tree after you build it.

If all you want your SVG parser to do is describe the shape of the SVG data, you are fine with a String.

On the other hand, if you want to convert the SVG data into something like graphics (that is, you expect your AST to be evaluated), you will find it best to represent all of the semantic information in the type system. This will simplify the later steps.

The question, in my opinion, is whether the parsing pass is the right place for this to happen. (Full disclosure: I am only passingly familiar with SVG.) I suspect that instead of just a flat list of tags, you would be better off with an Element type for each tag, each with its own set of required and optional attributes. If this conversion happens later in the program, there is no need to create a TagName data type. You can catch all the type errors at the same time as you combine the attributes into Elements.

On the other hand, a good argument could be made for parsing directly into the full tree of Elements, in which case I would drop the generic [Attribute] and [SVG] fields of the Element constructor and instead give each TagName constructor the fields it needs.
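
A minimal sketch of that second option, assuming a tiny subset of SVG. The type name Shape, its constructors, and the chosen fields are illustrative only and do not come from the question:

```haskell
-- Hypothetical typed tree: each element carries its own fields instead of
-- a generic [Attribute] list. Only three elements are modelled here.
data Shape
  = Circle { cx :: Double, cy :: Double, r :: Double }
  | Rect   { x :: Double, y :: Double, width :: Double, height :: Double }
  | Group  [Shape]  -- <g> is the only element here that has children
  deriving (Show, Eq)

-- With the semantic information in the types, a later pass needs no
-- string lookups or attribute searches.
area :: Shape -> Double
area (Circle _ _ radius) = pi * radius * radius
area (Rect _ _ w h)      = w * h
area (Group children)    = sum (map area children)
```

The trade-off is that the parser (or a pass right after it) has to validate attributes up front, which is exactly the "where should this happen" question raised above.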

Another answer to a question that you did not ask:

Put source-code location information into your parse tree early. From personal experience, I can say that adding it later gets harder the larger your program becomes.
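
One way this might look with Parsec, as a sketch: getPosition returns the current SourcePos, which can be stored next to each parsed node. The Located wrapper and the located and comment helpers are hypothetical names introduced for this example:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper pairing a parsed value with where it started.
data Located a = Located SourcePos a
  deriving Show

-- Record the position before running the wrapped parser.
located :: Parser a -> Parser (Located a)
located p = Located <$> getPosition <*> p

-- A small parser to wrap: the text between "<!--" and "-->".
comment :: Parser String
comment = string "<!--" *> manyTill anyChar (try (string "-->"))
```

Running parse (located comment) against an input then yields the comment text together with its starting line and column, which can be read back with sourceLine and sourceColumn when reporting errors in later passes.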

+4

John F. Miller Jan 19 '16 at 0:25

Matt · Accepted Answer · 2016-01-18T19:57:19+0000

I personally prefer the approach of enumerating all possible TagNames. That way, the compiler can give you errors and warnings if you make a careless mistake. For example, if I write a function that is supposed to cover every possible Element type, and each type is enumerated in an ADT, the compiler can warn about non-exhaustive pattern matches. If the tag is represented as a String, this is not possible. Also, if I want to match on a specific Element type and accidentally misspell the TagName, the compiler will catch it. A third reason, which probably does not apply here but is worth noting in general: if I later decide to add or remove a TagName constructor, the compiler will tell me every place that needs to change. I doubt this will happen for SVG tag names, but it is worth keeping in mind.
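
To make those points concrete, here is a small sketch using an illustrative four-tag subset. The functions isContainer, tagText, and parseTagName are hypothetical names, not part of the question:

```haskell
-- Illustrative subset of SVG tag names; the real list is much longer.
data TagName = Circle | Rect | G | Path
  deriving (Show, Eq, Enum, Bounded)

-- Compiled with -Wall, GHC reports a non-exhaustive match here if a
-- constructor is later added to TagName and this function is not updated.
isContainer :: TagName -> Bool
isContainer G      = True
isContainer Circle = False
isContainer Rect   = False
isContainer Path   = False

-- The textual form of each tag as it appears in the SVG source.
tagText :: TagName -> String
tagText Circle = "circle"
tagText Rect   = "rect"
tagText G      = "g"
tagText Path   = "path"

-- The parser still reads a String; unknown names are rejected in one place.
parseTagName :: String -> Maybe TagName
parseTagName s = lookup s [(tagText t, t) | t <- [minBound .. maxBound]]
```

A misspelled constructor such as Rekt fails to compile, whereas a misspelled string literal like "rekt" would only show up at run time, here as a Nothing from parseTagName.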
