Parsing: library functions, FSM, explode() or lex/yacc?

When I have to parse text (for example, configuration files or other fairly simple / descriptive languages), I come up with several solutions:

  • using library functions, e.g. strtok(), sscanf()
  • a finite state machine that processes one char at a time, tokenizing and parsing
  • using the explode() function, which I once wrote out of sheer boredom
  • using lex/yacc (read: flex/bison) to generate an appropriate parser

I don't like the "library functions" approach. It feels clunky and awkward. explode(), although it does not require much new code, feels even more bloated. And flex/bison often seems like sheer overkill.
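For comparison, here is a minimal sketch of the library-function approach, assuming a simple "key = value" configuration format with '#' comments. The function name parse_kv_line and the buffer sizes are made up for illustration; they do not come from the question.

```c
#include <stdio.h>

/* Hypothetical helper: parse one "key = value" line with sscanf().
 * Returns 1 on success, 0 for blank lines, comments, or malformed input.
 * Trailing whitespace in the value is kept as-is. */
int parse_kv_line(const char *line, char key[64], char value[192])
{
    /* Skip leading blanks, then reject empty lines and '#' comments. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\0' || *line == '\n')
        return 0;

    /* %63[^= \t\n] : key, at most 63 chars, stops at '=', blank or newline
     * " = "        : optional whitespace around a literal '='
     * %191[^\n]    : the rest of the line as the value (must be non-empty) */
    return sscanf(line, "%63[^= \t\n] = %191[^\n]", key, value) == 2;
}
```

Calling parse_kv_line("port = 8080\n", key, value) fills key with "port" and value with "8080". The format string is compact, but all of the grammar is hidden inside it, which is arguably where the "clunky" feeling comes from.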

Usually I use an FSM, but at the same time I feel sorry for the poor guy who may have to maintain my code later.
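A hand-written FSM for the same kind of input could look roughly like the sketch below. It assumes a "key = value" line format with '#' comments; the state names and the function fsm_parse_line are hypothetical and only illustrate the one-char-at-a-time style.

```c
#include <stddef.h>

enum state { S_START, S_KEY, S_AFTER_KEY, S_BEFORE_VALUE, S_VALUE,
             S_COMMENT, S_ERROR };

/* Hypothetical helper: feed one line through the state machine, one char
 * at a time. Returns 1 if a key and a non-empty value were read, else 0.
 * Assumes keysz >= 1 and valsz >= 1. Trailing blanks in the value are kept. */
int fsm_parse_line(const char *line, char *key, size_t keysz,
                   char *val, size_t valsz)
{
    enum state st = S_START;
    size_t k = 0, v = 0;

    for (const char *p = line;
         *p && *p != '\n' && st != S_ERROR && st != S_COMMENT; p++) {
        char c = *p;
        int blank = (c == ' ' || c == '\t');

        switch (st) {
        case S_START:                     /* skipping leading blanks */
            if (blank) break;
            if (c == '#') { st = S_COMMENT; break; }
            if (c == '=' || k + 1 >= keysz) { st = S_ERROR; break; }
            key[k++] = c;                 /* first key character */
            st = S_KEY;
            break;
        case S_KEY:                       /* inside the key */
            if (c == '=') st = S_BEFORE_VALUE;
            else if (blank) st = S_AFTER_KEY;
            else if (k + 1 < keysz) key[k++] = c;
            else st = S_ERROR;            /* key too long */
            break;
        case S_AFTER_KEY:                 /* blanks between key and '=' */
            if (c == '=') st = S_BEFORE_VALUE;
            else if (!blank) st = S_ERROR;
            break;
        case S_BEFORE_VALUE:              /* blanks between '=' and value */
            if (blank) break;
            if (v + 1 >= valsz) { st = S_ERROR; break; }
            val[v++] = c;                 /* first value character */
            st = S_VALUE;
            break;
        case S_VALUE:                     /* inside the value */
            if (v + 1 < valsz) val[v++] = c;
            else st = S_ERROR;            /* value too long */
            break;
        default:
            break;
        }
    }
    key[k] = '\0';
    val[v] = '\0';
    return st == S_VALUE;
}
```

Every transition is explicit, which makes the behavior easy to reason about, but even this tiny grammar already takes seven states. That growth is the maintenance burden the question worries about.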

Hence my question:

What is the best way to parse relatively simple text files?
Does it matter at all?
Is there a generally accepted approach?

+6
3 answers

I’ll break the rules a bit and answer your questions out of order.

  • Is there a generally accepted approach?

Absolutely not. IMHO the solution you choose should depend on (to name a few) your text, your timeframe, your experience, even your personality. If the text is simple enough to make flex and bison overkill, perhaps C itself is overkill. Is it more important to be fast or robust? Does it need to be maintained, or can it start out quick and dirty? Are you a passionate C user, or can you be lured away by the right language features? &c., &c.

  • Does it matter at all?

Again, this is something only you can answer. If you work closely with a group of people with particular skills and abilities, and the parser is important and needs to be maintained, it definitely matters! If you are writing something "out of sheer boredom", I would suggest that it does not matter at all, no. :-)

  • What is the best way to parse relatively simple text files?

Well, I don't know that you will like my answer. Perhaps read some of the other fine answers first.

No, really, go ahead. I'll wait.

Ah, you're back, and relaxed. Let's ease into things, shall we?

Never write it in 'C' if you can do it in 'awk';
Never do it in 'awk' if 'sed' can handle it;
Never use 'sed' when 'tr' can do the job;
Never invoke 'tr' when 'cat' is sufficient;
Avoid using 'cat' whenever possible.
- Taylor's Laws of Programming

If you are writing it in C but C feels like the wrong tool ... it really might be the wrong tool. awk or perl will most likely do what you are trying to do without the aggravation. You might even get by with cut or something similar.

On the other hand, if you are writing it in C, you probably have a good reason to write it in C. Perhaps your parser is a tiny part of a much larger system which, for the sake of argument, is embedded in a refrigerator on the moon. Or maybe you just loooove C. You may even hate awk and perl, heaven forfend.

If you don't hate awk and perl, you can embed them in your C program. This is doable in principle, though I have never done it myself. For awk, try libmawk. For perl there are several ways (TMTOWTDI): you can run perl as a separate process using popen, or you can actually embed a Perl interpreter in your C program; see man perlembed.
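The popen route can be sketched as follows. This assumes a POSIX system; run_filter is a hypothetical helper name, and the command string is whatever awk or perl one-liner does the parsing for you.

```c
#define _POSIX_C_SOURCE 200809L   /* popen() and pclose() are POSIX, not ISO C */
#include <stdio.h>

/* Hypothetical helper: run `cmd` through the shell and copy its standard
 * output into `out`, NUL-terminated and truncated to outsz - 1 bytes.
 * Assumes outsz >= 1. Returns the status reported by pclose(), or -1 if
 * the command could not be started. */
int run_filter(const char *cmd, char *out, size_t outsz)
{
    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    size_t n = fread(out, 1, outsz - 1, fp);
    out[n] = '\0';

    /* Drain whatever did not fit so the child can finish writing. */
    char scratch[256];
    while (fread(scratch, 1, sizeof scratch, fp) > 0)
        ;

    return pclose(fp);
}
```

For instance, run_filter("awk -F= '{ print $2 }' app.conf", buf, sizeof buf) would collect the second '='-separated field of every line, where app.conf is a made-up file name. The cost is one extra process per call and a dependency on the tool being installed, so this fits start-up configuration better than a hot loop.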

In any case, as I said, the "best way to parse" is entirely up to you and your team, the problem space, and your approach to the problem. What I can offer is my opinion.

I'm going to assume that in your C-only solutions (library functions and FSM, given that your explode is essentially a library function) you have already done everything possible to isolate the relevant code, organize the code and files, etc.

Even so, I recommend lex and yacc.

Library functions feel "clunky and awkward." The state machine seems unmaintainable. But you say that lex and yacc feel like overkill.

I think you should look at your complaints differently. What you are really doing with lex and yacc is specifying an FSM. However, you are also hiring someone to write and maintain it for you, thereby solving most of the maintainability problem. Overkill? Did I mention that they will work for free?

I suspect, but I don't know, that the reason lex and yacc initially felt like overkill is that your configuration / simple files just felt too simple. If I'm right (big if), you can do most of your work in the lexer. (You might even find that you can do all of your work in the lexer, but I don't know anything about your input.) If your input is not only simple but also in a widely used format, you may well find an existing lexer/parser combination that does what you need.

In short: if you can do it outside C, try something else. If you want C, use lex and yacc. They carry a bit of overhead, but they are a very good solution.

+6

If you can make it work, I would go with an FSM, but with a lot of help from Perl-compatible regular expressions (PCRE). The library is easy to understand, and it should let you trim enough extraneous spaghetti to give your monster the aerodynamic flair that all flying monsters strive for. That, plus plenty of comments in well-structured spaghetti, should keep whoever maintains your code after you comfortable. (And, as I am sure you know, that maintainer is you, six months from now, after you have moved on to something else and the details of this code have slipped your mind.)
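To give a feel for how a regular expression replaces most of the states, here is a rough sketch. Note that it uses the POSIX <regex.h> interface rather than PCRE, purely so that it needs no extra library; PCRE has its own, different API and a richer pattern syntax. The "key = value" format, the pattern, and the name regex_parse_line are assumptions made for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: match one "key = value" line against a POSIX
 * extended regular expression. Returns 1 on a match, 0 otherwise.
 * Group 1 is the key; group 2 is the value without surrounding blanks. */
int regex_parse_line(const char *line, char *key, size_t keysz,
                     char *val, size_t valsz)
{
    static const char *pattern =
        "^[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*=[ \t]*"
        "([^ \t\n]([^\n]*[^ \t\n])?)[ \t]*\n?$";
    regex_t re;
    regmatch_t m[3];                      /* whole match + groups 1 and 2 */
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 3, m, 0) == 0) {
        /* Copy each captured group; snprintf truncates to fit the buffer. */
        snprintf(key, keysz, "%.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
        snprintf(val, valsz, "%.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```

Compiling the pattern on every call keeps the sketch short; real code would more likely compile it once and reuse it. The remaining state machine then only has to decide which pattern to try for each line.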

0

My short answer is to use the right tool for the problem. If you have configuration files, use existing standards and formats, e.g. ini files, and parse them using Boost program_options.

If you are entering the world of your "own" languages, use lex/yacc, as they provide the features you need, but you must weigh the cost of maintaining the grammar and the language.

In summary, I would recommend narrowing the scope of the problem even further in order to find the right tool.

0

Source: https://habr.com/ru/post/886105/

