How should keywords be analyzed when writing a C compiler?

I am currently writing a C compiler in an assembly, it is not intended for practical use, but I would like to do this for educational value. I was wondering when I test the keywords, is there a more efficient way, and not just read the next word in the file and then run it through a bunch of nested if statements that check the keywords. Is there a better way?

+5
source share
4 answers

Your question is actually quite specific. You are asking how to create a lexical analyzer, also known as a scanner, and how to efficiently and conveniently recognize keywords. The scanner is the first phase of a typical compiler and converts the source code, which is a sequence of characters, into a sequence of tokens, where the token is a unit such as a number, operator, or keyword.

Since the keywords follow the pattern for common identifiers, the general trick is to put all the keywords in a symbol table along with information that it is a keyword. Then, when the scanner finds the identifier, it, as usual, looks for the symbol table to see if this identifier was previously detected. If this identifier was kewyord, it will be found along with information about which keyword it is.

+8
source

Are you doing this for part of the class? If so, there should be recommendations for parsing and lexing. If not, you have a lot of work!

Writing the actual compiler is much more complicated than just looking at a lot of if statements because you need to keep track of the environment. You need to think about how you resolve classes, functions, function calls, class instances, recursive functions ... the list goes on.

Take a look at UC Berkeley's lecture course on this subject, such as parsing, lexing, code generation, and the tools you need:

http://www-inst.eecs.berkeley.edu/~cs164/fa13/

Note that this course, in particular, used C ++ to write the Python2.5 compiler for assembly, but the concepts in lectures and readings and some tools are not limited to languages.

+4
source

Keywords (instead of tokens in general) are a closed set, for which it is practical to create a messy hash function. Since the collection is small, it does not even need to have a minimal hash function.

+3
source

You can do this with the if-else if and strcmp () commands. However, writing statements for all keywords is very annoying. You’d better use a hash table — at the start of compilation, you put all the keywords in the table, and then search as needed. The disadvantage of this is that if you need to use C, you will also have to write your own hash table (or use it from the library). However, if you can use C ++, you can use a map or unordered_map from STL. In any case, if you are worried about performance, as someone else said, it will not be a bottle neck.

0
source

Source: https://habr.com/ru/post/1236290/


All Articles