Why does Lisp use gensym when other languages do not?

Correct me if I am wrong, but there is nothing like gensym in Java, C, C++, Python, JavaScript, or any of the other languages I have used, and I never thought it was necessary. Why is it necessary in Lisp and not in other languages? For clarification, I am learning Common Lisp.

+6
4 answers

Common Lisp has a powerful macro system. You can create new syntax that behaves exactly the way you want. The macro system is even expressed in the language itself, so everything in the language is available for transforming the code you want to write into something CL actually understands. All languages with powerful macro systems either provide gensym or do the equivalent implicitly in their macro implementation.

In Common Lisp, you use gensym when you want to generate code in which a symbol must not clash with symbols used anywhere else in the result. Without it, there is no guarantee that the user will not pick a symbol the macro implementer also uses; the two then interfere, and the result is something other than the intended behavior. gensym also ensures that a nested expansion of the same macro does not interfere with previous expansions. With the Common Lisp macro system, it is possible to build more restrictive macro systems similar to Scheme's syntax-rules and syntax-case .
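The interference described above can be sketched with a small two-argument "or" macro. The names my-or-bad, my-or, and capture-demo are hypothetical and chosen only for this illustration; this is a minimal sketch, not how CL's own OR is implemented.

```lisp
;; Hypothetical macro names, for illustration only.

;; Without GENSYM: the expansion binds a symbol named VALUE, which can
;; capture a variable of the same name in the caller's code.
(defmacro my-or-bad (a b)
  `(let ((value ,a))
     (if value value ,b)))

;; With GENSYM: the temporary is a fresh uninterned symbol, so no
;; variable written by the user can collide with it.
(defmacro my-or (a b)
  (let ((tmp (gensym "VALUE")))
    `(let ((,tmp ,a))
       (if ,tmp ,tmp ,b))))

(defun capture-demo ()
  (let ((value 42))
    (list (my-or-bad nil value)   ; VALUE is shadowed by the macro's binding
          (my-or nil value))))    ; VALUE still refers to the caller's 42

;; (capture-demo) => (NIL 42)
```

The first result is NIL because the caller's value is evaluated inside the macro's own binding of value; the gensym version leaves the caller's variable alone.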

Scheme has several macro systems. One is based on pattern matching ( syntax-rules ), where newly introduced symbols automatically act as if they were made with gensym . syntax-case also, by default, makes new symbols as if they were made with gensym , and it provides a way to reduce hygiene. You can implement CL's defmacro with syntax-case , but since Scheme does not have gensym , you would not be able to write hygienic macros with it.

Java, C, C++, Python, and JavaScript are all Algol dialects, and none of them has anything beyond simple template-based macros. They do not have gensym because they do not need it: the only way to get new syntax in these languages is to wish for the next version to provide it.

Two Algol dialects with powerful macros come to mind: Nemerle and Perl6 . Both take a hygienic approach, meaning that introduced variables behave as if they were made with gensym .

In CL, Scheme, Nemerle, and Perl6 you do not need to wait for language features. You can make them yourself! The new features in Java and PHP are easily implemented with macros in any of them, if they are not already available.

+6

I can't say which languages have an equivalent of GENSYM . Many languages do not have a first-class symbol data type (with interned and uninterned symbols), and many do not provide comparable code generation facilities (macros, ...).

An interned symbol is registered in a package. An uninterned symbol is not. If the reader (the Lisp subsystem that takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that they are the same symbol:

    CL-USER 35 > (eq 'cl:list 'cl:list)
    T

If the reader sees an uninterned symbol, it creates a new one each time:

    CL-USER 36 > (eq '#:list '#:list)
    NIL

Uninterned symbols are written with #: in front of the name.
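The same reader behavior can be checked programmatically. The sketch below calls the reader through the standard function READ-FROM-STRING instead of typing quoted symbols at the prompt; it assumes the default readtable settings.

```lisp
;; The reader returns the same interned symbol for the same package and name.
(eq (read-from-string "cl:list")
    (read-from-string "cl:list"))              ; => T

;; Each occurrence of the #: syntax makes the reader create a new symbol.
(eq (read-from-string "#:list")
    (read-from-string "#:list"))               ; => NIL

;; An interned symbol has a home package; an uninterned one has none.
(package-name (symbol-package 'cl:list))       ; => "COMMON-LISP"
(symbol-package (read-from-string "#:list"))   ; => NIL
```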

GENSYM is used in Lisp to create numbered uninterned symbols, because this is sometimes useful when generating code and then debugging that code. Note that the symbols are always new and not eq to anything else. But the name of the symbol may be the same as the name of another symbol. The number gives the human reader a clue about identity.
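A minimal sketch of these properties follows. Rebinding *GENSYM-COUNTER* is done here only to make the numbers predictable for the demonstration; ordinary code normally leaves the counter alone.

```lisp
;; Two GENSYM calls never return the same symbol.
(eq (gensym "start") (gensym "start"))              ; => NIL

;; The current counter value is appended to the prefix, then incremented.
(let ((*gensym-counter* 100))
  (list (symbol-name (gensym "start"))
        (symbol-name (gensym "exit"))))             ; => ("start100" "exit101")

;; Names can coincide (forced here by resetting the counter),
;; but the symbols are still distinct objects.
(let ((a (let ((*gensym-counter* 7)) (gensym "G")))
      (b (let ((*gensym-counter* 7)) (gensym "G"))))
  (list (string= (symbol-name a) (symbol-name b))
        (eq a b)))                                  ; => (T NIL)
```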

Example using MAKE-SYMBOL

MAKE-SYMBOL creates a new uninterned symbol using a string argument as its name.

Let's see how this function generates some code:

    CL-USER 31 > (defun make-tagbody (exp test)
                   (let ((start-symbol (make-symbol "start"))
                         (exit-symbol (make-symbol "exit")))
                     `(tagbody ,start-symbol
                               ,exp
                               (if ,test
                                   (go ,start-symbol)
                                   (go ,exit-symbol))
                               ,exit-symbol)))
    MAKE-TAGBODY

    CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))

    (TAGBODY
     #:|start|
      (INCF I)
      (IF (< I 10) (GO #:|start|) (GO #:|exit|))
     #:|exit|)

The generated code above uses uninterned symbols. Both #:|start| are actually the same symbol. We would see this if we had set *print-circle* to T , since the printer would then clearly label identical objects. But here we do not get this additional information. Now, if you nest this code, you will see more than one start and more than one exit symbol, each of which is used in two places.
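The nesting problem can be checked directly by comparing the tags of a nested expansion. This sketch repeats the MAKE-SYMBOL version of make-tagbody so that it is self-contained; the helper nested-start-symbols is a hypothetical name used only here.

```lisp
(defun make-tagbody (exp test)
  (let ((start-symbol (make-symbol "start"))
        (exit-symbol (make-symbol "exit")))
    `(tagbody ,start-symbol
        ,exp
        (if ,test (go ,start-symbol) (go ,exit-symbol))
        ,exit-symbol)))

;; Hypothetical helper: builds a nested form and compares its start tags.
(defun nested-start-symbols ()
  (let* ((inner (make-tagbody '(incf j) '(< j 10)))
         (outer (make-tagbody inner '(< i 10)))
         (inner-start (second inner))    ; start tag of the inner TAGBODY
         (outer-start (second outer)))   ; start tag of the outer TAGBODY
    (list
     ;; Both tags print identically as #:|start| ...
     (string= (symbol-name inner-start) (symbol-name outer-start))
     ;; ... but they are different symbols.
     (eq inner-start outer-start)
     ;; Within one expansion, the tag and its (GO ...) target are identical.
     (eq inner-start (second (third (fourth inner)))))))

;; (nested-start-symbols) => (T NIL T)
```

The printed form alone cannot tell the human reader which #:|start| a given GO refers to, which is the motivation for the numbered names in the next example.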

Example using GENSYM

Now let's use GENSYM . GENSYM also creates an uninterned symbol. Optionally, the symbol's name is given a string prefix. A number is appended (see the variable CL:*GENSYM-COUNTER* ).

    CL-USER 33 > (defun make-tagbody (exp test)
                   (let ((start-symbol (gensym "start"))
                         (exit-symbol (gensym "exit")))
                     `(tagbody ,start-symbol
                               ,exp
                               (if ,test
                                   (go ,start-symbol)
                                   (go ,exit-symbol))
                               ,exit-symbol)))
    MAKE-TAGBODY

    CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))

    (TAGBODY
     #:|start213051|
      (INCF I)
      (IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
     #:|exit213052|)

Now the number is an indicator that the two uninterned #:|start213051| symbols are actually the same. When the code is nested, the new version of the start symbol has a different number:

    CL-USER 7 > (pprint (make-tagbody `(progn
                                         (incf i)
                                         (setf j 0)
                                         ,(make-tagbody '(incf ij) '(< j 10)))
                                      '(< i 10)))

    (TAGBODY
     #:|start2756|
      (PROGN
        (INCF I)
        (SETF J 0)
        (TAGBODY
         #:|start2754|
          (INCF IJ)
          (IF (< J 10) (GO #:|start2754|) (GO #:|exit2755|))
         #:|exit2755|))
      (IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
     #:|exit2757|)

Thus, it helps in understanding the generated code without the need to turn *print-circle* on, which would label the identical objects:

    CL-USER 8 > (let ((*print-circle* t))
                  (pprint (make-tagbody `(progn
                                           (incf i)
                                           (setf j 0)
                                           ,(make-tagbody '(incf ij) '(< j 10)))
                                        '(< i 10))))

    (TAGBODY
     #3=#:|start1303|
      (PROGN
        (INCF I)
        (SETF J 0)
        (TAGBODY
         #1=#:|start1301|
          (INCF IJ)
          (IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
         #2#))
      (IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
     #4#)

The output above is readable for the Lisp reader (the subsystem that reads s-expressions from textual representations), but a little less so for the human reader.

+8

I believe that symbols (in the Lisp sense) are mostly useful in homoiconic languages (those in which the syntax of the language is representable as data of that language).

Java, C, C++, Python, and JavaScript are not homoiconic.

Once you have symbols, you want some way to create them dynamically. gensym is one possibility, but you can also intern them.
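In Common Lisp, the interning route might look like the sketch below. The helper make-interned and the choice of the CL-USER package are assumptions made for this illustration. Unlike gensym , interning the same name twice yields the same symbol, so it offers no protection against name clashes.

```lisp
;; Hypothetical helper: build a name at run time and intern it in CL-USER.
(defun make-interned (prefix n)
  (intern (format nil "~A-~D" prefix n) :cl-user))

;; Same name, same package: the very same symbol comes back.
(eq (make-interned "TEMP" 1) (make-interned "TEMP" 1))     ; => T

;; The interned symbol is registered in a package; a gensym is not.
(package-name (symbol-package (make-interned "TEMP" 1)))   ; => "COMMON-LISP-USER"
(symbol-package (gensym "TEMP"))                           ; => NIL
```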

BTW, MELT is a Lisp-like dialect; it does not create symbols with gensym or by interning strings, but with clone_symbol . (In fact, MELT symbols are instances of the predefined CLASS_SYMBOL , ...)

+1

gensym is available as a predicate in most Prolog interpreters. You can find it in the library of the same name.

0

Source: https://habr.com/ru/post/986770/

