😃 (and other Unicode characters) in identifiers not allowed by g++

I am 😞 to find that I cannot use 😃 as a valid identifier with g++ 4.7, even with the -fextended-identifiers option enabled:

    int main(int argc, const char* argv[])
    {
      const char* 😃 = "I'm very happy";
      return 0;
    }

main.cpp:3:3: error: stray '\360' in program
main.cpp:3:3: error: stray '\237' in program
main.cpp:3:3: error: stray '\230' in program
main.cpp:3:3: error: stray '\203' in program
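
The four octal values in these diagnostics are not arbitrary: they appear to be the UTF-8 bytes of U+1F603 (😃), reported one byte at a time. A minimal sketch of that arithmetic follows; the helper names utf8_encode and octal_escapes are made up for illustration and are not part of GCC or any library.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Encode a single Unicode code point as UTF-8.
// Sketch only: it does not reject surrogates or values above U+10FFFF.
std::vector<unsigned char> utf8_encode(std::uint32_t cp) {
    std::vector<unsigned char> out;
    if (cp < 0x80) {
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Render bytes the way the "stray" diagnostics show them: backslash + octal.
std::string octal_escapes(const std::vector<unsigned char>& bytes) {
    std::string s;
    char buf[8];
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "\\%o", static_cast<unsigned>(b));
        s += buf;
    }
    return s;
}
```

Calling octal_escapes(utf8_encode(0x1F603)) gives \360\237\230\203, which matches the four errors above: the compiler is seeing the raw bytes of the character rather than one identifier character.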

After some googling, I found that UTF-8 characters are not yet supported in identifiers, but that a universal-character-name should work. So I converted my source to:

    int main(int argc, const char* argv[])
    {
      const char* \U0001F603 = "I'm very happy";
      return 0;
    }

main.cpp:3:15: error: universal character \U0001F603 is not valid in an identifier

So apparently 😃 is not a valid identifier character for g++. However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and does not disallow them as an initial character in E.2. My next step was to see whether any other allowed Unicode characters worked, but none that I tried did. Not even the ever-important PILE OF POO (💩) character.
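
The range check described above can be sketched in a few lines. This is only a partial illustration: it covers the supplementary-plane ranges of Annex E.1 (10000-1FFFD, 20000-2FFFD, and so on up to E0000-EFFFD) and the four E.2 ranges as I recall them from C++11, so the exact values should be verified against the standard text. The function names are hypothetical.

```cpp
#include <cstdint>

// True if cp falls in one of the supplementary-plane ranges of Annex E.1,
// i.e. N0000-NFFFD for planes 1 through 14. The many BMP ranges listed in
// E.1 are deliberately omitted from this sketch.
bool in_e1_supplementary(std::uint32_t cp) {
    std::uint32_t plane = cp >> 16;
    return plane >= 1 && plane <= 14 && (cp & 0xFFFF) <= 0xFFFD;
}

// True if cp falls in one of the ranges that Annex E.2 disallows as the
// first character of an identifier (combining marks). Values are an
// assumption based on my reading of C++11.
bool disallowed_initially(std::uint32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F) ||
           (cp >= 0x1DC0 && cp <= 0x1DFF) ||
           (cp >= 0x20D0 && cp <= 0x20FF) ||
           (cp >= 0xFE20 && cp <= 0xFE2F);
}
```

Under these assumptions, both U+1F603 (😃) and U+1F4A9 (💩) pass in_e1_supplementary and fail disallowed_initially, which is why the standard's wording appears to permit them even as the first character of an identifier.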

So, for the sake of meaningful and descriptive variable names, what gives? Does -fextended-identifiers do what it advertises or not? Is it only supported in the latest builds? And what kind of support do other compilers have?

+55
c++ gcc c++11 unicode g++
Oct 02
3 answers

As of version 4.8, gcc does not support characters outside the BMP in identifiers. This seems like an unnecessary restriction. In addition, gcc only supports the very restricted character set described in ucnid.tab, which is based on C99 and C++98 (it does not seem to have been updated to C11 and C++11 yet).

As described in the manual, -fextended-identifiers is experimental, so there is a higher chance that it will not work as expected.




Edit:

GCC has supported the C11 character set since 4.9.0 (svn r204886, to be precise). So the OP's second piece of code, using \U0001F603, does work. I still can't get the actual code using 😃 to work, even with -finput-charset=UTF-8, with GCC 8.2 on https://gcc.godbolt.org (you may want to follow this bug report, provided by @DanielWolf).

Meanwhile, both pieces of code work on clang 3.3 without any options other than -std=c++11.
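
For code that has to build across compiler versions, spelling the identifier with a universal-character-name is probably the safer route. The sketch below is an illustration, not something from the original thread: the function name is hypothetical, and it uses U+00E9 and U+00E0 rather than an emoji because, as far as I know, those characters appear in both the older C99/C++98 tables and the C11/C++11 ones. Older GCC releases may still need -fextended-identifiers on the command line.

```cpp
#include <string>

// The identifier below is "déjà_vu", written with universal-character-names
// so the source file itself stays pure ASCII.
std::string d\u00E9j\u00E0_vu() {
    return "seen before";
}

// A caller must use the same spelling (or, on compilers that accept UTF-8
// in identifiers, the literal characters).
std::string call_it() {
    return d\u00E9j\u00E0_vu();
}
```

Whether the literal UTF-8 spelling of the same name is accepted depends on the compiler and version, as discussed above.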

+19
Oct 02

This is a known bug in GCC: Bug 67224 - UTF-8 support for identifier names in GCC.

Edit: Good news: the problem has been fixed! The upcoming GCC 10 will support UTF-8 characters in identifiers.

+7
Feb 10 '17 at 11:47

"However, the standard specifically allows characters from the range 10000-1FFFD in Annex E.1 and does not disallow them as an initial character in E.2."

Keep in mind that just because the C++ standard allows (or disallows) a feature, that does not necessarily mean your compiler supports (or does not support) it.

+5
Oct 02


