Gettext character encoding

I have the following gettext .po file, which was translated from a .pot file. I am working on a Linux system (openSUSE, if that matters), running gettext 0.17.

    #
    # <translate@transme.de>, 2011
    # transer <translate@transme.de>, 2011
    msgid ""
    msgstr ""
    "Project-Id-Version: transtest\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2011-05-24 22:47+0100\n"
    "PO-Revision-Date: 2011-05-30 23:03+0100\n"
    "Last-Translator: \n"
    "Language-Team: German (Germany)\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Language: de_DE\n"
    "Plural-Forms: nplurals=2; plural=(n != 1)\n"

    #: transtest.cpp:12
    msgid "Min Size"
    msgstr "Min Größe"

Now when I create the .mo file through

 msgfmt -c transtest_de_DE.po -o transtest.mo 

then check the encoding with the file command,

    $ file --mime transtest_de_DE.po
    transtest_de_DE.po: text/x-po; charset=utf-8

and then install it in my locale folder and run the program after exporting LANG and LC_CTYPE, I get garbage in place of the two non-ASCII characters.

If I set my terminal encoding to ISO-8859-2 instead of UTF-8, the two characters display correctly.

Opening the generated .mo file in a text editor shows that it is also UTF-8 (I can see the characters when I set the editor's encoding to UTF-8).

The program is very simple, and it looks like this:

    #include <iostream>
    #include <locale>
    #include <libintl.h>   // declares gettext, bindtextdomain, textdomain

    const char *PROGRAM_NAME="transtest";

    using namespace std;

    int main()
    {
        setlocale (LC_ALL, "");   // adopt the locale from the environment
        bindtextdomain( PROGRAM_NAME, "/usr/share/locale" );
        textdomain( PROGRAM_NAME );

        cerr << gettext("Min Size") << endl;
    }

I installed the .mo file as /usr/share/locale/de_DE/LC_MESSAGES/transtest.mo and exported LC_CTYPE and LANG as "de_DE".

    $ echo $LC_CTYPE; echo $LANG
    de_DE
    de_DE

Where am I going wrong? Why does gettext give me the wrong encoding (ISO-8859-2) for my strings instead of the UTF-8 requested in the .po file?

Edit:

The solution was in the Stack Overflow question "Can't make (UTF-8) traditional Chinese character to work in PHP gettext extension (.po and .mo files created in poEdit)": it seems that I need to explicitly call

 bind_textdomain_codeset(PROGRAM_NAME, "utf-8"); 

The final program looks like this:

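    #include <iostream>
    #include <locale>
    #include <libintl.h>   // declares gettext, bindtextdomain, bind_textdomain_codeset, textdomain

    const char *PROGRAM_NAME="transtest";

    using namespace std;

    int main()
    {
        setlocale (LC_ALL, "");   // adopt the locale from the environment
        bindtextdomain( PROGRAM_NAME, "/usr/share/locale" );
        bind_textdomain_codeset(PROGRAM_NAME, "utf-8");   // force UTF-8 output from gettext
        textdomain( PROGRAM_NAME );

        cerr << gettext("Min Size") << endl;
    }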

No changes to any of my gettext files are required.

1 answer

With LC_CTYPE=de_DE (or LANG), programs are expected to output ISO-8859-1 (note: 1, not 2), so combining that setting with a terminal set to UTF-8 is simply wrong. The correct locale for UTF-8 is de_DE.utf-8.
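To see why a UTF-8 terminal shows garbage here, it may help to look at the bytes. The sketch below is a hand-written stand-in for the UTF-8 to ISO-8859-1 conversion that gettext applies when the locale is de_DE; it is not gettext's own code, and the name utf8_to_latin1 is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 text to ISO-8859-1 (Latin-1).
// ASCII bytes pass through; two-byte sequences starting with C2 or C3
// (code points U+0080..U+00FF) become one Latin-1 byte; every other
// byte is replaced by '?'.
std::string utf8_to_latin1(const std::string &utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            unsigned char next = static_cast<unsigned char>(utf8[++i]);
            // Combine the low 2 bits of the lead byte with the low 6 bits
            // of the continuation byte to get the code point.
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
        } else {
            out += '?';
        }
    }
    return out;
}
```

For "Min Größe", the UTF-8 bytes C3 B6 C3 9F turn into the single bytes F6 DF. Those two bytes are not a valid UTF-8 sequence, so a terminal set to UTF-8 cannot display them, while a terminal set to an ISO-8859 encoding can.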

Using bind_textdomain_codeset is wrong in your case. bind_textdomain_codeset is meant for programs that work in a fixed encoding internally, as GNOME does, for example. The output, however, should always be in the encoding the locale indicates (obtained by calling nl_langinfo(CODESET), which is also what gettext uses by default).
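A quick way to check which encoding a locale implies is to ask nl_langinfo(CODESET) directly. The following is a minimal sketch, assuming a POSIX system; the function name codeset_for_locale is hypothetical, and only setlocale and nl_langinfo are real library calls.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper: switches the process to the named locale and
// returns the character encoding that locale selects. Returns an empty
// string if setlocale rejects the name (for example, because the locale
// is not installed).
std::string codeset_for_locale(const char *locale_name)
{
    if (std::setlocale(LC_ALL, locale_name) == nullptr)
        return "";
    return nl_langinfo(CODESET);
}
```

On a glibc system with both locales installed, codeset_for_locale("de_DE") would typically report ISO-8859-1 and codeset_for_locale("de_DE.utf-8") would report UTF-8. Passing "" uses the environment (LANG, LC_CTYPE, LC_ALL), which is what the program in the question does.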


Source: https://habr.com/ru/post/889407/

