Design Considerations for Internationalization

I have read Joel's article on Unicode, and I feel I have at least a basic understanding of internationalization from a character-set perspective. In addition to reading this question, I have done some research of my own on design considerations for internationalization, but I can't help suspecting that there is much more out there that I simply don't know, or don't know to ask.

Some of the things I learned:

  • Some languages are read from right to left instead of left to right.
  • Calendars, dates, times, currencies and numbers are displayed differently from language to language.
  • The design must be flexible enough to accommodate much more text, because some languages are much more verbose than others.
  • Do not rely on icons or colors to carry semantic meaning, as their meaning can vary from culture to culture.
  • Geographic nomenclature varies from language to language.

Where I am:

  • My design is flexible enough to accommodate much more text.
  • Every string is translated automatically, including error messages and help dialogs.
  • I have not yet reached the point where I need to display units of time, currency or numbers, but I will get there soon and will need to develop a solution.
  • I use the UTF-8 character set across the board.
  • My menus and various lists in the app are sorted alphabetically for each language for easier reading.
  • I have a tag parser that extracts tags by filtering out stop words. The list of stop words is language-specific and can be swapped out.

What I would like to know more about:

  • I am developing a downloadable PHP web application, so any recommendations specific to PHP would be greatly appreciated. I have developed my own framework and am not interested in using other frameworks at present.
  • I know very little about non-Western languages. Are there specific considerations that need to be taken into account that I did not mention above? Also, how do PHP's array sort functions handle non-Western characters?
  • Are there any specific pitfalls that you have experienced in practice? I am looking at both the graphical interface and the application code.
  • Any specific advice for working with date and time display? Is there a breakdown by region or by language?
  • I have seen many projects and sites let their communities provide translations for their applications and content. Do you recommend this, and what are some good strategies for ensuring that you get a good translation?
  • This question is basically the sum of what I know about internationalization. What don't I know that I don't know, and should look into further?

Edit: I added a bounty because I would like to get more real-life examples from experience.

+43
php internationalization locale
Mar 13 '09 at 18:51
11 answers

Our game Gemsweeper has been translated into 8 different languages. Some things that I learned during this process:

  • If the translator is given individual sentences to translate, make sure that he knows the context in which each sentence is used. Otherwise, he may provide one possible translation, but not the one you meant. Tools like Babelfish translate without any context, which is why the results are usually so bad. Just try translating any non-trivial text from English to German and back, and you will see what I mean.

  • Sentences that need to be translated should not be split into parts, for the same reason. You need to maintain context (see the previous point), and some languages may need the variable at the beginning or the end of the sentence. Instead of breaking the sentence up, use placeholders. For example, instead of

"This is step " <number> " of our 15-step tutorial"

write something like:

"This is step %1 of our 15-step tutorial."

and replace the placeholder programmatically.
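A minimal sketch of the placeholder approach, in Python for brevity. The catalogue, the function name and the German wording are illustrative assumptions, not taken from Gemsweeper; the point is only that each translation is one complete sentence and the translator decides where the values go.

```python
# Hypothetical catalogue: one complete sentence per language, with named
# placeholders instead of concatenated fragments.
MESSAGES = {
    "en": "This is step {step} of our {total}-step tutorial.",
    # Illustrative German wording that puts the total before the step.
    "de": "Von den {total} Schritten unserer Anleitung ist dies Schritt {step}.",
}


def tutorial_step(lang: str, step: int, total: int) -> str:
    """Fill the placeholders of the sentence for `lang` (English fallback)."""
    template = MESSAGES.get(lang, MESSAGES["en"])
    return template.format(step=step, total=total)


print(tutorial_step("en", 3, 15))
print(tutorial_step("de", 3, 15))
```

Because the placeholders are named, the calling code stays the same even when a translation reorders them.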

  • Do not expect the translator to be funny or creative. He is usually not motivated enough to do so unless you point out specific text fragments and pay him extra. For example, if you have puns in your language assets, tell the translator in a note not to try to translate them, but to replace them with a more sober sentence instead. Otherwise, the translator will probably translate the pun word for word, which usually results in complete nonsense. In our case, we had one translator and one joke writer for the most important translation (English).

  • Try to find a translator whose first language is the language he is going to translate your software into, not the other way around. Otherwise, he is likely to write text that may be correct but sounds odd or old-fashioned to native speakers. He should also live in the country you are targeting with the translation. For example, a German-speaking translator from Switzerland would not be a good choice for a German translation.

  • If at all possible, have one of your public beta testers who understands the language in question check the translated assets and the finished software. We had some very good and some very bad translations, depending on who provided them. According to some of our users, the Swedish translation was complete gibberish, but by then it was too late to do anything about it.

  • Remember that for each updated version with new features you will have to have the new text translated into all of your languages. This can lead to serious overhead.

  • Keep in mind that end users expect technical support to speak their language if your software is translated. Once again, Babelfish will most likely not do.

Edit - A few more points

  • Make switching between localizations as simple as possible. In Gemsweeper, we have a hotkey for switching between the different languages. This makes testing easier.

  • If you intend to use exotic fonts, make sure that they include the special characters you need. The fonts we chose for Gemsweeper were fine for the English text, but we had to add a lot of characters that exist only in German, French, Portuguese, Swedish, ...

  • Do not code your own localization infrastructure. You are probably much better off with an open-source one such as Gettext. Gettext supports features such as variables in sentences and pluralization, and it is rock solid. Localized resources are compiled, so no one can tamper with them. In addition, you can use tools such as Poedit to translate your files, check someone else's translation, and make sure that all strings are still properly translated whenever you change the source code. I have tried both rolling my own and using Gettext, and I have to say that Gettext plus Poedit was much better.

Edit - Even more points

  • Understand that different cultures have different styles for numbers and date formats. Number formats differ not only between cultures, but also by purpose within a culture. In en-US, you might format a number as '-1234', '-1,234' or '(1,234)' depending on what the number is for. Understand that other cultures do the same.

  • Know where you get your globalization information from. Windows has settings for CurrentCulture, UICulture, and InvariantCulture. Understand what each one means and how it interacts with your system (they are not as obvious as you might think).

  • If you intend to do an East Asian translation, really do your homework. East Asian languages differ quite a bit from the languages here. In addition to using several scripts at the same time, they may use different layout systems (top to bottom) or grids. Numerals in East Asian languages can also be different. In en-US, you only switch systems under limited conditions (for example, 1 versus 1), but there are additional numeric considerations beyond the comma and the period.
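The three en-US styles mentioned in the number-format point above can be sketched as follows. The function and the style names ("plain", "grouped", "accounting") are made up for illustration, and the separators are hard-coded for en-US only; a real application would take them from its locale data.

```python
def format_en_us(value: int, style: str) -> str:
    """Format an integer in one of three en-US styles (illustrative names)."""
    if style == "plain":
        return str(value)                       # -1234
    if style == "grouped":
        return f"{value:,}"                     # -1,234
    if style == "accounting":
        grouped = f"{abs(value):,}"
        # Accounting style shows negatives in parentheses instead of a sign.
        return f"({grouped})" if value < 0 else grouped
    raise ValueError(f"unknown style: {style}")


for style in ("plain", "grouped", "accounting"):
    print(style, format_en_us(-1234, style))
```

The same value has three valid renderings within a single culture, so the choice of format has to be made per use, not once per locale.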

+55
Mar 13 '09 at 19:19

When we worked on i18n/l10n for Dreamfall and Age of Conan, we ran into a number of issues that are worth keeping in mind. Some of them we solved, some were solved for us, and some we worked around. A few we never dared to touch...

  • Make sure that all your tools and all your code support all the encodings you want to use, double-check this assumption during the project, and then check a couple more times just to be sure.

  • Make sure you use a font that supports all the languages you want to use. Most fonts that claim to be Unicode are Unicode only in the sense that the glyphs they have sit at the correct code points. It does not mean that they have glyphs for all code points.

  • Text wrapping cannot rely on spaces alone, as some languages do not use spaces to separate words (Chinese comes to mind). Make sure your text-wrapping routines can handle text without any spaces.

  • Handling plurals correctly is hard in the easy cases and damned hard in the hard cases. Make sure you know enough about the languages you will support to be able to write code that handles plurals correctly. Keep in mind that English (and other "western" languages) are some of the easy ones.

  • Never split sentences and build strings from the pieces around a variable, since the variable may have to go in a different place in the sentence in another language. Use placeholders.

  • Keep in mind that for some languages, the value of a placeholder can change how the rest of the sentence has to be written. Grammar is complicated. Make sure you have a plan for dealing with it. (In particular, make sure that you have a way to classify the values you use in placeholders according to gender, tense, etc.)
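One way to classify placeholder values is to store the grammatical gender next to each value and pick the sentence variant from it. This is only a sketch under assumptions: the item table, the template table and the French sentences are made up for illustration and are not from the games mentioned above.

```python
# Hypothetical data: every insertable noun carries its grammatical gender.
ITEMS_FR = {
    "table": ("table", "f"),
    "knife": ("couteau", "m"),
}

# One complete sentence per gender, so article and participle agree.
TEMPLATES_FR = {
    "m": "Le {noun} est cassé.",
    "f": "La {noun} est cassée.",
}


def broken_message_fr(item_key: str) -> str:
    """Pick the template that matches the gender of the inserted noun."""
    noun, gender = ITEMS_FR[item_key]
    return TEMPLATES_FR[gender].format(noun=noun)


print(broken_message_fr("table"))
print(broken_message_fr("knife"))
```

Gender is only one axis; number and case can require further variants, which is why the classification needs to be planned up front.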

+11
Mar 21 '09 at 19:48
  • My menus and various lists in the application are sorted alphabetically for each language for easier reading.

Lists should be sorted; menus should not. Keep in mind that a user may want to use your application in several languages and should still find everything in the same place.

Same with keyboard shortcuts, if you have any: don't translate them.

Also remember that internationalization and translation are two different things; handle them separately.

+10
Mar 13 '09 at 19:11

I would like to add the following comments. These are some of the guidelines of a company where Class 1 products are translated into 31 different locales. Following them gave us (our development team, not the company as a whole) the best translation performance.

  • Do not try to reuse fragments of error messages. For example, don't think that because you have the two errors "You selected the wrong menu item" and "That menu item is not yet available", you can factor "menu item" out into a separate string and use it in both places. All messages must be self-contained, as their translations may vary depending on the context.

  • Use a professional translator who is knowledgeable about the technology. If you go with a service like BabelFish, you get everything you deserve. For example, "Microsoft Windows" is "Microsoft Windows" everywhere on the planet; it does not become "Microsoft Fenster" in Germany.

  • Try not to insert variables into your messages (for example, "The %1 has failed", where %1 changes dynamically), because positions and, indeed, gender can change: "La table est rubbish" vs. "L'homme est drunk", or "The red table" vs. "La table rouge". It is better to use a generic noun with the parameter appended: "The item has failed [%1]".

  • Translate only the things the user is supposed to see. Log messages in a log file (which only you will use) should be in English (or in your own language), not translated into something like Swahili that you could not read anyway.

  • Menus should be sorted by functionality, not by collation order.

  • Translatable strings must be stored outside the code and loaded at runtime. This turns a translation fix into a matter of shipping an external file, rather than digging through changes in the middle of the code. It also makes it easier to add other languages in the future.
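The last point, strings kept outside the code and loaded at runtime, might look roughly like this. The one-JSON-file-per-language layout and the function name are assumptions made for the sketch, not the company's actual setup.

```python
import json
from pathlib import Path


def load_catalog(directory: Path, lang: str, fallback: str = "en") -> dict:
    """Load `<lang>.json`, filling untranslated keys from the fallback file.

    Assumed layout: one UTF-8 JSON file per language in `directory`,
    each mapping message keys to complete, self-contained messages.
    """
    fallback_file = directory / f"{fallback}.json"
    catalog = json.loads(fallback_file.read_text(encoding="utf-8"))
    lang_file = directory / f"{lang}.json"
    if lang_file.exists():
        catalog.update(json.loads(lang_file.read_text(encoding="utf-8")))
    return catalog
```

With this shape, fixing or adding a translation means shipping a new JSON file; no code changes are involved.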

Enough. Better to stop before you fall asleep :-)

+8
Mar 20 '09 at 3:57

The thing about numbers: in English, as I understand it, you just use the singular with 1 and the plural with 2 or more. For example: "You have 1 message"; "2 messages"; "3 ... messages". In Russian, things are more complicated. You use the singular for 1, 21, 31, 41 ... 101, 121 (so for everything ending in 1, except when it ends in 11). Then you use the genitive singular for 2, 3, 4; 22, 23, 24; 32, 33, 34 ... 102, 103, 104; 122, 123, 124 (but not for 12, 13, 14). And in all other cases, you use the genitive plural.

It is not very difficult to implement. What is difficult is to implement something that can cope with any language, not known in advance, with all its quirks :-)

And these are just numbers :-)
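The Russian rule described above fits in a few lines. The category names ("one", "few", "many") and the helper names are chosen for this sketch; the rule itself is the one from the answer, with 11 to 14 falling into the last group.

```python
def plural_category_ru(n: int) -> str:
    """Return which Russian noun form a count needs."""
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1, 21, 31 ... 101, 121 (but not 11, 111)
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"    # 2-4, 22-24 ... 102-104 (but not 12-14)
    return "many"       # 0, 5-20, 25-30, ...


# The three forms of "message" (singular, genitive singular, genitive plural).
FORMS_RU = {"one": "сообщение", "few": "сообщения", "many": "сообщений"}


def messages_ru(n: int) -> str:
    return f"{n} {FORMS_RU[plural_category_ru(n)]}"


for count in (1, 2, 5, 11, 21, 22, 112):
    print(messages_ru(count))
```

This hard-codes one language. Handling languages that are not known in advance means storing a rule like this per language alongside the translations, which is what Gettext's plural forms do.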

+6
Mar 13 '09 at 20:21

I don't have much to add to the great answers so far, but here are a few things to consider and verify.

  • Do not make assumptions. This is the catch-all rule. It is easy to assume things that are specific to one region or language, and hard to notice those assumptions.
  • Be very careful with string comparisons. There are languages, such as Turkish, with letters that look similar to others but are different.
  • Use pseudo-translation as a smoke test. If you read translated strings from a resource file, create a pseudo-translated version of the file that you can still read, but that stresses the space available to, and the rendering of, every translated string in the application. For example, map a string like "Cancel" to something like "CancelXXXX!" so that it is as wide as your guideline for translated strings. Then you can check that every string is displayed in full. Extra credit for also including the most complex character you may have to display, to make sure it renders correctly everywhere.
  • Do not make assumptions about keyboard layouts. "ASDW" may be a great set of control keys on QWERTY keyboards, but hard-coding it makes the application unfriendly, if not impossible to use, for people with other keyboard layouts.
  • Check the various date settings, then check them again. I have seen problems caused by something as small as a different format for "AM/PM" in the regional settings. mm/dd/yyyy versus dd/mm/yyyy is the big one, but every setting there can make a difference.
  • Check the various number formats, then check them again. You do not want, for example, to depend on a particular decimal or thousands separator.
  • Test with and without a user logged in to the server. This may be more of a Windows issue, but it is very easy to end up with a component on the server that uses the logged-in user's regional settings when a user is logged in and the default regional settings when no user is logged in. This can cause strange, intermittent behavior.
  • Test with various regional and language settings. Not only does Windows have regional and language settings, but IE has its own language settings as well. For example, the behavior of an IE client with one language listed first may not always match that of a client with, say, en-nz listed first.
  • Make sure your translator understands the business as well as the languages, and then have someone else cross-check the result. Be very careful with application-specific terminology. If your program uses certain words to mean something special within the application, make sure that they are translated the same way in every instance, including in the help text. If you have specific target languages, you can even go so far as to translate such words well in advance and make sure they do not translate poorly into the target languages. This is more like product research, but it can affect which words are used in the interface, and it is easier if those words are in place from the very beginning. You also want to avoid idioms that may translate poorly.
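The pseudo-translation smoke test from the list above could be sketched like this. The 40% expansion factor, the bracket markers and the accent table are arbitrary choices for illustration, and the sketch does not protect placeholders such as %1, which a real tool would have to leave untouched.

```python
# Replace plain vowels with accented ones so the text stays readable
# but exercises non-ASCII rendering.
ACCENTED = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
})


def pseudo_translate(text: str, expansion: float = 0.4) -> str:
    """Return a readable, longer, accented stand-in for `text`.

    The brackets make truncated strings easy to spot on screen, and any
    string shown without them has bypassed the translation lookup.
    """
    padding = "X" * max(1, round(len(text) * expansion))
    return f"[{text.translate(ACCENTED)}{padding}!]"


print(pseudo_translate("Cancel"))
print(pseudo_translate("Save changes before closing?"))
```

Running the whole string file through such a function gives a fake locale that can be tested before any real translation exists.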

Well, I had more to say than I thought ...

+6
Mar 26 '09 at 21:15

One thing I learned the hard way: if you have several files that need to be translated, add an extra tag to the file name so that you can later search your whole folder for that tag.

E.g. instead of the file name 'sample-database.txt', name the English version 'sample-database-loc-en.txt' and the Italian version 'sample-database-loc-it.txt'.

+5
Mar 13 '09 at 22:27

This is my first answer on StackOverflow, so apologies if some of it is nonsense.

From my experience:

  • PHP: gettext has been extremely useful;
  • non-Western languages: we use UTF-8 everywhere (code, database), and so far everything has been fine;
  • Are there any specific pitfalls that you have experienced in practice? Breaking long paragraphs into separate sentences for i18n can make translation cheaper: if a string is repeated several times on a site, you only need to translate it once. But be careful: if you fragment the text too much, translators lose context;
  • I have seen many projects and sites let their communities provide translations for their applications and content. Do you recommend this, and what are some good strategies for ensuring a good translation? If you have a lot of volunteers, go for it, but depending on how much text you have, you may need tons of volunteers. Always make sure that each language has a project lead you trust, who acts as proofreader and checks the accuracy of the translation.
+4
Mar 21 '09 at 20:51
  • Sorting/collation rules vary between languages: ä is sorted differently in German than in Swedish. Sorting should therefore be culture-specific.
  • Upper/lower case conversion can hold surprises: the German "sharp s" ß has no upper-case version and is either converted to "SS" or left in lower case if accuracy matters. Turkish has a dotless lowercase ı and a dotted uppercase İ.
  • For multilingual web applications, think carefully about how to decide which language version to show and how to represent it in the URL. The user should always be able to select the language manually, and you want search engines to find the different language versions under different URLs.
  • Some East Asian languages (namely Japanese and Chinese, and maybe others) do not have spaces between words.
  • Japanese (maybe others too) has separate "full-width" versions of Arabic numerals and spaces, and even two versions of some of its own characters (half-width and full-width katakana).
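The case and width surprises in the list above are easy to reproduce. The sketch below uses Python's built-in Unicode support; the two helper names are made up, and characters are written as escapes so the example does not depend on how the file is saved. Other languages and libraries behave differently, so treat it as a demonstration of the pitfalls, not as a recipe.

```python
import unicodedata


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison via Unicode case folding instead of lower()."""
    return a.casefold() == b.casefold()


def normalize_width(text: str) -> str:
    """Fold full-width and half-width compatibility forms (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Sharp s (U+00DF): upper-casing turns one character into two.
print("stra\u00dfe".upper())                        # STRASSE
print(same_ignoring_case("STRASSE", "stra\u00dfe"))  # True

# Turkish dotted capital I (U+0130) lowercases to two code points here,
# and plain "I" lowercases to "i", not to the dotless form Turkish expects.
print(len("\u0130".lower()), "I".lower())

# Full-width digits, ideographic space, half-width katakana "ka".
print(normalize_width("\uff11\uff12\uff13\u3000\uff76"))
```

Language-aware casing (the Turkish case in particular) needs locale-specific rules that this default behaviour does not provide.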
+3
May 5 '09 at 21:14

Yes, this is a massive topic. Doing it right is a lot of work.

In my program, I use an integer key for each piece of text and look it up in a file as needed, depending on the language. There are no literal strings in the code, only keys. I define them with an enum in C++, so I never type a number. I wrote a utility that synchronizes the various language files when I add more enum values, and the translators fill in the blanks.
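A rough Python equivalent of this key-based scheme, with all names and tables invented for the sketch (the answer's real implementation is in C++ and reads from files):

```python
from enum import Enum, auto


class Text(Enum):
    """One member per piece of text; code refers to these, never to literals."""
    GREETING = auto()
    CANCEL = auto()


# Hypothetical in-memory tables standing in for per-language files.
TABLES = {
    "en": {Text.GREETING: "Hello", Text.CANCEL: "Cancel"},
    "it": {Text.GREETING: "Ciao"},  # CANCEL has not been translated yet
}


def lookup(key: Text, lang: str) -> str:
    """Return the text for `key`, falling back to English if missing."""
    return TABLES.get(lang, {}).get(key, TABLES["en"][key])


def missing_keys(lang: str) -> list:
    """List the blanks a translator still has to fill in for `lang`."""
    return [key for key in Text if key not in TABLES.get(lang, {})]


print(lookup(Text.GREETING, "it"), lookup(Text.CANCEL, "it"))
print(missing_keys("it"))
```

`missing_keys` plays the role of the synchronization utility: after new enum members are added, it reports what each language file still lacks.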

Each key also has an associated tooltip, image, keyboard shortcut, etc.

As for the times and dates ... again, this is a lot more complicated than you might think, but doesn't PHP handle this for you? (I don't know, I'm a C ++ guy ...)

+1
Mar 20 '09 at 1:18

PHP represents strings internally as byte streams and assumes ISO-8859-1 in cases where the encoding matters. For the most part, you can just use UTF-8 everywhere and everything will be fine. One catch, if your site accepts input from its users, is that you can never be 100% sure that they send content in the correct encoding. You can use mb_detect_encoding to validate input, or use a hidden field containing "exotic" characters as a check.

Keep in mind that all string functions in PHP that are character-based assume character = byte. This means that you cannot trust the string functions in general. See this page for more details.
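The difference between counting bytes and counting characters, and a strict check on incoming bytes, can be illustrated in a few lines. The sketch is in Python, where bytes and text are separate types; in PHP the same contrast shows up between strlen (bytes) and mb_strlen (characters). The helper name is an assumption of the sketch.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Strictly decode the bytes; anything that fails is not valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "na\u00efve"               # five characters, one of them non-ASCII
raw = word.encode("utf-8")

print(len(word), len(raw))        # character count vs byte count: 5 6

# Cutting at a byte offset can split a multi-byte character in half.
print(is_valid_utf8(raw[:3]))     # False

# Latin-1 bytes submitted to a UTF-8 site fail the same check.
print(is_valid_utf8("caf\u00e9".encode("latin-1")))  # False
```

Any truncation, padding or substring operation on user-visible text therefore has to work on characters, not on bytes.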

Another good resource for PHP is Nick Nettleton's cheatsheet.

A topic very closely related to charsets/encodings is collation. You need your collation to match the language/culture you are working with. At least in MySQL (and possibly in other RDBMSs) you can specify the collation at several levels: per database, per table, per column, and even in the query itself.
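To see why collation has to match the language, here is a deliberately simplified sketch with two hand-written sort keys. Both are assumptions made for illustration: the German key treats umlauts like their base letters, the Swedish key places å, ä, ö after z, and neither is a full implementation of the real rules. In practice the database collation or an ICU-based library should do this work.

```python
# Swedish alphabet order: the three extra letters come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

# German dictionary-style folding: umlauts sort with their base letters.
GERMAN_FOLD = str.maketrans(
    {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u", "\u00df": "ss"}
)


def swedish_key(word: str) -> list:
    """Rank each letter by its position in the Swedish alphabet.

    Characters outside the alphabet are ignored in this sketch.
    """
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()
            if ch in SWEDISH_ALPHABET]


def german_key(word: str) -> tuple:
    """Compare folded forms first, original spelling as a tie-breaker."""
    lowered = word.lower()
    return (lowered.translate(GERMAN_FOLD), lowered)


words = ["zebra", "\u00e4pple", "apple"]
print(sorted(words, key=german_key))   # umlaut word next to "apple"
print(sorted(words, key=swedish_key))  # umlaut word after "zebra"
```

The same three words come out in a different order for each language, which is exactly what a per-column or per-query collation setting controls in the database.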

+1
Mar 23 '09 at 21:57


