Oracle: set NLS_LANG by default

I am running an Oracle 11g database on RHEL 6. If NLS_LANG is not set on the client, the length of some UTF-8 characters is reported as 2. After setting NLS_LANG=AMERICAN_AMERICA.UTF8, the length is 1. How can I change the default value of NLS_LANG for the entire database? I do not want to set this value only for the session or as an environment variable in Linux. Is it possible to set it for the database itself?

SQL> select length('á') from dual;

LENGTH('??')
------------
           2

# export NLS_LANG=AMERICAN_AMERICA.UTF8

SQL> select length('á') from dual;

LENGTH('á')
-----------
          1
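
For reference, the NLS settings the session is actually using (and the database character set itself) can be inspected through V$NLS_PARAMETERS:

-- Session-level NLS settings as influenced by the client's NLS_LANG;
-- NLS_CHARACTERSET here reflects the database character set.
select parameter, value
  from v$nls_parameters
 where parameter in ('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_CHARACTERSET');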

Thanks for any ideas.

2 answers

This is what probably happens:

  • Your client character set is declared as something like CP1252 or ISO-8859-15, while in fact your client (the terminal) really uses UTF-8.
  • In UTF-8 the character á occupies two bytes, so your client sends those two bytes while telling Oracle that they are CP1252. Both byte values are valid single-byte CP1252 characters, so the database interprets the input as two characters and length('á') is 2 (and if you insert this string, what ends up stored is not á). The DUMP sketch after this list shows how to inspect the bytes that actually arrive.
  • When the client character set is configured correctly, the input is processed by Oracle as one character, and its length is 1 (still two bytes).
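
One way to see what actually arrives at the database (a minimal sketch, using the same 'á' literal as above) is DUMP with format 1016, which prints the bytes in hex together with the character set name:

-- With a UTF-8 terminal and a mismatched client character set you would
-- typically see the two UTF-8 bytes of á (c3,a1) counted as two characters.
select dump('á', 1016) as raw_bytes from dual;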

Conclusion: set your client character set correctly, or you will get conversion errors (you will not end up with illegal byte sequences this way, but you can end up with wrong characters such as ¿).

The database character set is fixed when the database is created; changing it usually means exporting the data, creating a new empty database with the desired character set, and importing the data into it.
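
The value chosen at creation time can be checked in NLS_DATABASE_PARAMETERS; it is a property of the whole database, not something a client or session can override:

-- Database character set chosen at CREATE DATABASE time
select value
  from nls_database_parameters
 where parameter = 'NLS_CHARACTERSET';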


Since the database character set is AL32UTF8, when you set the client NLS_LANG to UTF8 you tell Oracle to bypass the character set conversion that normally takes place when data moves between client and server. This is extremely dangerous, because it means that if the client actually sends data in any other encoding, there is a significant risk that invalid data will be stored in the database. In this case, the configuration where the LENGTH call returns 1 is the one sending incorrectly encoded data to the database.

Most likely, the client operating system represents data in something like the ISO-8859-1 character set, where á is a single-byte character (binary value 0xE1). When the client sends data to the database, it sends 0xE1 and tells the database "trust me, this is valid UTF-8 data." The database does not inspect the binary data and so never notices that 0xE1 is not a valid representation of any character in UTF-8.

But now, if some other, correctly configured client comes along and requests character set conversion, and the database has 0xE1 stored in some column, the conversion code will run, determine that 0xE1 is not a valid UTF-8 character, and return a replacement character (usually "?") to that properly configured client instead of the original data.
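
A rough way to illustrate the difference (a sketch, assuming an AL32UTF8 database; chr(225) simply manufactures the single byte 0xE1 that a misconfigured client would send) is to compare that raw byte with what a real conversion from ISO-8859-1 would have produced:

-- dump(..., 16) prints the bytes in hex: the raw value is the lone byte e1,
-- while converting it from WE8ISO8859P1 to UTF8 yields the two bytes c3,a1.
select dump(chr(225), 16)                                  as raw_byte,
       dump(convert(chr(225), 'UTF8', 'WE8ISO8859P1'), 16) as converted
  from dual;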

Since the database character set is UTF-8, you would expect the character "á" to occupy 2 bytes of storage. UTF-8 is a variable-width character set: characters that are part of the US7ASCII repertoire are represented with 1 byte, most Western European characters such as "á" with 2 bytes, and most Asian characters with 3 bytes. A few rare characters require 4 bytes.
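
Assuming a correctly configured client, the difference between character length and byte length can be seen by comparing LENGTH with LENGTHB:

-- LENGTH counts characters, LENGTHB counts bytes:
-- in a UTF-8 database 'á' is 1 character stored in 2 bytes.
select length('á') as char_count, lengthb('á') as byte_count from dual;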

Your NLS_LANG must be set according to the character set that your client system actually uses. It is relatively rare for a client system to fully support UTF-8. Since the client setting overrides whatever is set at the database level, every client has to be configured with the appropriate environment variable. The NLS_LANG FAQ has a section on how to determine the correct NLS_LANG setting for a Unix client.
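
In practice that means checking which encoding the client terminal uses and setting NLS_LANG to match it; for example (the locale output below is only an illustration), on a UTF-8 terminal:

# locale
LANG=en_US.UTF-8
...
# export NLS_LANG=AMERICAN_AMERICA.UTF8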



