Avoiding Code Changes with Microsoft SQL Server and Unicode

How can you force an MSSQL server to accept Unicode data by default in VARCHAR or NVARCHAR columns?

I know that you can do this by putting N in front of the string that will be placed in the field, but to be honest, that seems a little archaic in 2008, particularly when using SQL Server 2005.

+1
4 answers

The N syntax is how you specify a Unicode string literal in SQL Server.

 N'Unicode string'
 'ANSI string'

SQL Server will automatically convert between them whenever possible, using either the column's collation or the database's collation.

So, if your string literals do not actually contain Unicode characters, you do not need to specify the N prefix.

But if your string literals do contain Unicode characters, then you must use the N prefix.
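As a rough illustration of why the prefix matters, the Python sketch below mimics what happens when text is forced through a single-byte code page. It is only an analogy: the function name `through_code_page` is made up for this example, code page 1252 is an assumption standing in for whatever code page the collation implies, and SQL Server's own conversion rules may differ in detail.

```python
def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent are replaced with '?',
    roughly what one sees when a non-N literal holds such characters.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    # Every character exists in code page 1252, so nothing is lost.
    print(through_code_page("caf\u00e9"))
    # Greek capital omega is not in code page 1252, so it is replaced.
    print(through_code_page("\u03a9"))
```

Text that survives the round trip unchanged is the case where the N prefix makes no practical difference.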

+4

If this is a web application, you can probably force your web server to use UTF-8 by default. That way, all data going to and from the browser will be UTF-8, which can be inserted into VARCHAR fields. UTF-8 is a great way to make applications that are not Unicode-aware work with Unicode.

+2

They really need a way to turn off the N'' prefix requirement. The argument "it is necessary for backward compatibility" makes no sense to me - sure, make this behavior the default for older applications, but give me the option to enable Unicode strings by default (i.e., no N'' prefix required). I found that I need to go and comb through large areas of my application to adapt to Unicode on SQL Server, when this is NOT a problem in Oracle and PostgreSQL. Come on, Microsoft!

+2

While you can simply store UTF-8 content in a VARCHAR field in MSSQL as long as character translation is turned off, you should be aware that:

  • No management / reporting / data tools outside of your application will be able to understand your non-English characters.

  • Language-specific processing, such as sorting a list of names, may not produce the order expected in each language.

  • Be careful with data truncation. Truncating a multibyte UTF8 character typically results in data corruption for the character involved. You should always reject the entry if it exceeds the length of the field.

  • It may not be as easy as you think to turn off character set translation. Even if you disable it in your client driver, it can still be overridden in some cases if there is a significant difference between the client and the RDBMS code page, which instantly leads to data corruption.

  • If you think that is all you have to worry about, you are fooling yourself.
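Two of the points above, truncation and tools misreading the data, can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the limit is assumed to be counted in bytes, and code page 1252 is assumed as the code page an external tool would use.

```python
def utf8_fits(text: str, max_bytes: int) -> bool:
    """Return True if the UTF-8 encoding of text fits in max_bytes bytes."""
    return len(text.encode("utf-8")) <= max_bytes


def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Naively cut the UTF-8 encoding at max_bytes (may split a character)."""
    return text.encode("utf-8")[:max_bytes]


def as_seen_by_cp1252_tool(text: str) -> str:
    """Show how a tool assuming code page 1252 would display UTF-8 bytes."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    name = "Zo\u00eb"  # 3 characters, but 4 bytes in UTF-8
    # The check an application should make before storing the value.
    print(utf8_fits(name, 3))
    # The last byte is half of a two-byte character and cannot be decoded.
    print(truncate_bytes(name, 3))
    # An external tool shows two unrelated characters instead of the original one.
    print(as_seen_by_cp1252_tool(name))
```

Checking the encoded byte length up front, as `utf8_fits` does, is what allows an application to reject oversized input instead of storing a broken character.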

In conclusion, although you may be tempted to go this route, it is not a good idea. Going multibyte requires code changes.

+1

Source: https://habr.com/ru/post/905640/

