Unicode characters causing SQL Server 2005 string comparison problems

This query:

select * from op.tag where tag = 'fussball' 

returns a row whose tag column has the value "fußball". The tag column is defined as nvarchar(150).

Although I understand that they are grammatically similar words, can someone explain and defend this behavior? I assume it comes from the same collation settings that let you change the case sensitivity of a column or table, but who would want this behavior? A unique constraint on the column also makes an insert of one value fail with a constraint violation when the other already exists. How do I turn this off?

Follow-up question for bonus points. Explain why this query does not return any rows:

 select 1 where 'fußball' = 'fussball' 

Bonus question (answer?): @ScottCher pointed out to me privately that this is because the string literal "fussball" is treated as varchar. This query does return a result:

 select 1 where 'fußball' = cast('fussball' as nvarchar) 

But then again, this one does not:

 select 1 where cast('fußball' as varchar) = cast('fussball' as varchar) 

I'm confused.

+4
5 answers

I'm guessing that the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it takes a faulty fast path, or maybe it does a binary comparison, or maybe you are not passing in the ß in the right encoding (I agree it's stupid).

http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here is an insightful excerpt:

Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, “ß” would match against “ss” according to the UCA, and “aa” would match “å” in a Danish tailoring of the UCA.

+3

The SELECT does return a row with the collation Latin1_General_CI_AS (SQL2000).

It does not with the collation Latin1_General_BIN.

You can assign a collation to a table column by using the COLLATE <collation> keyword after N/VARCHAR.

You can also compare strings under a specific collation using the syntax

 string1 = string2 COLLATE <collation>
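
As a rough sketch of both options, assuming a binary collation is acceptable for the tag column (note that it also makes comparisons case- and accent-sensitive). The temporary table #tag_demo is a hypothetical stand-in for op.tag and is not from the original post:

```sql
-- Per-query override: compare under a binary collation without changing the column.
SELECT * FROM op.tag WHERE tag = N'fussball' COLLATE Latin1_General_BIN;

-- Per-column collation, shown on a hypothetical demo table.
CREATE TABLE #tag_demo (
    tag nvarchar(150) COLLATE Latin1_General_BIN NOT NULL UNIQUE
);

-- Under a binary collation the two spellings are distinct values,
-- so the second insert should not trip the unique constraint.
INSERT INTO #tag_demo (tag) VALUES (N'fußball');
INSERT INTO #tag_demo (tag) VALUES (N'fussball');

-- Should match only the row spelled with "ss".
SELECT tag FROM #tag_demo WHERE tag = N'fussball';

DROP TABLE #tag_demo;
```

Changing the collation of an existing column that already has a unique constraint or index generally means dropping and recreating that constraint, so try it on a copy first.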
+1

Some supporting notes; not a complete answer to your question, but they may be useful:

If you try:

 SELECT 1 WHERE N'fußball' = N'fussball' 

you get "1". When the "N" prefix is used to mark the literals as Unicode, the two strings are considered equal; why that is the case here, I do not know yet.

To find the default collation for the server, use

 SELECT SERVERPROPERTY('Collation') 

To find the collation of a given column in the database, use this query:

 SELECT name 'Column Name', OBJECT_NAME(object_id) 'Table Name', collation_name FROM sys.columns WHERE object_ID = object_ID('your-table-name') AND name = 'your-column-name' 
+1

This is not an answer that explains the behavior, but may be relevant:

In this question, I learned that using the collation

 Latin1_General_Bin 

will avoid most collation quirks.

+1

Bonus question (answer?): @ScottCher pointed out to me that this is because the string literal "fussball" is treated as varchar. This query does return a result:

select 1 where 'fußball' = cast('fussball' as nvarchar)

Here you are dealing with SQL Server's data type precedence rules, as specified in Data Type Precedence. The comparison is always performed using the higher-precedence type:

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.

Since nvarchar has higher precedence than varchar, the comparison in your example is performed as nvarchar, so it really is exactly the same as select 1 where N'fußball' = N'fussball' (i.e. using Unicode types). I hope this also makes it clear why your last case does not return any rows.
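
A small sketch for checking this on your own server. SQL_VARIANT_PROPERTY is a built-in function; the literals are only illustrative, and whether the comparisons return a row still depends on the collation in effect:

```sql
-- Mixing varchar and nvarchar in one expression: the varchar operand is
-- implicitly converted, so the reported base type should be nvarchar.
SELECT SQL_VARIANT_PROPERTY(CAST('fuss' + N'ball' AS sql_variant), 'BaseType') AS result_type;

-- As soon as one side is nvarchar, the whole comparison is done as nvarchar,
-- so these three forms should behave alike under the same collation.
SELECT 1 WHERE 'fußball' = CAST('fussball' AS nvarchar(150));
SELECT 1 WHERE N'fußball' = 'fussball';
SELECT 1 WHERE N'fußball' = N'fussball';

-- Both sides varchar: no implicit conversion to nvarchar takes place.
SELECT 1 WHERE CAST('fußball' AS varchar(150)) = CAST('fussball' AS varchar(150));
```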

+1

Source: https://habr.com/ru/post/1301482/

