How to work with Unicode replacement character (0xFFFD / 65533) in SQL

I did not even know that the Unicode () replacement symbol existed a week ago. Now I find out that, at least in SQL, there seems to be some kind of special and strange logic. For instance:

select replace(N'bl' + NCHAR(65533) + N'rt', NCHAR(65533), N'X') 

returns bl rt instead of blXrt. A:

 select CHARINDEX(NCHAR(65533), N'b' + NCHAR(65533) + N't') 

returns 0 instead of 2. I am just trying to determine which rows in a table contain this character, and I cannot find an easy way to do it. The handling of this character is so odd that there must be more to learn about it. Where is this behavior defined, or, more to the point, what is the easiest way to find the rows in an MS SQL Server database that contain this character?

EDIT: For those experimenting with answers, I suggest checking your answer against the following data:

 create table Test([Value] nvarchar(100) not null)
 insert into Test([Value]) values('b' + NCHAR(65533) + 't')
 insert into Test([Value]) values('b?t')
 insert into Test([Value]) values('bat')
1 answer

Krzysztof Kozielczyk wrote that characters like this one have to be compared using a binary collation, so this may be the answer to your original question.

 SELECT REPLACE(N'test' + NCHAR(65533) COLLATE Latin1_General_BIN, NCHAR(65533) COLLATE Latin1_General_BIN, '') 

The same binary-collation comparison can also be used to find strings that contain such characters, but this is a workaround, not a solution.
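To find the rows in the Test table from the question, the same binary-collation trick can be applied to the search itself. The following is only a sketch: it assumes the Test table and data from the question's EDIT, and that an explicit COLLATE on the search term is enough to force a binary comparison against the column.

```sql
-- Sketch: locate rows whose [Value] contains U+FFFD (NCHAR(65533)).
-- Assumes the Test table created in the question's EDIT.
-- The explicit COLLATE clause makes the comparison binary, so the
-- character is matched by its code point rather than by linguistic rules.
SELECT [Value]
FROM Test
WHERE CHARINDEX(NCHAR(65533) COLLATE Latin1_General_BIN, [Value]) > 0;

-- The same idea expressed with LIKE:
SELECT [Value]
FROM Test
WHERE [Value] LIKE N'%' + NCHAR(65533) + N'%' COLLATE Latin1_General_BIN;
```

With the sample data, only the first row should match: 'b?t' holds an ordinary question mark (U+003F) and 'bat' holds neither character.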


Source: https://habr.com/ru/post/987206/

