Comparing strings without diacritics

I am trying to search Arabic text in SQL Server while ignoring Arabic diacritics, so I use the collation Arabic_100_CI_AI, but it does not work.

For example, the following query should return 1, but it returns no rows:

select 1 
 where (N'مُحَمَّد'  Collate Arabic_100_CI_AI) = (N'محمّد' Collate Arabic_100_CI_AI)

What is the problem, and how can I make a diacritic-insensitive comparison of Arabic text?

2 answers

The AI (accent-insensitive) flag does not seem to work for Arabic. You can create your own Unicode normalization function that strips the combining marks before comparing:

CREATE FUNCTION [dbo].[NormalizeUnicode]
(
    -- Text from which the Arabic combining marks are removed
    @unicodeWord nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Result nvarchar(max);
    DECLARE @l int;
    DECLARE @i int;
    DECLARE @c nchar(1);

    -- LEN ignores trailing spaces, so append a character and subtract 1
    SET @l = LEN(@unicodeWord + N'-') - 1;
    SET @i = 1;
    SET @Result = N'';

    WHILE (@i <= @l)
    BEGIN
        SET @c = SUBSTRING(@unicodeWord, @i, 1);
        -- 0x064B to 0x065F and 0x0670 are combining characters
        -- You may need to test whether this range fits your data
        IF NOT (UNICODE(@c) BETWEEN 0x064B AND 0x065F OR UNICODE(@c) = 0x0670)
            SET @Result = @Result + @c;
        SET @i = @i + 1;
    END

    -- Return the text without the combining marks
    RETURN @Result;
END
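Since the character range may need testing, it can help to look at the code points a given string actually contains. The following is only a sketch: the variable @s and the column names Pos, CodePoint and IsStripped are made up for illustration, and the range test simply repeats the one used in the function above.

```sql
-- Sample string from the question; replace it with the text you want to inspect.
DECLARE @s nvarchar(100) = N'مُحَمَّد';

-- Recursive CTE that yields one row per character position.
WITH Positions AS (
    SELECT 1 AS Pos
    UNION ALL
    SELECT Pos + 1
    FROM Positions
    WHERE Pos < LEN(@s + N'-') - 1   -- same trailing-space trick as in the function
)
SELECT Pos,
       UNICODE(SUBSTRING(@s, Pos, 1)) AS CodePoint,
       -- 1 when the function above would drop this character
       CASE WHEN UNICODE(SUBSTRING(@s, Pos, 1)) BETWEEN 0x064B AND 0x065F
              OR UNICODE(SUBSTRING(@s, Pos, 1)) = 0x0670
            THEN 1 ELSE 0 END AS IsStripped
FROM Positions
OPTION (MAXRECURSION 0);   -- lift the default limit of 100 recursion levels
```

Any diacritic that shows up with IsStripped = 0 would have to be added to the condition in the function.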

The following test should then return 1:

select  1
where   dbo.NormalizeUnicode(N'بِسمِ اللہِ الرَّحمٰنِ الرَّحیم') = dbo.NormalizeUnicode(N'بسم اللہ الرحمن الرحیم');
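For a search against stored data, the same function could be applied to both the column and the search term. This is a minimal sketch, assuming dbo.NormalizeUnicode has been created as above; the temporary table #Names, its columns and the variable @search are hypothetical names used only for illustration.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE #Names
(
    Id       int IDENTITY(1,1) PRIMARY KEY,
    FullName nvarchar(100) NOT NULL
);

INSERT INTO #Names (FullName)
VALUES (N'مُحَمَّد'), (N'محمّد'), (N'أحمد');

DECLARE @search nvarchar(100) = N'محمد';

-- Strip the combining marks on both sides before comparing.
SELECT Id, FullName
FROM #Names
WHERE dbo.NormalizeUnicode(FullName) = dbo.NormalizeUnicode(@search);

DROP TABLE #Names;
```

Because the function is called once per row and wraps the column, such a predicate will most likely not be able to use an index seek on FullName, so on large tables it may be worth storing the normalized text in a separate column instead.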



N 'مح َ م َّ د' N 'محمد'



select 1 
 where N'مُحَمَّد'  Collate Arabic_100_CI_AI like '%%'


Source: https://habr.com/ru/post/1540965/

