Arabic name search in MySQL that ignores the differences between "أ" and "ا"

I store Arabic names in my database. In Arabic, some letters can be written in different forms, such as "ا", "أ", "آ", which all represent the same letter. The same applies to "ه" and "ة".

I need to search the database for names and ignore the differences between "ا", "أ", "آ", as well as the differences between "ه", "ة".

For example, when a user enters "اسامة" in the search field, the search should return "أسامة", "اسامة", "أسامه", "اسامه", and so on. Another example: searching for either "فايز" or "فائز" should return both.

How can I do this with a MySQL query? In other words, how can I find names that are spelled differently but are the same name?

I tried using LIKE, but it does not work:

select * from employee WHERE fname like "%أسامة%" and mname="علي" and lname="الجاسم"
2 answers

I deal with this by normalizing the data stored in the database. Create a new field and run a script that normalizes the names and saves the normalized version in that field. So "أسامة", "اسامة", "أسامه", "اسامه" are all saved in the normalized field as, say, "اسامه", and you run your queries against the normalized field rather than the raw name field.
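A minimal sketch of the normalization script in Python, assuming the normalized form uses bare Alef "ا" and Heh "ه" (as in "اسامه" above) and that tashkil is stripped. The helper name normalize_name and the extra mapping of "ئ" to "ي" (added to cover the "فائز"/"فايز" example from the question) are illustrative assumptions, not part of the answer.

```python
import re

# Hypothetical mapping, assuming the variants from the question:
# Alef forms -> bare Alef, Teh Marbuta -> Heh, Yeh with Hamza -> Yeh.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # أ -> ا
    "\u0625": "\u0627",  # إ -> ا
    "\u0622": "\u0627",  # آ -> ا
    "\u0629": "\u0647",  # ة -> ه
    "\u0626": "\u064A",  # ئ -> ي (assumption, for فائز / فايز)
})

# Tashkil (harakat) U+064B Fathatan .. U+0652 Sukun, removed entirely.
_TASHKIL = re.compile("[\u064B-\u0652]")


def normalize_name(name):
    """Return the spelling-insensitive key to store in the normalized field."""
    return _TASHKIL.sub("", name).translate(_LETTER_MAP)


if __name__ == "__main__":
    for variant in ["أسامة", "اسامة", "أسامه", "اسامه"]:
        print(variant, "->", normalize_name(variant))
```

Run it once over the existing rows to fill the new field, and apply the same function to the user's search term before querying; if the two ever diverge, matches will be missed.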



There are 3 ways to do this:


1. Custom collation

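Add a custom collation to MySQL's character set definition file, Index.xml, which lives in the character sets directory. To find that directory, run: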

SHOW VARIABLES LIKE 'character_sets_dir';

Open Index.xml, find the <charset name="utf8"> element, and add the following collation inside it:

<charset name="utf8">
.
.
.
  <collation name="utf8_arabic_ci" id="1029">
    <rules>
      <reset>\u0627</reset> <!-- Alef 'ا' -->
      <i>\u0623</i>        <!-- Alef With Hamza Above 'أ' -->
      <i>\u0625</i>        <!-- Alef With Hamza Below 'إ' -->
      <i>\u0622</i>        <!-- Alef With Madda Above 'آ' -->
    </rules>
    <rules>
      <reset>\u0629</reset> <!-- Teh Marbuta 'ة' -->
      <i>\u0647</i>        <!-- Heh 'ه' -->
    </rules>
    <rules>
      <reset>\u0000</reset> <!-- Ignore Tashkil -->
      <i>\u064E</i>        <!-- Fatha 'َ' -->
      <i>\u064F</i>        <!-- Damma 'ُ' -->
      <i>\u0650</i>        <!-- Kasra 'ِ' -->
      <i>\u0651</i>        <!-- Shadda 'ّ' -->
      <i>\u0652</i>        <!-- Sukun 'ْ' -->
      <i>\u064B</i>        <!-- Fathatan 'ً' -->
      <i>\u064C</i>        <!-- Dammatan 'ٌ' -->
      <i>\u064D</i>        <!-- Kasratan 'ٍ' -->
    </rules>
  </collation>
</charset>

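This XML adds a new collation, utf8_arabic_ci, with id 1029 to the utf8 character set. IDs for user-defined collations must be chosen from the range 1024-2047 and must not clash with an existing collation. Restart MySQL after editing the file so the new collation is loaded. For details, see the MySQL documentation on adding a UCA collation to a Unicode character set.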
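It is easy to mistype a code point in these rules (Sukun, for instance, is U+0652, while U+064F is Damma). A small Python check can catch that before you restart the server. It assumes you copy the code points from your Index.xml into the RULES dictionary by hand; RULES and check_rules are illustrative names. It prints the official Unicode name of each code point and rejects duplicates.

```python
import unicodedata

# Code points copied from the utf8_arabic_ci rules, grouped as in the XML.
RULES = {
    "Alef variants": ["\u0627", "\u0623", "\u0625", "\u0622"],
    "Teh Marbuta / Heh": ["\u0629", "\u0647"],
    "Tashkil (ignored)": ["\u064E", "\u064F", "\u0650", "\u0651",
                          "\u0652", "\u064B", "\u064C", "\u064D"],
}


def check_rules(rules):
    """Print each code point with its Unicode name; raise on duplicates."""
    seen = set()
    for group, chars in rules.items():
        print(group)
        for ch in chars:
            if ch in seen:
                raise ValueError(f"U+{ord(ch):04X} appears more than once")
            seen.add(ch)
            print(f"  U+{ord(ch):04X}  {unicodedata.name(ch)}")


if __name__ == "__main__":
    check_rules(RULES)
```

Comparing the printed names against the XML comments (for example "ARABIC SUKUN" next to 'ْ') makes a wrong code point stand out immediately.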

Then switch the column to the new collation:

ALTER TABLE persons MODIFY name VARCHAR(50) 
CHARACTER SET 'utf8' COLLATE 'utf8_arabic_ci';

Now a search for "اسامة" also returns "اسامة", "أسامه", "أسامة", and so on.

2. Normalized column

Add a second column, "normalized_name", that stores a normalized version of each name, for example:

id  name      normalized_name
1   احمد      احمد
2   أحمد      احمد
3   أسامه     اسامة
4   أسامة     اسامة
5   اسامه     اسامة
6   اسَامه     اسامة
6  اسَامه          اسامة

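Whenever you insert or update a name, store its normalized form in "normalized_name" as well. When searching, normalize the search term in the same way and compare it against "normalized_name" (which can also be indexed). For example: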

SELECT name FROM persons WHERE normalized_name = "اسامة";

+--------------+
| name         |
+--------------+
| أسامه        |
| أسامة        |
| اسامه        |
| اسَامه        |
+--------------+
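A self-contained sketch of the normalized-column approach. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The normalize() rules (Alef variants to "ا", word-final "ه" to "ة", tashkil removed) are assumptions chosen to reproduce the table above, and the table, index, and function names are illustrative.

```python
import re
import sqlite3

# Assumed normalization rules (illustrative, chosen to reproduce the table above).
_ALEF = str.maketrans({"\u0623": "\u0627",   # أ -> ا
                       "\u0625": "\u0627",   # إ -> ا
                       "\u0622": "\u0627"})  # آ -> ا
_TASHKIL = re.compile("[\u064B-\u0652]")     # Fathatan .. Sukun
_FINAL_HEH = re.compile("\u0647\\b")         # ه at the end of a word


def normalize(name):
    """Strip tashkil, unify Alef forms, turn word-final ه into ة."""
    name = _TASHKIL.sub("", name).translate(_ALEF)
    return _FINAL_HEH.sub("\u0629", name)


# In-memory SQLite database standing in for the MySQL "persons" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, normalized_name TEXT)")
conn.execute("CREATE INDEX idx_normalized_name ON persons (normalized_name)")

names = ["احمد", "أحمد", "أسامه", "أسامة", "اسامه",
         "\u0627\u0633\u064E\u0627\u0645\u0647"]  # اسَامه (with Fatha)
# Store each name together with its normalized form.
conn.executemany("INSERT INTO persons (name, normalized_name) VALUES (?, ?)",
                 [(n, normalize(n)) for n in names])


def search(term):
    """Normalize the search term the same way, then compare normalized forms."""
    rows = conn.execute(
        "SELECT name FROM persons WHERE normalized_name = ? ORDER BY id",
        (normalize(term),))
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(search("اسامة"))
```

Because the lookup is a plain equality on an indexed column, it stays fast as the table grows; the trade-off is that every write path must remember to fill "normalized_name".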

3. Regular expressions


Use MySQL's REGEXP operator (RLIKE is a synonym). For example, to find "أحمد" however its Alef is written:

SELECT name FROM clients WHERE name REGEXP '(ا|أ|إ|آ)حمد'
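A note on pattern syntax. Inside square brackets, "|" is an ordinary character rather than "or", so a class such as [ا|أ|إ] also matches a literal "|". In addition, before MySQL 8.0 REGEXP works byte by byte and is not multibyte-safe, so a bracket class of multi-byte Arabic letters can misbehave there. An alternation group such as (ا|أ|إ|آ) avoids both problems. The sketch below uses Python's re module only to illustrate the first point (Python's re is Unicode-aware, so it does not reproduce the byte-wise issue); the sample names are made up.

```python
import re

# Character class: inside [...] the "|" is an ordinary character, not "or".
bracket = re.compile("[ا|أ|إ]حمد")
# Alternation group: each branch is a whole letter.
group = re.compile("(ا|أ|إ|آ)حمد")

names = ["احمد", "أحمد", "إحمد", "آحمد", "|حمد", "محمد"]

if __name__ == "__main__":
    print("bracket:", [n for n in names if bracket.search(n)])
    print("group:  ", [n for n in names if group.search(n)])
```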

You can build such a pattern from the user's search string in your application code, for example in PHP:

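// Add all your patterns and replacements to these arrays.
// Use groups "(...)" rather than "[a|b]": inside brackets "|" is a literal character.
$patterns     = array("/(ا|أ|إ|آ)/u", "/(ه|ة)/u");
$replacements = array("(ا|أ|إ|آ)",     "(ه|ة)");
// Escape regex metacharacters in the user's input before expanding the letters.
$query_string = preg_replace($patterns, $replacements, preg_quote($search_string));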

Then use $query_string as the REGEXP pattern in your query, preferably passed as a bound parameter.
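For comparison, the same pattern-building step as a Python sketch. It expands every letter that has spelling variants into an alternation group and escapes everything else. GROUPS and to_mysql_regexp are hypothetical names. The "يئ" group is an added assumption so that "فايز" and "فائز" match each other. re.escape applies Python's escaping rules, which cover the common metacharacters MySQL also uses, but this is not a MySQL-specific escaper.

```python
import re

# Hypothetical equivalence groups mirroring the PHP arrays; "يئ" is an
# added assumption so that "فايز" and "فائز" match each other.
GROUPS = ["اأإآ", "هة", "يئ"]


def to_mysql_regexp(search):
    """Turn a search term into a REGEXP pattern with one (a|b|...) group per variant letter."""
    parts = []
    for ch in search:
        group = next((g for g in GROUPS if ch in g), None)
        if group:
            parts.append("(" + "|".join(group) + ")")
        else:
            # Escape regex metacharacters in the user's input (Python rules).
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = to_mysql_regexp("اسامة")
    print(pattern)
    for name in ["أسامة", "اسامه", "إسامة", "أسامر"]:
        print(name, bool(re.search(pattern, name)))
```

With a DB-API driver that uses the %s paramstyle (PyMySQL, for example), the pattern can then be passed as a bound parameter: cursor.execute("SELECT name FROM clients WHERE name REGEXP %s", (pattern,)). Keep in mind that a REGEXP condition cannot use an index, so on large tables the normalized-column approach will usually be faster.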



Source: https://habr.com/ru/post/1674935/

