How can I convert a column to ASCII on the fly (without storing the result) to match it against an external ASCII string?

I have a member search function where you can enter parts of names, and the result should contain all members whose username, first name or last name matches the input. The problem is that some names contain "weird" characters, such as the é in Renée, and users do not want to type the weird character; they type the plain ASCII substitute e instead.

In PHP, I convert the input string to ASCII using iconv (just in case someone does type weird characters). In the database, however, I also need to convert the weird characters to ASCII (obviously) for the strings to match.

I tried the following:

 SELECT CONVERT(_latin1'Renée' USING ascii) t1, CAST(_latin1'Renée' AS CHAR CHARACTER SET ASCII) t2; 

(These are two separate attempts.) Neither works: both return Ren?e. The question mark should be an e. It would be fine if the output were Ren?ee, since I could simply strip all question marks after the conversion.

As you can imagine, the columns I want to query are encoded in Latin1.

Thanks.

+3
4 answers

You do not need to convert anything. Your requirement is to compare two strings and ask whether they are equal, ignoring accents; the database server can use a collation to do this for you:

Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.

 mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
 Query OK, 0 rows affected (0.00 sec)
 mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
 +-----------+-----------+-----------+
 | 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
 +-----------+-----------+-----------+
 |         1 |         1 |         1 |
 +-----------+-----------+-----------+
 1 row in set (0.06 sec)
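To make the weight idea concrete, here is a minimal Python sketch of a one-to-one "character to weight" comparison. It is only an illustration of the principle, not MySQL's implementation: the WEIGHTS table and the function names weight_key and collation_equal are hypothetical, and the table covers just a handful of characters.

```python
# Toy model of a non-UCA collation: every character maps to exactly one weight.
# WEIGHTS is a hypothetical, hand-written table, NOT MySQL's real weight table.
WEIGHTS = {
    "\u00c0": 0x0041,  # À
    "\u00c1": 0x0041,  # Á
    "\u00c9": 0x0045,  # É
    "\u00ca": 0x0045,  # Ê
    "\u00c8": 0x0045,  # È
}


def weight_key(text):
    """Return the list of weights for text, ignoring case and known accents."""
    # Upper-casing first makes 'a' and 'A' share a weight; characters missing
    # from the table fall back to their own code point.
    return [WEIGHTS.get(ch, ord(ch)) for ch in text.upper()]


def collation_equal(left, right):
    """Two strings are 'equal' when their weight sequences are identical."""
    return weight_key(left) == weight_key(right)


print(collation_equal("Ren\u00e9e", "renee"))
```

Under this model the stored characters never change; only the comparison ignores the accent, which is why no conversion of the column is needed.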
+6

First of all, this should simply work:

 SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%'; 

where the test table is:

 +-----+--------+
 | id  | name   |
 +-----+--------+
 | 1   | Renée  |
 | 2   | Renêe  |
 | 3   | Renee  |
 +-----+--------+

What is your MySQL version, and how exactly are you doing the matching?


Another possible solution is transliteration.

Related: PHP Transliteration

Transliterating the input should not be a problem, but transliterating values from persistent storage (e.g. the database) in real time during a search may not be feasible. So you can add three more fields: username_slug, firstname_slug and lastname_slug. When inserting or updating a record, set the slug values accordingly. When searching, match the transliterated input against these fields.

 +------+----------+---------------+----------+---------------+ ...
 | id   | username | username_slug | lastname | lastname_slug | ...
 +------+----------+---------------+----------+---------------+ ...
 | 1    | Renée    | renee         | La Niña  | la-nina       | ...
 | 2    | Renêe    | renee         | ...      | ...           | ...
 | 3    | Renee    | renee         | ...      | ...           | ...
 +------+----------+---------------+----------+---------------+ ...

A search for "renee" or "renèe" will match all entries.
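As a rough sketch of how such slug values could be computed, the snippet below uses Python's standard unicodedata module rather than PHP; the function name make_slug is hypothetical, and the hyphen-joining rule is an assumption taken from the la-nina value in the table. The same function would be applied both when writing a record and to the search input.

```python
import re
import unicodedata


def make_slug(value):
    """Transliterate to ASCII, lower-case, and join words with hyphens."""
    # NFKD decomposition splits an accented letter such as é into the base
    # letter e followed by a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Encoding to ASCII with errors="ignore" drops the combining marks
    # (and any other non-ASCII character).
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


print(make_slug("Ren\u00e9e"))
print(make_slug("La Ni\u00f1a"))
```

Note that letters without a Unicode decomposition (for example ø or ß) are simply dropped by this approach, so a real implementation would likely need an explicit replacement table for them.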

As a side benefit, you can use these fields to build search-engine-friendly (SEF) URLs, which is why they are named ..._slug, e.g. example.com/users/renee. Of course, in that case you must make sure the slug field is unique.

+4

The CAST() operator, in the context of character encodings, converts from one way of storing characters to another; it does not change the actual characters, which is what you are after. The character é is what it is in any character set; it is not an e. You need to convert accented characters to unaccented ones, which is a different problem and has been asked several times before (see: normalization of accented characters in MySQL queries).

I am not sure there is a way to do this directly in MySQL without a translation table and going through the text letter by letter. It would probably be easier to write a PHP script that goes through the database and performs the translation.
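The difference between converting an encoding and translating characters can be shown in a few lines. This is a hedged sketch in Python rather than PHP or SQL; ACCENT_TABLE and strip_accents are hypothetical names, and the table is deliberately tiny.

```python
# Converting the encoding keeps the characters as they are; a character that
# ASCII cannot represent is replaced by "?", as in the question's "Ren?e".
converted = "Ren\u00e9e".encode("ascii", errors="replace").decode("ascii")

# A translation table instead replaces characters one by one.
# ACCENT_TABLE is a hypothetical, incomplete example table.
ACCENT_TABLE = str.maketrans({
    "\u00e9": "e",  # é
    "\u00ea": "e",  # ê
    "\u00e8": "e",  # è
    "\u00f1": "n",  # ñ
})


def strip_accents(text):
    """Replace each accented character listed in ACCENT_TABLE by its base letter."""
    return text.translate(ACCENT_TABLE)


print(converted)
print(strip_accents("Ren\u00e9e"))
```

Characters missing from the table pass through unchanged, so the table would have to list every accented character that can occur in the data.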

+3

@vincebowdren's answer above works; I am only adding this as an answer for the sake of formatting:

 CREATE TABLE `members` (
   `id` int(11) DEFAULT NULL,
   `lastname` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL
 );
 insert into members values (1, 'test6ë');
 select id from members where lastname like 'test6e%';

This produces:

 +------+
 | id   |
 +------+
 |    1 |
 +------+

And using Latin1,

 set names latin1;
 CREATE TABLE `members2` (
   `id` int(11) DEFAULT NULL,
   `lastname` varchar(20) CHARACTER SET latin1 DEFAULT NULL
 );
 insert into members2 values (1, 'Renée');
 select id from members2 where lastname like '%Renee%';

will give:

 +------+
 | id   |
 +------+
 |    1 |
 +------+

Of course, the OP should use the same charset in the application (PHP), on the connection (MySQL on Linux used to default to latin1 in 5.0, but defaults to UTF8 in 5.1), and in the column's data type, so that there are fewer unknowns. Collations take care of the rest.

EDIT: I wrote "should" so as to have better control over everything, but this also works:

 set names latin1; select id from members where lastname like 'test6ë%'; 

Because once the connection charset is set, MySQL performs the conversion internally. In this case, it somehow converts and compares the UTF8 string (from the database) with the latin1 string (from the query).

EDIT 2: Some skeptics asked me for an even more convincing example:

Given the statements above, here is what I did next. Make sure the terminal is in UTF8.

 set names utf8;
 insert into members values (5, 'Renée'), (6, 'Renêe'), (7, 'Renèe');
 select members.id, members.lastname, members2.id, members2.lastname
 from members inner join members2 using (lastname);

Remember that members is in utf8 and members2 is in latin1.

 +------+----------+------+----------+
 | id   | lastname | id   | lastname |
 +------+----------+------+----------+
 |    5 | Renée    |    1 | Renée    |
 |    6 | Renêe    |    1 | Renée    |
 |    7 | Renèe    |    1 | Renée    |
 +------+----------+------+----------+

which proves that, with the right settings, the collation does the job for you.

+3

Source: https://habr.com/ru/post/1485256/

