MySQL accent-insensitive and dotted-character-insensitive search

Problem: I am trying to implement a search that returns results even when dotted characters are used. In other words, SELECT 'über' = 'uber' and SELECT 'mas' = 'maş' should both return true. This applies to every character in the following array:

 $arr = array('ş' => 's', 'ç' => 'c', 'ö' => 'o', 'ü' => 'u' /* and so on ... */);

The solution I have in mind: alongside the original column, I could keep a separate column that stores the plain "English" form of each name. So before storing "über" in the database, I would also convert it to "uber" in PHP, and then save both "über" (the original) and "uber" (for searching).

But even though I have been searching all day, I still think there should be a simpler and cleaner way to do this, since this approach means storing (more or less) the same data twice. So, is this solution the only way to go, or do you know a better approach?

EDIT

For accent-insensitive search I found posts on SO, and they work, but since I also need to handle dotted characters, I had to ask this question.

EDIT2

I can't publish the whole table structure and code, but I will give an example.

 CREATE TABLE `myusers` (
   id int auto_increment not null primary key,
   email varchar(100) COLLATE latin1_general_ci not null,
   fullname varchar(75) COLLATE latin1_general_ci not null
 ) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1 COLLATE latin1_general_ci;

That is the table structure. Here are the inserts and selects:

 INSERT INTO myusers (fullname) VALUES ('Agüeda');
 INSERT INTO myusers (fullname) VALUES ('Agueda');

 SELECT * FROM myusers WHERE fullname = 'Agüeda' COLLATE latin1_general_ci;
 +----+-------+----------+
 | id | email | fullname |
 +----+-------+----------+
 |  1 |       | Agüeda   |
 +----+-------+----------+
 1 row in set (0.00 sec)

 SELECT * FROM myusers WHERE fullname = 'agueda' COLLATE latin1_general_ci;
 +----+-------+----------+
 | id | email | fullname |
 +----+-------+----------+
 |  2 |       | Agueda   |
 +----+-------+----------+
 1 row in set (0.00 sec)

The desired result, obviously, is that a search for agueda returns both "Agueda" and "Agüeda", but that is not what happens. As I mentioned above, I created a new column, saved all the names there in their English form, and searched on that as well. But this still costs me twice, because I also search the original column, whose matches should rank higher in the search results. There must be a better way ...
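
For illustration, the two-column idea could be served by a single query instead of two searches. This is only a sketch under assumptions: the table myusers_demo and the column fullname_search are hypothetical names, and the application is assumed to have folded the search term ('Agüeda' to 'agueda') the same way it folds stored names.

```sql
-- Hypothetical demo table. fullname_search is an assumed column that
-- holds the folded ("English") form written by the application.
CREATE TABLE myusers_demo (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
  fullname VARCHAR(75) NOT NULL,
  fullname_search VARCHAR(75) NOT NULL,
  KEY idx_fullname_search (fullname_search)
) DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

INSERT INTO myusers_demo (fullname, fullname_search) VALUES
  ('Agüeda', 'agueda'),
  ('Agueda', 'agueda');

-- Filter on the folded column only, then rank rows whose original
-- spelling equals the term as typed ahead of the other matches.
SELECT id, fullname
FROM myusers_demo
WHERE fullname_search = 'agueda'
ORDER BY (fullname = 'Agüeda') DESC, id;
```

The filter runs only against the folded column, which can be indexed, and the original column is used just to order the rows that already matched.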

+6
4 answers

1) Write your own collation, something like latin1_general_diacriticinsensitive. I don't even know where to start with that, though :).

2) Use regular expressions and character classes: /[uü]ber/

3) The solution you already have in mind. I would personally use this, since design is all about compromise, and it is a simple solution with 100% overhead. Of course, the space overhead can eventually turn into speed overhead, especially with MySQL, but worry about that later. It is also very easy to undo if required.
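
To illustrate option 2, here is a minimal sketch of a character-class search written in MySQL itself. The table regex_demo and its rows are hypothetical. The sketch assumes a single-byte character set such as latin1 (or MySQL 8.0 and later), since older versions match REGEXP byte by byte and may behave unexpectedly with multibyte data.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE regex_demo (
  fullname VARCHAR(75) NOT NULL
) DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

INSERT INTO regex_demo (fullname) VALUES ('über'), ('uber'), ('umber');

-- Each letter that has a dotted variant becomes a character class.
-- The ^ and $ anchors require the whole value to match the pattern.
SELECT fullname
FROM regex_demo
WHERE fullname REGEXP '^[uü]ber$';
```

A REGEXP condition generally cannot use an index, so a query like this is likely to scan every row of the table.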

+1

Just use the appropriate collation. For instance:

 create table test( foo text ) collate = utf8_unicode_ci;
 insert into test values('Agüeda');
 insert into test values('Agueda');
 select * from test where foo = 'Agueda';

This returns two rows.
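
If the data already lives in a latin1 table like the one in the question, the collation could in principle be changed in place. The sketch below uses a hypothetical table collation_demo rather than the real myusers table, and assumes that converting the columns to utf8 is acceptable. It is worth trying on a copy first.

```sql
-- Hypothetical stand-in for an existing latin1 table.
CREATE TABLE collation_demo (
  fullname VARCHAR(75) NOT NULL
) DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

INSERT INTO collation_demo (fullname) VALUES ('Agüeda'), ('Agueda');

-- CONVERT TO changes the character set and collation of every
-- character column in the table and converts the stored values.
ALTER TABLE collation_demo
  CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- After the conversion, the comparison uses utf8_unicode_ci.
SELECT fullname FROM collation_demo WHERE fullname = 'agueda';
```

On MySQL 8.0 the name utf8 is a deprecated alias, so utf8mb4 with utf8mb4_unicode_ci may be the better choice there.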

+2

Well, instead of trying to replace them and running the search x times, I would suggest using the MySQL "LIKE" operator, i.e. "SELECT * FROM x WHERE search LIKE '%ber'", where you replace the diacritics with "%".

EDIT: my mistake, "%" matches any number of characters. Use "_" to match a single character.
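
For reference, a minimal sketch of the "_" variant. The table like_demo and its rows are made up for the example. Because "_" stands for any single character, the pattern also matches values that have nothing to do with diacritics.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE like_demo (
  fullname VARCHAR(75) NOT NULL
) DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

INSERT INTO like_demo (fullname) VALUES ('über'), ('uber'), ('aber');

-- "_" matches exactly one character of any kind, so besides 'über'
-- and 'uber' this also returns the unrelated value 'aber'.
SELECT fullname
FROM like_demo
WHERE fullname LIKE '_ber';
```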

0

Take a look at this post: fooobar.com/questions/109101 / ...

The asker there has the opposite problem to yours. See the WHERE clause in the accepted answer. You can probably just use a collation with the _ci suffix and it will work.

Let us know how it gets resolved.
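
The linked post is not reproduced here, so the following is only a guess at what applying a collation inside a WHERE clause might look like for a latin1 column. The table where_demo is hypothetical, and utf8_unicode_ci is an assumed choice of collation.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE where_demo (
  fullname VARCHAR(75) NOT NULL
) DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

INSERT INTO where_demo (fullname) VALUES ('Agüeda'), ('Agueda');

-- Convert the column value to utf8 for this comparison only and
-- apply an explicit collation. The table itself is left unchanged.
SELECT fullname
FROM where_demo
WHERE CONVERT(fullname USING utf8) COLLATE utf8_unicode_ci = 'agueda';
```

Because the column is wrapped in a function, an index on fullname would not help this query.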

0

Source: https://habr.com/ru/post/898933/

