MySQL accent-insensitive and dot-insensitive search

Problem: I am trying to implement a search that returns results even when dotted characters are involved. In other words, SELECT 'über' = 'uber' or SELECT 'mas' = 'maş' should return true. This applies to each character in the following array:

    $arr = array('ş' => 's', 'ç' => 'c', 'ö' => 'o', 'ü' => 'u' /* and so on ... */);

The solution I have in mind: alongside the original column, I could have a separate column that stores the English (ASCII) version of each name. So before storing "über" in the database, I would also convert it to "uber" in PHP, and then save both "über" (as the original) and "uber" (for searching) in the database.
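The conversion step described above could look roughly like the following. It is sketched in Python rather than PHP purely for illustration; `fold_for_search` and `FOLD_MAP` are hypothetical names, and the map contains only the four pairs listed in the question.

```python
import unicodedata

# Hypothetical mapping: only the four pairs listed in the question.
# Extend it with whatever other letters the application needs.
FOLD_MAP = str.maketrans({"ş": "s", "ç": "c", "ö": "o", "ü": "u"})


def fold_for_search(text: str) -> str:
    """Return the value to store in the search column for `text`."""
    # NFC first, so "u" + combining diaeresis becomes the single letter "ü".
    composed = unicodedata.normalize("NFC", text)
    # Lowercase, then replace every mapped letter with its ASCII base letter.
    return composed.lower().translate(FOLD_MAP)


if __name__ == "__main__":
    for name in ("über", "maş", "Agüeda"):
        # Store both values: the original for display, the folded one for search.
        print(repr(name), "->", repr(fold_for_search(name)))
```

The same function would presumably be applied to the user's search term before it is compared against the search column, so that both sides are folded identically.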

But even though I have been researching this all day, I still think there should be a simpler and cleaner way to do this, since my approach means storing (more or less) the same data twice in the database. So, do you think my solution is the only way to go, or do you know a better approach?


For accent-insensitive search, I have seen posts on SO and they work, but since I also need to handle dotted characters, I had to ask this question.


I can't publish the whole table structure and code for certain reasons, but I will give an example.

    CREATE TABLE `myusers` (
      `id` int NOT NULL AUTO_INCREMENT,
      `email` varchar(100) COLLATE latin1_general_ci NOT NULL,
      `fullname` varchar(75) COLLATE latin1_general_ci NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

Above is the table structure. Here are the inserts and selects:

    INSERT INTO myusers (fullname) VALUES ('Agüeda');
    INSERT INTO myusers (fullname) VALUES ('Agueda');

    SELECT * FROM myusers WHERE fullname = 'Agüeda' COLLATE latin1_general_ci;
    +----+-------+----------+
    | id | email | fullname |
    +----+-------+----------+
    |  1 |       | Agüeda   |
    +----+-------+----------+
    1 row in set (0.00 sec)

    SELECT * FROM myusers WHERE fullname = 'agueda' COLLATE latin1_general_ci;
    +----+-------+----------+
    | id | email | fullname |
    +----+-------+----------+
    |  2 |       | Agueda   |
    +----+-------+----------+
    1 row in set (0.00 sec)

The desired result, obviously, is that a search for agueda returns both "Agueda" and "Agüeda", but that is not what happens. As I mentioned above, I created a new column, stored all the names in English form, and searched on that as well. But it costs me twice (because I also search the original column, whose matches rank higher in the search results). There must be a better way ...

4 answers

1) Write your own collation, e.g. latin1_general_diacriticinsensitive. I wouldn't even know where to start, though :).

2) Use regular expressions and character groups: /[uü]ber/

3) Use the solution you already have in mind. I would personally go with this, since design is all about compromise, and it is a simple solution at the cost of 100% space overhead. Of course, space overhead can eventually turn into speed overhead, especially with MySQL, but worry about that later. It is also very easy to undo if required.
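For option 2, the character groups do not have to be typed by hand; they could be generated from the search term. The sketch below is a Python illustration under stated assumptions: `GROUPS` is a hypothetical table holding only the letters from the question, and the pattern is checked with Python's `re` module. MySQL's REGEXP has its own syntax and escaping rules, so the generated pattern should be verified against the server version in use.

```python
import re

# Hypothetical groups: each base letter and the variants that should match it.
GROUPS = {"s": "sş", "c": "cç", "o": "oö", "u": "uü"}
# Reverse lookup: any variant (including the base letter) -> its group.
CHAR_TO_GROUP = {ch: group for group in GROUPS.values() for ch in group}


def to_char_group_pattern(term: str) -> str:
    """Build a pattern such as '[uü]ber' from 'uber' or 'über'."""
    parts = []
    for ch in term.lower():
        group = CHAR_TO_GROUP.get(ch)
        # Letters with variants become a bracket group; the rest are escaped.
        parts.append(f"[{group}]" if group else re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = to_char_group_pattern("agueda")
    print(pattern)
    for name in ("Agueda", "Agüeda", "Agneda"):
        print(name, bool(re.fullmatch(pattern, name, re.IGNORECASE)))
```

Because both "uber" and "über" produce the same pattern, it does not matter which spelling the user types.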


Just use the appropriate collation. For instance:

    create table test( foo text ) collate = utf8_unicode_ci;
    insert into test values('Agüeda');
    insert into test values('Agueda');
    select * from test where foo = 'Agueda';

This returns two rows.
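One way to picture why an accent-insensitive collation treats 'Agüeda' and 'Agueda' as equal is to imagine both values being reduced to a stripped-down key before they are compared. The sketch below is only a rough Python approximation of that idea, not MySQL's actual algorithm; `collation_key` and `loosely_equal` are hypothetical names.

```python
import unicodedata


def collation_key(text: str) -> str:
    """Approximate key: case-folded text with combining marks removed."""
    # NFD splits "ü" into "u" + combining diaeresis, "ş" into "s" + cedilla.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings by their approximate keys."""
    return collation_key(a) == collation_key(b)


if __name__ == "__main__":
    for a, b in (("Agüeda", "agueda"), ("maş", "mas"), ("ı", "i")):
        print(repr(a), repr(b), loosely_equal(a, b))
```

In this sketch, letters that have no Unicode decomposition, such as the dotless "ı", are not folded to a base letter. Whether a given MySQL collation treats such letters as equal is worth testing directly on the server.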


Well, instead of replacing the characters and running the search multiple times, I would suggest using the MySQL "LIKE" operator, that is, "SELECT * FROM x WHERE search LIKE '%ber'", where you replace each diacritic with "%".

EDIT: My mistake, "%" matches any number of characters. Use "_" to match a single character.
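Building such a pattern from a user's search term might look like the sketch below. It is a Python illustration with hypothetical names (`AMBIGUOUS`, `to_like_pattern`), using only the letters from the question, and it assumes backslash as the LIKE escape character, which is MySQL's default.

```python
# Hypothetical set: letters from the question that have a dotted/accented variant.
AMBIGUOUS = set("sşcçoöuü")


def to_like_pattern(term: str) -> str:
    """Build a LIKE pattern in which each ambiguous letter becomes '_'."""
    out = []
    for ch in term.lower():
        if ch in AMBIGUOUS:
            out.append("_")        # '_' matches exactly one character
        elif ch in "%_\\":
            out.append("\\" + ch)  # escape LIKE metacharacters typed by the user
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for term in ("uber", "über", "agueda", "100%"):
        print(term, "->", to_like_pattern(term))
```

Note the trade-off: "_" matches any single character, so a pattern like "_ber" would also match "aber", and a term made mostly of s, c, o and u collapses into mostly wildcards. The resulting pattern should be passed as a bound parameter rather than concatenated into the query string.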


Take a look at this post: / ...

That post has the opposite problem to the one you are facing. See the WHERE clause in the accepted answer. You can probably just use a collation with the _ci suffix and it will work.

Let us know how you solve it.


