The best way to find a similar value from a large table

Question

I have a MySQL database that stores over 1,000,000 names. The task of my application is a little unusual: I need not only to look up names in the database, but also to find similar names. For example, if the name entered is christian, the application should suggest names such as christine, chris, etc. What is the best way to do this without using a LIKE clause? The suggestions only need to differ in the last part of the name.

+6

sql mysql

user794091 Jun 11 '11 at 16:12

6 answers

You can use PHP's metaphone() function to generate a metaphone code for each name and store the codes alongside the names.

    <?php
    // Print each name next to its metaphone code.
    print "chris" . "\t" . metaphone("chris") . "\n";
    print "christian" . "\t" . metaphone("christian") . "\n";
    print "christine" . "\t" . metaphone("christine") . "\n";
    # prints:
    # chris      XRS
    # christian  XRSXN
    # christine  XRSTN

Then you can use the Levenshtein distance algorithm (either in PHP [http://php.net/manual/en/function.levenshtein.php] or in MySQL [http://www.artfulsoftware.com/infotree/queries.php#552]) to calculate the distance between the metaphone codes. In my test below, a distance of 2 or less seemed to give the level of similarity you are looking for.

    <?php
    // Pair each name with its metaphone code.
    $names = array(
        array('mike', metaphone('mike')),
        array('chris', metaphone('chris')),
        array('christian', metaphone('christian')),
        array('christine', metaphone('christine')),
        array('michelle', metaphone('michelle')),
        array('mick', metaphone('mick')),
        array('john', metaphone('john')),
        array('joseph', metaphone('joseph'))
    );

    foreach ($names as $name) {
        _compare($name);
    }

    // Print the Levenshtein distance between one name's code and every code in the list.
    function _compare($n) {
        global $names;
        $name = $n[0];
        $meta = $n[1];
        foreach ($names as $cname) {
            printf("The distance between %s and %s is %d\n", $name, $cname[0], levenshtein($meta, $cname[1]));
        }
    }
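To store the codes alongside the names, one option is an extra indexed column. This is only a rough sketch: the table `names`, the column `name_metaphone`, and the index name are hypothetical, and the codes themselves would still be computed in PHP with metaphone() and written back by the application.

```sql
-- Hypothetical schema: a table `names` with a `name` column.
-- Add a column to hold the metaphone code computed in PHP.
ALTER TABLE names ADD COLUMN name_metaphone VARCHAR(255);

-- Index the code column so lookups by code do not scan the whole table.
CREATE INDEX idx_names_metaphone ON names (name_metaphone);

-- Narrow the candidates to rows whose code starts like the code of the
-- input ('XRS' is the code for "chris" shown above), then rank the small
-- result set by Levenshtein distance in PHP.
SELECT name, name_metaphone
FROM names
WHERE name_metaphone LIKE 'XRS%';
```

Filtering on a code prefix first is an assumption made here to keep the Levenshtein step cheap; it would miss names whose codes differ near the start.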

+2

spuriousdata Jun 11 '11 at 16:48

LIKE is generally a good solution, but another way to improve performance is to create a prefix (partial-column) index and then query with prefixes of the same length as the indexed prefix. See the MySQL documentation for col_name(length).
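As a minimal sketch of that idea, assuming a hypothetical table `names` with a `name` column and a prefix length of 4 chosen purely for illustration:

```sql
-- Index only the first 4 characters of each name (prefix index).
CREATE INDEX idx_names_prefix ON names (name(4));

-- A query whose pattern has a fixed 4-character prefix can use that index.
SELECT name
FROM names
WHERE name LIKE 'chri%';

-- EXPLAIN shows whether the optimizer actually picks the index.
EXPLAIN SELECT name FROM names WHERE name LIKE 'chri%';
```

The right prefix length depends on the data: a short prefix keeps the index small but is less selective.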

+1

glortho Jun 11 '11 at 16:24

You could use regular expressions, I think. I would not want to do it that way myself, but there is a REGEXP operator that you can use in the WHERE clause. See the MySQL documentation on REGEXP.

0

Nicola Peluchetti Jun 11 '11 at 16:23

You can use SOUNDS LIKE; I think it should be pretty fast too.

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#operator_sounds-like
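A short sketch of what that looks like, assuming a hypothetical table `names` with a `name` column. In MySQL, `expr1 SOUNDS LIKE expr2` is the same as `SOUNDEX(expr1) = SOUNDEX(expr2)`:

```sql
-- Find names that have the same Soundex code as the input.
SELECT name
FROM names
WHERE name SOUNDS LIKE 'christian';

-- Equivalent form using SOUNDEX() explicitly.
SELECT name
FROM names
WHERE SOUNDEX(name) = SOUNDEX('christian');
```

Because the function is applied to the column on every row, an ordinary index on `name` does not help here, so how fast this is on a million rows is worth measuring.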

0

Cem kalyoncu Jun 11 '11 at 16:30

Using LIKE with a fixed left side does not require a table scan, which I assume is the reason you want to avoid LIKE. SELECT * FROM table WHERE name LIKE CONCAT(?, "%") is fast and does not need a table scan to find the rows. CONCAT lets you use prepared statements with the % syntax.
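For illustration, here is how that query could be run as a server-side prepared statement. The table `names`, the statement name `find_by_prefix`, and the variable `@prefix` are hypothetical, and the query only avoids a scan if `name` is indexed:

```sql
-- Prepare the statement once; the ? placeholder receives the prefix.
PREPARE find_by_prefix FROM
    'SELECT name FROM names WHERE name LIKE CONCAT(?, ''%'')';

-- Bind the user input and execute.
SET @prefix = 'chri';
EXECUTE find_by_prefix USING @prefix;

-- Release the statement when it is no longer needed.
DEALLOCATE PREPARE find_by_prefix;
```

Note that a `%` or `_` inside the user input would still act as a wildcard, so it may need escaping before binding.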

You can also do something like:

SELECT * FROM table WHERE name < 'christian' ORDER BY name DESC LIMIT 20

and

SELECT * FROM table WHERE name > 'christian' ORDER BY name ASC LIMIT 20

to find neighbors in a sorted list.

0

Joshua martell Jun 11 '11 at 16:34

flori · Accepted Answer · 2011-06-11T16:24:48+0000

If you also want similar names (by sound), then something like SOUNDEX() could help: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
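On a table of this size, one way to avoid computing SOUNDEX() for every row on each search might be to store the code in its own indexed column. This is a sketch under assumptions: the table `names`, the column `name_soundex`, and the index name are all hypothetical, and the column must be kept up to date when names are inserted or changed.

```sql
-- Add a column for the precomputed Soundex code.
-- MySQL's SOUNDEX() can return more than four characters, hence the generous width.
ALTER TABLE names ADD COLUMN name_soundex VARCHAR(255);

-- Fill it once for the existing rows.
UPDATE names SET name_soundex = SOUNDEX(name);

-- Index it so lookups by code can use the index.
CREATE INDEX idx_names_soundex ON names (name_soundex);

-- Search by comparing the stored code with the code of the input.
SELECT name
FROM names
WHERE name_soundex = SOUNDEX('christian');
```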

Otherwise … LIKE 'chri%' seems like a bad idea to me?

If you really only need the first characters without LIKE, you can use SUBSTRING().
