You can transliterate the words into Latin characters and use a phonetic algorithm such as Soundex to get the essence of each word and compare it to the ones you have. In your case, the code will be C252 for all your words except the last one, which is C250.
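As a rough illustration of what the phonetic code looks like, here is a minimal sketch. The sample words are hypothetical and assumed to be ASCII already (that is, after the transliteration step), so only the built-in `soundex()` is involved.

```php
<?php
// Hypothetical sample inputs, assumed to be ASCII already
// (i.e. after the transliteration step).
$samples = array('cakanka', 'CAKANKA', 'caaknka', 'cakaka');

$codes = array();
foreach ($samples as $sample) {
    // soundex() keeps the first letter and encodes the following
    // consonants as digits; it ignores case and skips vowels.
    $codes[$sample] = soundex($sample);
}

foreach ($codes as $sample => $code) {
    echo $sample, ' => ', $code, "\n";
}
```

The first three samples share the code C252, while 'cakaka' gets C220 because it has no "n". Words with the same code are candidates for a closer comparison.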
Edit: The problem with comparison functions such as levenshtein or similar_text is that you need to call them for every pair of input value and possible match. This means that if you have a database with 1 million records, you will need to call these functions 1 million times for each input.
But functions like soundex or metaphone, which compute a kind of digest, can help reduce the number of actual comparisons. If you store the soundex or metaphone code for every known word in your database, you can very quickly narrow down the possible matches. Then, once the set of candidates is small, you can use the comparison functions to get the best match.
Here is an example:
    // building the index that represents your database
    $knownWords = array('Čakánka', 'Cakaka');
    $index = array();
    foreach ($knownWords as $key => $word) {
        $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
        if (!isset($index[$code])) {
            $index[$code] = array();
        }
        $index[$code][] = $key;
    }

    // test words
    $testWords = array('cakanka', 'cákanká', 'ČaKaNKA', 'CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa');
    echo '<ul>';
    foreach ($testWords as $word) {
        $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
        if (isset($index[$code])) {
            echo '<li> '.$word.' is similar to: ';
            $matches = array();
            foreach ($index[$code] as $key) {
                similar_text(strtolower($word), strtolower($knownWords[$key]), $percentage);
                $matches[$knownWords[$key]] = $percentage;
            }
            arsort($matches);
            echo '<ul>';
            foreach ($matches as $match => $percentage) {
                echo '<li>'.$match.' ('.$percentage.'%)</li>';
            }
            echo '</ul></li>';
        } else {
            echo '<li>no match found for '.$word.'</li>';
        }
    }
    echo '</ul>';
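As a variation on the example above, the second stage could use levenshtein instead of similar_text and return only the single best candidate. This is a sketch, not a drop-in replacement: `findBestMatch` is a hypothetical helper name, the word list is made up, and all strings are assumed to be lower-case ASCII already (transliteration is left out to keep the example independent of the iconv locale setup).

```php
<?php
// Hypothetical helper: returns the known word closest to $input,
// or null when no known word shares its soundex code.
// Assumes $input and the known words are lower-case ASCII.
function findBestMatch($input, array $index, array $knownWords)
{
    $code = soundex($input);
    if (!isset($index[$code])) {
        return null; // no candidate shares the phonetic code
    }

    $best = null;
    $bestDistance = PHP_INT_MAX;
    // Only the words in the matching bucket are compared.
    foreach ($index[$code] as $key) {
        $distance = levenshtein($input, $knownWords[$key]);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $knownWords[$key];
        }
    }
    return $best;
}

// Made-up word list standing in for the database.
$knownWords = array('cakanka', 'cakaka', 'cakanek');

// Build the index: soundex code => list of keys into $knownWords.
$index = array();
foreach ($knownWords as $key => $word) {
    $index[soundex($word)][] = $key;
}

$match = findBestMatch('caaknka', $index, $knownWords);
$none  = findBestMatch('ckaanka', $index, $knownWords);

var_dump($match, $none);
```

Here 'caaknka' lands in the same bucket as 'cakanka' and 'cakanek', and levenshtein picks 'cakanka'. The input 'ckaanka' has a different soundex code, so nothing is compared and the result is null, which also shows the limit of the prefilter: a typo near the start of the word can change the code.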
Gumbo