Replacing accented php characters

I am trying to replace accented characters with a regular replacement. The following is what I am doing now.

$string = "Éric Cantona"; $strict = strtolower($string); echo "After Lower: ".$strict; $patterns[0] = '/[á|â|à|å|ä]/'; $patterns[1] = '/[ð|é|ê|è|ë]/'; $patterns[2] = '/[í|î|ì|ï]/'; $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/'; $patterns[4] = '/[ú|û|ù|ü]/'; $patterns[5] = '/æ/'; $patterns[6] = '/ç/'; $patterns[7] = '/ß/'; $replacements[0] = 'a'; $replacements[1] = 'e'; $replacements[2] = 'i'; $replacements[3] = 'o'; $replacements[4] = 'u'; $replacements[5] = 'ae'; $replacements[6] = 'c'; $replacements[7] = 'ss'; $strict = preg_replace($patterns, $replacements, $strict); echo "Final: ".$strict;

This gives me:

  After Lower: éric cantona Final: ric cantona

The above gives me ric cantona I want the result to be eric cantona .

Can someone help me with where I'm wrong?

+73

string php preg-replace non-ascii-characters

Lizard Jul 30 '10 at 13:07

source share

15 answers

I tried all kinds based on the options listed in the answers, but the following worked:

 $unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' ); $str = strtr( $str, $unwanted_array );

+151

Lizard Jul 30 '10 at 16:15

source share

To remove diacritics, use the icon:

 $val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);

 $val = iconv('UTF-8','ASCII//TRANSLIT',$val);

Note that php has some kind of strange error in that it (sometimes?) Must have a locale set to do these conversions using setlocale ().

edit just tested, it gets most of your diacritics out of the box:

 $val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123"; echo iconv('UTF-8','ASCII//TRANSLIT',$val);

exit:

 a|a|a|a|a ?|e|e|e|ei|i|i|io|o|o|?|o|ou|u|u|u ae c ss abc ABC 123

So, you can manually fix these two odd ones before calling iconv, or delve into the inner workings of php and actually fix it.

+74

mvds Jul 30 '10 at 13:17

source share

I just got a response from Lizard, which is very useful - especially when you do some sorting. Isn't it beautiful how many characters we need to say basically the same;)

If anyone else is looking for a universal solution (as far as indicated above), here is copy & paste:

 /** * Replace language-specific characters by ASCII-equivalents. * @param string $s * @return string */ public static function normalizeChars($s) { $replace = array( ''=>'-', ''=>'-', ''=>'-', ''=>'-', 'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae', 'Þ'=>'B', 'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C', 'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E', 'Ğ'=>'G', 'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I', 'Ł'=>'L', 'Ñ'=>'N', 'Ń'=>'N', 'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe', 'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S', 'Ț'=>'T', 'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue', 'Ý'=>'Y', 'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z', 'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', ''=>'a', ''=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae', ''=>'b', 'ב'=>'b', ''=>'b', 'þ'=>'b', 'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', ''=>'c', 'צ'=>'c', 'ċ'=>'c', ''=>'c', 'Č'=>'c', 'č'=>'c', ''=>'ch', ''=>'ch', 'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', ''=>'d', ''=>'D', 'ð'=>'d', 'є'=>'e', 'ע'=>'e', ''=>'e', ''=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e', ''=>'f', 'ƒ'=>'f', ''=>'f', 'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', ''=>'g', ''=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g', 'ח'=>'h', 'ħ'=>'h', ''=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', ''=>'h', 'ה'=>'h', 'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', ''=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', ''=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ĳ'=>'ij', 'Ĳ'=>'ij', ''=>'j', ''=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', ''=>'ja', ''=>'ja', ''=>'je', ''=>'je', ''=>'jo', ''=>'jo', ''=>'ju', ''=>'ju', 'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', ''=>'k', ''=>'k', 'ķ'=>'k', 'ך'=>'k', 'Ŀ'=>'l', 'ŀ'=>'l', ''=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', ''=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l', 'מ'=>'m', ''=>'m', 'ם'=>'m', ''=>'m', 'ñ'=>'n', ''=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', ''=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ŉ'=>'n', 'Ň'=>'n', 'ň'=>'n', ''=>'o', ''=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe', 'פ'=>'p', 'ף'=>'p', ''=>'p', ''=>'p', 'ק'=>'q', 'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', ''=>'r', ''=>'r', 'ș'=>'s', ''=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', ''=>'s', 'ŝ'=>'s', ''=>'sch', ''=>'sch', ''=>'sh', ''=>'sh', 'ß'=>'ss', ''=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', ''=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm', 'ū'=>'u', ''=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', ''=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue', ''=>'v', 'ו'=>'v', ''=>'v', 'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w', ''=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y', ''=>'y', 'ž'=>'z', ''=>'z', ''=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', ''=>'zh', ''=>'zh' ); return strtr($s, $replace); }

Pay attention to some minor changes regarding German umlauts (ä => ae)

Edit: Included more characters based on publication with user3682119 (except copyright symbol) and comment from daker.

+38

BurninLeo May 7 '13 at 19:36

source share

In PHP 5.4, the intl extension provides a new class called Transliterator.

I believe the best way to remove diacritics is for two reasons:

The transliterator is based on the ICU, so you use the ICU library tables. OIT is an excellent project developed over the course of the year to provide comprehensive tables and functionality. No matter which table you want to write yourself, it will never be as complete as one of the ICUs.
In UTF-8, characters can be represented in different ways. For example, the character - can be saved as a single (multibyte) character or as a combination of characters ˜ (multibyte) and n . In addition to this, some Unicode characters are a homograph: they look the same with different code points. For this reason, it is also important to normalize the string.

Here is an example of code taken from my old answer :

 <?php $transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD); $test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto']; foreach($test as $e) { $normalized = $transliterator->transliterate($e); echo $e. ' --> '.$normalized."\n"; } ?>

Result:

 abcd --> abcd èe --> ee € --> € àòùìéëü --> aouieeu àòùìéëü --> aouieeu tiësto --> tiesto

The first argument for the Transliterator class does the removal of diacritics, as well as the normalization of the string.

+25

ItalyPaleAle Oct 22 '14 at 18:19

source share

So, I found this on the php.net page for the preg_replace function

 // replace accented chars $string = "Zacarías Ferreíra"; // my definition for string variable $accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/'; $string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8'); $string = preg_replace($accents,'$1',$string_encoded);

If you have problems with the encoding, you may come across such a “ZacarÃƒÂas FerreÃƒÂra”, just decrypt the string and use the code above

 $string = utf8_decode("ZacarÃƒÂas FerreÃƒÂra");

+10

Kasey Thomas Jul 22 '12 at 5:40

source share

This worked for me:

 <?php setlocale(LC_ALL, "en_US.utf8"); $val = iconv('UTF-8','ASCII//TRANSLIT',$val); ?>

Stergios Zg. Aug 4 '16 at 9:45

source share

 protected $_convertTable = array( '&amp;' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', 'À' => 'a', 'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c', 'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i', 'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o', 'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y', 'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', 'Ā' => 'a', 'ā' => 'a', 'Ă' => 'a', 'ă' => 'a', 'Ą' => 'a', 'ą' => 'a', 'Ć' => 'c', 'ć' => 'c', 'Ĉ' => 'c', 'ĉ' => 'c', 'Ċ' => 'c', 'ċ' => 'c', 'Č' => 'c', 'č' => 'c', 'Ď' => 'd', 'ď' => 'd', 'Đ' => 'd', 'đ' => 'd', 'Ē' => 'e', 'ē' => 'e', 'Ĕ' => 'e', 'ĕ' => 'e', 'Ė' => 'e', 'ė' => 'e', 'Ę' => 'e', 'ę' => 'e', 'Ě' => 'e', 'ě' => 'e', 'Ĝ' => 'g', 'ĝ' => 'g', 'Ğ' => 'g', 'ğ' => 'g', 'Ġ' => 'g', 'ġ' => 'g', 'Ģ' => 'g', 'ģ' => 'g', 'Ĥ' => 'h', 'ĥ' => 'h', 'Ħ' => 'h', 'ħ' => 'h', 'Ĩ' => 'i', 'ĩ' => 'i', 'Ī' => 'i', 'ī' => 'i', 'Ĭ' => 'i', 'ĭ' => 'i', 'Į' => 'i', 'į' => 'i', 'İ' => 'i', 'ı' => 'i', 'Ĳ' => 'ij','ĳ' => 'ij','Ĵ' => 'j', 'ĵ' => 'j', 'Ķ' => 'k', 'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'l', 'ĺ' => 'l', 'Ļ' => 'l', 'ļ' => 'l', 'Ľ' => 'l', 'ľ' => 'l', 'Ŀ' => 'l', 'ŀ' => 'l', 'Ł' => 'l', 'ł' => 'l', 'Ń' => 'n', 'ń' => 'n', 'Ņ' => 'n', 'ņ' => 'n', 'Ň' => 'n', 'ň' => 'n', 'ŉ' => 'n', 'Ŋ' => 'n', 'ŋ' => 'n', 'Ō' => 'o', 'ō' => 'o', 'Ŏ' => 'o', 'ŏ' => 'o', 'Ő' => 'o', 'ő' => 'o', 'Œ' => 'oe','œ' => 'oe','Ŕ' => 'r', 'ŕ' => 'r', 'Ŗ' => 'r', 'ŗ' => 'r', 'Ř' => 'r', 'ř' => 'r', 'Ś' => 's', 'ś' => 's', 'Ŝ' => 's', 'ŝ' => 's', 'Ş' => 's', 'ş' => 's', 'Š' => 's', 'š' => 's', 'Ţ' => 't', 'ţ' => 't', 'Ť' => 't', 'ť' => 't', 'Ŧ' => 't', 'ŧ' => 't', 'Ũ' => 'u', 'ũ' => 'u', 'Ū' => 'u', 'ū' => 'u', 'Ŭ' => 'u', 'ŭ' => 'u', 'Ů' => 'u', 'ů' => 'u', 'Ű' => 'u', 'ű' => 'u', 'Ų' => 'u', 'ų' => 'u', 'Ŵ' => 'w', 'ŵ' => 'w', 'Ŷ' => 'y', 'ŷ' => 'y', 'Ÿ' => 'y', 'Ź' => 'z', 'ź' => 'z', 'Ż' => 'z', 'ż' => 'z', 'Ž' => 'z', 'ž' => 'z', 'ſ' => 'z', 'Ə' => 'e', 'ƒ' => 'f', 'Ơ' => 'o', 'ơ' => 'o', 'Ư' => 'u', 'ư' => 'u', 'Ǎ' => 'a', 'ǎ' => 'a', 'Ǐ' => 'i', 'ǐ' => 'i', 'Ǒ' => 'o', 'ǒ' => 'o', 'Ǔ' => 'u', 'ǔ' => 'u', 'Ǖ' => 'u', 'ǖ' => 'u', 'Ǘ' => 'u', 'ǘ' => 'u', 'Ǚ' => 'u', 'ǚ' => 'u', 'Ǜ' => 'u', 'ǜ' => 'u', 'Ǻ' => 'a', 'ǻ' => 'a', 'Ǽ' => 'ae','ǽ' => 'ae','Ǿ' => 'o', 'ǿ' => 'o', 'ə' => 'e', '' => 'jo','Є' => 'e', 'І' => 'i', 'Ї' => 'i', '' => 'a', '' => 'b', '' => 'v', '' => 'g', '' => 'd', '' => 'e', '' => 'zh','' => 'z', '' => 'i', '' => 'j', '' => 'k', '' => 'l', '' => 'm', '' => 'n', '' => 'o', '' => 'p', '' => 'r', '' => 's', '' => 't', '' => 'u', '' => 'f', '' => 'h', '' => 'c', '' => 'ch','' => 'sh','' => 'sch', '' => '-', '' => 'y', '' => '-', '' => 'je','' => 'ju','' => 'ja', '' => 'a', '' => 'b', '' => 'v', '' => 'g', '' => 'd', '' => 'e', '' => 'zh','' => 'z', '' => 'i', '' => 'j', '' => 'k', '' => 'l', '' => 'm', '' => 'n', '' => 'o', '' => 'p', '' => 'r', '' => 's', '' => 't', '' => 'u', '' => 'f', '' => 'h', '' => 'c', '' => 'ch', '' => 'sh','' => 'sch','' => '-','' => 'y', '' => '-', '' => 'je', '' => 'ju','' => 'ja','' => 'jo','є' => 'e', 'і' => 'i', 'ї' => 'i', 'Ґ' => 'g', 'ґ' => 'g', 'א' => 'a', 'ב' => 'b', 'ג' => 'g', 'ד' => 'd', 'ה' => 'h', 'ו' => 'v', 'ז' => 'z', 'ח' => 'h', 'ט' => 't', 'י' => 'i', 'ך' => 'k', 'כ' => 'k', 'ל' => 'l', 'ם' => 'm', 'מ' => 'm', 'ן' => 'n', 'נ' => 'n', 'ס' => 's', 'ע' => 'e', 'ף' => 'p', 'פ' => 'p', 'ץ' => 'C', 'צ' => 'c', 'ק' => 'q', 'ר' => 'r', 'ש' => 'w', 'ת' => 't', '™' => 'tm', );

From magento, im using it for everything

user3682119 Jul 27 '14 at 18:10

source share

Updated answer based on @BurninLeo's answer

 function replace_spec_char($subject) { $char_map = array( "" => "-", "" => "-", "" => "-", "" => "-", "" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A", "" => "B", "ב" => "B", "Þ" => "B", "Ĉ" => "C", "Ć" => "C", "Ç" => "C", "" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C", "" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D", "È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E", "" => "F", "Ƒ" => "F", "Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "" => "G", "ג" => "G", "Ґ" => "G", "ח" => "H", "Ħ" => "H", "" => "H", "Ĥ" => "H", "ה" => "H", "I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I", "" => "J", "Ĵ" => "J", "ĸ" => "K", "כ" => "K", "Ķ" => "K", "" => "K", "ך" => "K", "Ł" => "L", "Ŀ" => "L", "" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L", "מ" => "M", "" => "M", "ם" => "M", "Ñ" => "N", "Ń" => "N", "" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ŉ" => "N", "Ň" => "N", "Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O", "פ" => "P", "ף" => "P", "" => "P", "ק" => "Q", "Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "" => "R", "®" => "R", "Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "" => "S", "Ŝ" => "S", "ס" => "S", "" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T", "Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U", "" => "V", "ו" => "V", "Ý" => "Y", "" => "Y", "Ŷ" => "Y", "Ÿ" => "Y", "Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "" => "Z", "ז" => "Z", "" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a", "" => "b", "ב" => "b", "þ" => "b", "ĉ" => "c", "ć" => "c", "ç" => "c", "" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c", "" => "ch", "" => "ch", "" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d", "è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e", "" => "f", "ƒ" => "f", "ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "" => "g", "ג" => "g", "ґ" => "g", "ח" => "h", "ħ" => "h", "" => "h", "ĥ" => "h", "ה" => "h", "i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i", "" => "j", "" => "j", "Ĵ" => "j", "ĵ" => "j", "ĸ" => "k", "כ" => "k", "ķ" => "k", "" => "k", "ך" => "k", "ł" => "l", "ŀ" => "l", "" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l", "מ" => "m", "" => "m", "ם" => "m", "ñ" => "n", "ń" => "n", "" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ŉ" => "n", "ň" => "n", "ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o", "פ" => "p", "ף" => "p", "" => "p", "ק" => "q", "ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "" => "r", "®" => "r", "ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "" => "s", "ŝ" => "s", "ס" => "s", "" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t", "ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u", "" => "v", "ו" => "v", "ý" => "y", "" => "y", "ŷ" => "y", "ÿ" => "y", "ź" => "z", "ž" => "z", "ż" => "z", "" => "z", "ז" => "z", "ſ" => "z", "™" => "tm", "@" => "at", "Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae", "ĳ" => "ij", "Ĳ" => "ij", "" => "ja", "" => "ja", "" => "je", "" => "je", "" => "jo", "" => "jo", "" => "ju", "" => "ju", "œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe", "" => "sch", "" => "sch", "" => "sh", "" => "sh", "ß" => "ss", "Ü" => "ue", "" => "zh", "" => "zh", ); return strtr($subject, $char_map); } $string = "Ħí ŧħə®ë, ßť å test!"; echo replace_spec_char($string);

Ħí ŧħə®ë, ßť å test! => Hi there, jusst a test!

This does not mix upper and lower case characters , except for long characters (e.g. ss, ch, sch) added by @ ® ©

Also, if you want to match regular expressions regardless of special characters:

rss => '[rŕřŘŗŖרŔ](?:[sșŜšśסşŝ][sșŜšśסşŝ]|[ß])'

Real implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477

Here is a basic list that you could work with, with the replacement of regular expressions (in exalted text) or a small script that you can build from this array to suit your needs.

 "-" => "", "A" => "ĂǍĄÀÃÁÆÂÅǺĀא", "B" => "בÞ", "C" => "ĈĆÇצĊČ©ץ", "D" => "ĎĐדÐ", "E" => "ÈĘÉËÊĒĖĚĔЄƏע", "F" => "Ƒ", "G" => "ĞĠĢĜגҐ", "H" => "חĦĤה", "I" => "IÏÎÍÌĮĬIĨǏיЇĪІ", "J" => "Ĵ", "K" => "ĸכĶך", "L" => "ŁĿĻĹĽל", "M" => "מם", "N" => "ÑŃŅןŊנŉŇ", "O" => "ØÓÒÔÕŐŎŌǾǑƠ", "P" => "פף", "Q" => "ק", "R" => "ŔŘŖר®", "S" => "ŞŚȘŠŜס", "T" => "ȚטŦתŤŢ", "U" => "ÙÛÚŪŨƯǓŲŬŮŰǕǛǙǗ", "V" => "ו", "Y" => "ÝŶŸ", "Z" => "ŹŽŻז", "a" => "ăǎąàãáæâåǻāא", "b" => "בþ", "c" => "ĉćçצċč©ץ", "ch" => "", "d" => "ďđדð", "e" => "èęéëêēėěĕєəע", "f" => "ƒ", "g" => "ğġģĝגґ", "h" => "חħĥה", "i" => "iïîíìįĭıĩǐיїīі", "j" => "ĵ", "k" => "ĸכķך", "l" => "łŀļĺľל", "m" => "מם", "n" => "ñńņןŋנŉň", "o" => "øóòôõőŏōǿǒơ", "p" => "פף", "q" => "ק", "r" => "ŕřŗר®", "s" => "şśșšŝס", "t" => "țטŧתťţ", "u" => "ùûúūũưǔųŭůűǖǜǚǘ", "v" => "ו", "y" => "ýŷÿ", "z" => "źžżזſ", "tm" => "™", "at" => "@", "ae" => "ÄǼäæǽ", "ch" => "", "ij" => "ĳĲ", "j" => "Ĵĵ", "ja" => "", "je" => "", "jo" => "", "ju" => "", "oe" => "œŒöÖ", "sch" => "", "sh" => "", "ss" => "ß", "tm" => "™", "ue" => "Ü", "zh" => ""

Kwaadpepper Nov 22 '15 at 15:05

source share

Disclaimer: I no longer support this answer (at that time I was blind). But thanks for up-vote = P

You can take this as a basis. From WordPress used to generate good URLs (the entry point is the slugify () function):

 /** * Converts all accent characters to ASCII characters. * * If there are no accent characters, then the string given is just returned. * * @param string $string Text that might have accent characters * @return string Filtered string with replaced "nice" characters. */ function remove_accents($string) { if (!preg_match('/[\x80-\xff]/', $string)) return $string; if (seems_utf8($string)) { $chars = array( // Decompositions for Latin-1 Supplement chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', chr(195).chr(130) => 'A', chr(195).chr(131) => 'A', chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', chr(195).chr(135) => 'C', chr(195).chr(136) => 'E', chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', chr(195).chr(139) => 'E', chr(195).chr(140) => 'I', chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', chr(195).chr(143) => 'I', chr(195).chr(145) => 'N', chr(195).chr(146) => 'O', chr(195).chr(147) => 'O', chr(195).chr(148) => 'O', chr(195).chr(149) => 'O', chr(195).chr(150) => 'O', chr(195).chr(153) => 'U', chr(195).chr(154) => 'U', chr(195).chr(155) => 'U', chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y', chr(195).chr(159) => 's', chr(195).chr(160) => 'a', chr(195).chr(161) => 'a', chr(195).chr(162) => 'a', chr(195).chr(163) => 'a', chr(195).chr(164) => 'a', chr(195).chr(165) => 'a', chr(195).chr(167) => 'c', chr(195).chr(168) => 'e', chr(195).chr(169) => 'e', chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', chr(195).chr(172) => 'i', chr(195).chr(173) => 'i', chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', chr(195).chr(177) => 'n', chr(195).chr(178) => 'o', chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', chr(195).chr(181) => 'o', chr(195).chr(182) => 'o', chr(195).chr(182) => 'o', chr(195).chr(185) => 'u', chr(195).chr(186) => 'u', chr(195).chr(187) => 'u', chr(195).chr(188) => 'u', chr(195).chr(189) => 'y', chr(195).chr(191) => 'y', // Decompositions for Latin Extended-A chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', chr(196).chr(130) => 'A', chr(196).chr(131) => 'a', chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', chr(196).chr(134) => 'C', chr(196).chr(135) => 'c', chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', chr(196).chr(138) => 'C', chr(196).chr(139) => 'c', chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', chr(196).chr(142) => 'D', chr(196).chr(143) => 'd', chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', chr(196).chr(146) => 'E', chr(196).chr(147) => 'e', chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', chr(196).chr(150) => 'E', chr(196).chr(151) => 'e', chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', chr(196).chr(154) => 'E', chr(196).chr(155) => 'e', chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', chr(196).chr(158) => 'G', chr(196).chr(159) => 'g', chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', chr(196).chr(162) => 'G', chr(196).chr(163) => 'g', chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', chr(196).chr(166) => 'H', chr(196).chr(167) => 'h', chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', chr(196).chr(170) => 'I', chr(196).chr(171) => 'i', chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', chr(196).chr(174) => 'I', chr(196).chr(175) => 'i', chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij', chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', chr(196).chr(182) => 'K', chr(196).chr(183) => 'k', chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', chr(196).chr(186) => 'l', chr(196).chr(187) => 'L', chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', chr(196).chr(190) => 'l', chr(196).chr(191) => 'L', chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', chr(197).chr(130) => 'l', chr(197).chr(131) => 'N', chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', chr(197).chr(134) => 'n', chr(197).chr(135) => 'N', chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', chr(197).chr(138) => 'n', chr(197).chr(139) => 'N', chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', chr(197).chr(142) => 'O', chr(197).chr(143) => 'o', chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe', chr(197).chr(148) => 'R',chr(197).chr(149) => 'r', chr(197).chr(150) => 'R',chr(197).chr(151) => 'r', chr(197).chr(152) => 'R',chr(197).chr(153) => 'r', chr(197).chr(154) => 'S',chr(197).chr(155) => 's', chr(197).chr(156) => 'S',chr(197).chr(157) => 's', chr(197).chr(158) => 'S',chr(197).chr(159) => 's', chr(197).chr(160) => 'S', chr(197).chr(161) => 's', chr(197).chr(162) => 'T', chr(197).chr(163) => 't', chr(197).chr(164) => 'T', chr(197).chr(165) => 't', chr(197).chr(166) => 'T', chr(197).chr(167) => 't', chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', chr(197).chr(170) => 'U', chr(197).chr(171) => 'u', chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', chr(197).chr(174) => 'U', chr(197).chr(175) => 'u', chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', chr(197).chr(178) => 'U', chr(197).chr(179) => 'u', chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y', chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z', chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', chr(197).chr(190) => 'z', chr(197).chr(191) => 's', // Euro Sign chr(226).chr(130).chr(172) => 'E', // GBP (Pound) Sign chr(194).chr(163) => ''); $string = strtr($string, $chars); } else { // Assume ISO-8859-1 if not UTF-8 $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158) .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194) .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202) .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210) .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218) .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227) .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235) .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243) .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251) .chr(252).chr(253).chr(255); $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy"; $string = strtr($string, $chars['in'], $chars['out']); $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254)); $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th'); $string = str_replace($double_chars['in'], $double_chars['out'], $string); } return $string; } /** * Checks to see if a string is utf8 encoded. * * @author bmorel at ssi dot fr * * @param string $Str The string to be checked * @return bool True if $Str fits a UTF-8 model, false otherwise. */ function seems_utf8($Str) { # by bmorel at ssi dot fr $length = strlen($Str); for ($i = 0; $i < $length; $i++) { if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b else return false; # Does not match any model for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ? if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80)) return false; } } return true; } function utf8_uri_encode($utf8_string, $length = 0) { $unicode = ''; $values = array(); $num_octets = 1; $unicode_length = 0; $string_length = strlen($utf8_string); for ($i = 0; $i < $string_length; $i++) { $value = ord($utf8_string[$i]); if ($value < 128) { if ($length && ($unicode_length >= $length)) break; $unicode .= chr($value); $unicode_length++; } else { if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3; $values[] = $value; if ($length && ($unicode_length + ($num_octets * 3)) > $length) break; if (count( $values ) == $num_octets) { if ($num_octets == 3) { $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]); $unicode_length += 9; } else { $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]); $unicode_length += 6; } $values = array(); $num_octets = 1; } } } return $unicode; } /** * Sanitizes title, replacing whitespace with dashes. * * Limits the output to alphanumeric characters, underscore (_) and dash (-). * Whitespace becomes a dash. * * @param string $title The title to be sanitized. * @return string The sanitized title. */ function slugify($title) { $title = strip_tags($title); // Preserve escaped octets. $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title); // Remove percent signs that are not part of an octet. $title = str_replace('%', '', $title); // Restore octets. $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title); $title = remove_accents($title); if (seems_utf8($title)) { if (function_exists('mb_strtolower')) { $title = mb_strtolower($title, 'UTF-8'); } $title = utf8_uri_encode($title, 200); } $title = strtolower($title); $title = preg_replace('/&.+?;/', '', $title); // kill entities $title = preg_replace('/[^%a-z0-9 _-]/', '', $title); $title = preg_replace('/\s+/', '-', $title); $title = preg_replace('|-+|', '-', $title); $title = trim($title, '-'); return $title; }

Keyne Viana 30 . '10 13:19

source share

http://php.net/manual/en/book.intl.php , :

 $string = "Éric Cantona"; $transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD); echo $normalized = $transliterator->transliterate($string);

EDIT

php Ubuntu:

 apt-get install php-intl

, composer, ext-intl , .

gabo 03 . '16 12:54

source share

, , 2 . , :

 $patterns[0] = '/[áâàåä]/ui'; $patterns[1] = '/[ðéêèë]/ui'; $patterns[2] = '/[íîìï]/ui'; $patterns[3] = '/[óôòøõö]/ui'; $patterns[4] = '/[úûùü]/ui'; $patterns[5] = '/æ/ui'; $patterns[6] = '/ç/ui'; $patterns[7] = '/ß/ui'; $replacements[0] = 'a'; $replacements[1] = 'e'; $replacements[2] = 'i'; $replacements[3] = 'o'; $replacements[4] = 'u'; $replacements[5] = 'ae'; $replacements[6] = 'c'; $replacements[7] = 'ss';

, , - . regualr /[someCoolRegex]/ui , u , unicode, i , , ansewer , , strtr.

, - .

Colorman 24 . '17 15:20

source share

strtolower iso-8859-1. mb_strtolower .

, , iconv:

 iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);

Edit:

, . , iso-8859-1, . . For example:.

 '/(ð|é|ê|è|ë)/'

not

 '/[ð|é|ê|è|ë]/'

troelskn 30 . '10 13:09

source share

, ...

, :

-, . , php script, .

-, iconv , , Fatal Exception.

, , :

 function replaceAccent($string, $replacement = '_') { $alnumPattern = '/^[a-zA-Z0-9 ]+$/'; if (preg_match($alnumPattern, $string)) { return $string; } $ret = array_map( function ($chr) use ($alnumPattern, $replacement) { if (preg_match($alnumPattern, $chr)) { return $chr; } else { $chr = @iconv('ISO-8859-1', 'ASCII//TRANSLIT', $chr); if (strlen($chr) == 1) { return $chr; } elseif (strlen($chr) > 1) { $ret = ''; foreach (str_split($chr) as $char2) { if (preg_match($alnumPattern, $char2)) { $ret .= $char2; } } return $ret; } else { // replace whatever iconv fail to convert by something else return $replacement; } } }, str_split($string) ); return implode($ret); }

frenus 28 . '14 21:43

source share

( ) , WordPress . , , WordPress.

  function mbstring_binary_safe_encoding($reset = false) { static $encodings = array(); static $overloaded = null; if (is_null($overloaded)) { $overloaded = function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2); } if (false === $overloaded) { return; } if (!$reset) { $encoding = mb_internal_encoding(); array_push($encodings, $encoding); mb_internal_encoding('ISO-8859-1'); } if ($reset && $encodings) { $encoding = array_pop($encodings); mb_internal_encoding($encoding); } } function seems_utf8($str) { mbstring_binary_safe_encoding(); $length = strlen($str); mbstring_binary_safe_encoding(true); for ($i = 0; $i < $length; $i++) { $c = ord($str[$i]); if ($c < 0x80) { $n = 0; } // 0bbbbbbb elseif (($c & 0xE0) == 0xC0) { $n = 1; } // 110bbbbb elseif (($c & 0xF0) == 0xE0) { $n = 2; } // 1110bbbb elseif (($c & 0xF8) == 0xF0) { $n = 3; } // 11110bbb elseif (($c & 0xFC) == 0xF8) { $n = 4; } // 111110bb elseif (($c & 0xFE) == 0xFC) { $n = 5; } // 1111110b else { return false; } // Does not match any model for ($j = 0; $j < $n; $j++) { // n bytes matching 10bbbbbb follow ? if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) { return false; } } } return true; } function remove_accents($string) { if (!preg_match('/[\x80-\xff]/', $string)) { return $string; } if (seems_utf8($string)) { $chars = array( // Decompositions for Latin-1 Supplement 'ª' => 'a', 'º' => 'o', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE', 'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ý' => 'Y', 'Þ' => 'TH', 'ß' => 's', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae', 'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'th', 'ÿ' => 'y', 'Ø' => 'O', // Decompositions for Latin Extended-A 'Ā' => 'A', 'ā' => 'a', 'Ă' => 'A', 'ă' => 'a', 'Ą' => 'A', 'ą' => 'a', 'Ć' => 'C', 'ć' => 'c', 'Ĉ' => 'C', 'ĉ' => 'c', 'Ċ' => 'C', 'ċ' => 'c', 'Č' => 'C', 'č' => 'c', 'Ď' => 'D', 'ď' => 'd', 'Đ' => 'D', 'đ' => 'd', 'Ē' => 'E', 'ē' => 'e', 'Ĕ' => 'E', 'ĕ' => 'e', 'Ė' => 'E', 'ė' => 'e', 'Ę' => 'E', 'ę' => 'e', 'Ě' => 'E', 'ě' => 'e', 'Ĝ' => 'G', 'ĝ' => 'g', 'Ğ' => 'G', 'ğ' => 'g', 'Ġ' => 'G', 'ġ' => 'g', 'Ģ' => 'G', 'ģ' => 'g', 'Ĥ' => 'H', 'ĥ' => 'h', 'Ħ' => 'H', 'ħ' => 'h', 'Ĩ' => 'I', 'ĩ' => 'i', 'Ī' => 'I', 'ī' => 'i', 'Ĭ' => 'I', 'ĭ' => 'i', 'Į' => 'I', 'į' => 'i', 'İ' => 'I', 'ı' => 'i', 'Ĳ' => 'IJ', 'ĳ' => 'ij', 'Ĵ' => 'J', 'ĵ' => 'j', 'Ķ' => 'K', 'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'L', 'ĺ' => 'l', 'Ļ' => 'L', 'ļ' => 'l', 'Ľ' => 'L', 'ľ' => 'l', 'Ŀ' => 'L', 'ŀ' => 'l', 'Ł' => 'L', 'ł' => 'l', 'Ń' => 'N', 'ń' => 'n', 'Ņ' => 'N', 'ņ' => 'n', 'Ň' => 'N', 'ň' => 'n', 'ŉ' => 'n', 'Ŋ' => 'N', 'ŋ' => 'n', 'Ō' => 'O', 'ō' => 'o', 'Ŏ' => 'O', 'ŏ' => 'o', 'Ő' => 'O', 'ő' => 'o', 'Œ' => 'OE', 'œ' => 'oe', 'Ŕ' => 'R', 'ŕ' => 'r', 'Ŗ' => 'R', 'ŗ' => 'r', 'Ř' => 'R', 'ř' => 'r', 'Ś' => 'S', 'ś' => 's', 'Ŝ' => 'S', 'ŝ' => 's', 'Ş' => 'S', 'ş' => 's', 'Š' => 'S', 'š' => 's', 'Ţ' => 'T', 'ţ' => 't', 'Ť' => 'T', 'ť' => 't', 'Ŧ' => 'T', 'ŧ' => 't', 'Ũ' => 'U', 'ũ' => 'u', 'Ū' => 'U', 'ū' => 'u', 'Ŭ' => 'U', 'ŭ' => 'u', 'Ů' => 'U', 'ů' => 'u', 'Ű' => 'U', 'ű' => 'u', 'Ų' => 'U', 'ų' => 'u', 'Ŵ' => 'W', 'ŵ' => 'w', 'Ŷ' => 'Y', 'ŷ' => 'y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'ź' => 'z', 'Ż' => 'Z', 'ż' => 'z', 'Ž' => 'Z', 'ž' => 'z', 'ſ' => 's', // Decompositions for Latin Extended-B 'Ș' => 'S', 'ș' => 's', 'Ț' => 'T', 'ț' => 't', // Euro Sign '€' => 'E', // GBP (Pound) Sign '£' => '', // Vowels with diacritic (Vietnamese) // unmarked 'Ơ' => 'O', 'ơ' => 'o', 'Ư' => 'U', 'ư' => 'u', // grave accent 'Ầ' => 'A', 'ầ' => 'a', 'Ằ' => 'A', 'ằ' => 'a', 'Ề' => 'E', 'ề' => 'e', 'Ồ' => 'O', 'ồ' => 'o', 'Ờ' => 'O', 'ờ' => 'o', 'Ừ' => 'U', 'ừ' => 'u', 'Ỳ' => 'Y', 'ỳ' => 'y', // hook 'Ả' => 'A', 'ả' => 'a', 'Ẩ' => 'A', 'ẩ' => 'a', 'Ẳ' => 'A', 'ẳ' => 'a', 'Ẻ' => 'E', 'ẻ' => 'e', 'Ể' => 'E', 'ể' => 'e', 'Ỉ' => 'I', 'ỉ' => 'i', 'Ỏ' => 'O', 'ỏ' => 'o', 'Ổ' => 'O', 'ổ' => 'o', 'Ở' => 'O', 'ở' => 'o', 'Ủ' => 'U', 'ủ' => 'u', 'Ử' => 'U', 'ử' => 'u', 'Ỷ' => 'Y', 'ỷ' => 'y', // tilde 'Ẫ' => 'A', 'ẫ' => 'a', 'Ẵ' => 'A', 'ẵ' => 'a', 'Ẽ' => 'E', 'ẽ' => 'e', 'Ễ' => 'E', 'ễ' => 'e', 'Ỗ' => 'O', 'ỗ' => 'o', 'Ỡ' => 'O', 'ỡ' => 'o', 'Ữ' => 'U', 'ữ' => 'u', 'Ỹ' => 'Y', 'ỹ' => 'y', // acute accent 'Ấ' => 'A', 'ấ' => 'a', 'Ắ' => 'A', 'ắ' => 'a', 'Ế' => 'E', 'ế' => 'e', 'Ố' => 'O', 'ố' => 'o', 'Ớ' => 'O', 'ớ' => 'o', 'Ứ' => 'U', 'ứ' => 'u', // dot below 'Ạ' => 'A', 'ạ' => 'a', 'Ậ' => 'A', 'ậ' => 'a', 'Ặ' => 'A', 'ặ' => 'a', 'Ẹ' => 'E', 'ẹ' => 'e', 'Ệ' => 'E', 'ệ' => 'e', 'Ị' => 'I', 'ị' => 'i', 'Ọ' => 'O', 'ọ' => 'o', 'Ộ' => 'O', 'ộ' => 'o', 'Ợ' => 'O', 'ợ' => 'o', 'Ụ' => 'U', 'ụ' => 'u', 'Ự' => 'U', 'ự' => 'u', 'Ỵ' => 'Y', 'ỵ' => 'y', // Vowels with diacritic (Chinese, Hanyu Pinyin) 'ɑ' => 'a', // macron 'Ǖ' => 'U', 'ǖ' => 'u', // acute accent 'Ǘ' => 'U', 'ǘ' => 'u', // caron 'Ǎ' => 'A', 'ǎ' => 'a', 'Ǐ' => 'I', 'ǐ' => 'i', 'Ǒ' => 'O', 'ǒ' => 'o', 'Ǔ' => 'U', 'ǔ' => 'u', 'Ǚ' => 'U', 'ǚ' => 'u', // grave accent 'Ǜ' => 'U', 'ǜ' => 'u', ); $string = strtr($string, $chars); } else { $chars = array(); // Assume ISO-8859-1 if not UTF-8 $chars['in'] = "\x80\x83\x8a\x8e\x9a\x9e" . "\x9f\xa2\xa5\xb5\xc0\xc1\xc2" . "\xc3\xc4\xc5\xc7\xc8\xc9\xca" . "\xcb\xcc\xcd\xce\xcf\xd1\xd2" . "\xd3\xd4\xd5\xd6\xd8\xd9\xda" . "\xdb\xdc\xdd\xe0\xe1\xe2\xe3" . "\xe4\xe5\xe7\xe8\xe9\xea\xeb" . "\xec\xed\xee\xef\xf1\xf2\xf3" . "\xf4\xf5\xf6\xf8\xf9\xfa\xfb" . "\xfc\xfd\xff"; $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy"; $string = strtr($string, $chars['in'], $chars['out']); $double_chars = array(); $double_chars['in'] = array("\x8c", "\x9c", "\xc6", "\xd0", "\xde", "\xdf", "\xe6", "\xf0", "\xfe"); $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th'); $string = str_replace($double_chars['in'], $double_chars['out'], $string); } return $string; }

Selay 07 . '19 23:57

source share

, Lizard , -, , , , . Thank you in advance.

 $unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', '&#225;'=>'a', '&#233;'=>'e', '&#237;'=>'i', '&#243;'=>'o', '&#250;'=>'u', '&#193;'=>'A', '&#201;'=>'E', '&#205;'=>'I', '&#211;'=>'O', '&#218;'=>'U', '&#209;'=>'N', '&#241;'=>'n' ); $newtag = strtr( $newtag, $unwanted_array );

Luis H Cabrejo 17 . '19 0:41

source share

All Articles

Replacing accented php characters

EDIT

More articles: