Problem with PHP utf8

Question

I am having trouble comparing a UTF-8 character against an array of Norwegian characters.

All characters except special Norwegian characters (æ, ø, å) work fine.

function isNorwegianChar($Char)
{
    $aNorwegianChars = array('a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z', 'æ', 'Æ', 'ø', 'Ø', 'å', 'Å', '=', '(', ')', ' ', '-');
    $iArrayLength = count($aNorwegianChars);

    for($iCount = 0; $iCount < $iArrayLength; $iCount++)
    {
        if($aNorwegianChars[$iCount] == $Char)
        {
            return true;
        }
    }

    return false;

}

If anyone has an idea of what I can do, please let me know.

Update:

I am trying to parse a text file that contains Norwegian and Chinese words, like a dictionary. I want to split each line into two strings, one holding the Norwegian word and the other the Chinese translation, so that they can later be inserted into a database. Example lines:

impulsiv 形衝動的

imøtegå 動反對, 反駁

imøtekomme 動符合

alkoholmisbruk (er) 名濫用酒精 (名濫用酒精的人)

alkoholpåvirket 形受酒精影響的

breath test 名呼吸性酒精測試

alkymi (st) 名煉金術 (名煉金術士)

all, alt, alle, 形全部, 所有

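To split a line, I go through it one char at a time and call isNorwegianChar on each char; the first one that fails the check marks the start of the Chinese part.

However, æ, ø and å are not recognized, so the split happens in the wrong place.

Code: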

// Open file.
$rFile = fopen("norsk-kinesisk.txt", "r");

// Loop through the file.
$Count = 0;
while(!feof($rFile))
{
    if(40== $Count)
    {
        break;
    }

    $sLine = fgets($rFile);

    if(0 == $Count)
    {
        $sLine = mb_substr($sLine, 3);
    }

    $iLineLength        = strlen($sLine);
    $bChineseHasStarted = false;
    $sNorwegianWord     = '';
    $sChineseWord       = '';
    for($iCount2 = 0; $iCount2 < $iLineLength; $iCount2++)
    {
        $char = mb_substr($sLine, $iCount2, 1);

        if(($bChineseHasStarted === false) && (false == isNorwegianChar($char)))
        {
            $bChineseHasStarted = true;
        }

        if(false === $bChineseHasStarted)
        {
            $sNorwegianWord .= $char;
        }
        else
        {
            $sChineseWord .= $char;
        }

        //echo $char;
    }

    $sNorwegianWord = trim($sNorwegianWord);
    $sChineseWord = trim($sChineseWord);

    $Count++;
}

fclose($rFile);

php utf-8

Christoffer · 2008-10-03 12:41

7

utf8.

0

Mote · 2008-10-03 13:01

Gilles · Answer 1 · 2008-10-03T12:45:54+0000

, UTF-8 , , , , . PHP :

http://fr.php.net/array_search

, , . , PHP, , UTF-8!

UPDATE:

, . , , PHP UTF-8 ini_set.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head><title>Norwegian utf-8 test</title>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
</head>

<body>

<?php

function isSpecial($char) {
    $special_chars = array("æ", "ø", "å", "か");
    return (array_search($char, $special_chars) !== false);
}

if (isset($_REQUEST["char"])) {
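    // Escape the submitted value before echoing it back into the page.
    echo htmlspecialchars($_REQUEST["char"], ENT_QUOTES, "UTF-8").(isSpecial($_REQUEST["char"])?" (true)":" (false)");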
}


?>

<form  method="POST" accept-charset="UTF-8">
<input type="text" name="char">
<input type="submit" value="submit">
</form>


</body>
</html>

Joeri Sebrechts · Answer 2 · 2008-10-03T12:54:14+0000

PHP script ANSI, UTF-8, , , UTF-8. PHP , , , .

, , PHP script , , , iconv mbstring .

, , : http://www.joelonsoftware.com/articles/Unicode.html

:. , , - , , , - , . , , UTF-8 ( ) . mbstring , .
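
The kind of mismatch between a script saved as ANSI and data stored as UTF-8 can be made visible by comparing raw bytes. The following is a minimal sketch, assuming "ANSI" means ISO-8859-1 (or Windows-1252, which encodes ø the same way); the variable names are illustrative only.

```php
<?php
// "ø" as a single byte, as it would appear in a script saved as ISO-8859-1.
$latin1 = "\xF8";
// "ø" as the two-byte UTF-8 sequence, as it would appear in a UTF-8 text file.
$utf8 = "\xC3\xB8";

// The two byte strings differ, so == reports no match.
var_dump($latin1 == $utf8); // bool(false)

// Converting the ISO-8859-1 byte to UTF-8 with iconv makes them comparable.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
var_dump($converted === $utf8); // bool(true)

// strlen counts bytes; mb_strlen counts characters in the given encoding.
var_dump(strlen($utf8));             // int(2)
var_dump(mb_strlen($utf8, 'UTF-8')); // int(1)
```

If the script file and the data file already use the same encoding, no conversion is needed and the comparison works on the bytes as they are.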

Christoffer · Answer 3 · 2008-10-03T15:43:05+0000

, , . , , .

, , , , . , , mb_strpos . , isNorwegianChar. :

function isNorwegianChar($Char)
{
    $sNorwegianChars = "'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZæÆøØåÅ=() -,";

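    // mb_strpos returns 0 for a match at the first position, which a bare
    // if() treats as false, so compare against false explicitly.
    // (With this check the leading ' in the list also counts as a match.)
    return mb_strpos($sNorwegianChars, $Char, 0, 'UTF-8') !== false;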
}

!
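
A sketch of how an mb_strpos-based check like the one above could drive the split of a whole line, assuming the input is UTF-8. The helper name splitLine and its parameters are hypothetical. The loop bound uses mb_strlen rather than strlen, so that it counts characters in the same way mb_substr does.

```php
<?php
// Hypothetical helper: splits one dictionary line into [Norwegian, Chinese].
function splitLine($sLine, $sAllowed)
{
    // Count characters, not bytes, so the loop bound matches mb_substr.
    $iLength    = mb_strlen($sLine, 'UTF-8');
    $sNorwegian = '';

    for ($i = 0; $i < $iLength; $i++) {
        $sChar = mb_substr($sLine, $i, 1, 'UTF-8');

        // The first character outside the allowed set starts the Chinese part.
        if (mb_strpos($sAllowed, $sChar, 0, 'UTF-8') === false) {
            $sChinese = mb_substr($sLine, $i, $iLength - $i, 'UTF-8');
            return array(trim($sNorwegian), trim($sChinese));
        }

        $sNorwegian .= $sChar;
    }

    // No character outside the allowed set: the line has no Chinese part.
    return array(trim($sNorwegian), '');
}

$sAllowed = "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZæÆøØåÅ=() -,";
$aParts   = splitLine("imøtegå 動反對, 反駁\n", $sAllowed);
// $aParts[0] is "imøtegå", $aParts[1] is "動反對, 反駁"
```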

Mote · Answer 4 · 2008-10-03T12:50:57+0000

, mbstring

Benny Wong · Answer 5 · 2008-10-03T12:56:10+0000

, , mbstring (http://www.php.net/manual/en/ref.mbstring.php), -.

user22960 · Answer 6 · 2008-10-05T17:57:15+0000

, () , ( , "¶" ), , ?

impulsiv¶ 形衝動的

mb_split, mb_substr, mb_strpos.

, !

Unfortunately, PCRE in PHP does not allow us to use \p with script names.

(find "InMusicalSymbols" in regexp.reference, under § "Unicode Character Properties" to understand what I mean)
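
One alternative that does not rely on \p with script names is an explicit code point range. The sketch below assumes that every Chinese part in the file starts with a character from the CJK Unified Ideographs block (U+4E00 to U+9FFF), which holds for the example lines in the question; the helper name splitAtFirstHan is hypothetical.

```php
<?php
// Hypothetical helper: returns [Norwegian part, Chinese part] for one line.
function splitAtFirstHan($sLine)
{
    // \x{4E00}-\x{9FFF} is the CJK Unified Ideographs block. The u modifier
    // makes PCRE treat pattern and subject as UTF-8; s lets . match newlines.
    if (preg_match('/^(.*?)([\x{4E00}-\x{9FFF}].*)$/us', $sLine, $aMatch)) {
        return array(trim($aMatch[1]), trim($aMatch[2]));
    }

    // No ideograph found: treat the whole line as Norwegian.
    return array(trim($sLine), '');
}

$aParts = splitAtFirstHan("alkoholpåvirket 形受酒精影響的\n");
// $aParts[0] is "alkoholpåvirket", $aParts[1] is "形受酒精影響的"
```

Because the split point is defined by the first ideograph, this version needs no list of allowed Norwegian characters, so æ, ø and å need no special handling.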
