How do I encode Japanese into something like "&#26085;&#26412;&#12395;&#34892;&#12387;&#12390;"? (UTF-8)

As the title says. I can't seem to get this working with any of the following: PHP headers, CSS headers, HTML headers, MySQL encodings (set to utf8_general_ci), or

<form acceptcharset="utf-8"... >

That's really all there is to it.

I basically go through this process:

  • Japanese characters are entered through a form
  • The form data is saved to a MySQL database
  • PHP retrieves the data from the MySQL database and formats it for a web page

In step 3, I check the page source and see that it contains the literal Japanese characters. Because of that, I assume this is what causes the PHP problems I'm getting (functions that work fine on English text don't work as well on Japanese text).

So I want to encode the text as UTF-8, but I'm not sure how to do this.

Edit: here is the PHP function that I use on the Japanese text:

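function short_text_jap($text, $length=300) {
    // Only shorten text that is longer than the limit
    if (strlen($text) > $length) {
        // Keep up to $length characters, ending on a word boundary
        $pattern = '/^(.{0,'.$length.'}\\b).*$/s';
        $text = preg_replace($pattern, "$1...", $text);
    }
    return $text;
}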

But instead of a shortened text, it returns the whole thing.

+3
2 answers

There seems to be a bit of confusion here about what UTF-8 is: you describe your goal as getting the "UTF-8 version" of the literal Japanese characters.

Things like &#26085; are ASCII-compatible HTML entities (essentially references to Unicode code points) that are themselves already written in some encoding, while UTF-8 is a multi-byte encoding scheme that defines how characters are stored at the byte level.

I suggest sticking with the literal form, since it makes the whole mess of international alphabets easier to manage.

Just make sure UTF-8 is used everywhere: in the database, the HTML, and PHP. In PHP, use the Multibyte String functions, like this:
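To make the distinction concrete, here is a minimal sketch that shows one character as UTF-8 bytes, as a Unicode code point, and as a numeric HTML entity. It assumes the source file is saved as UTF-8 and that PHP 7.2 or later with the mbstring extension is available (for mb_ord); the variable names are only illustrative.

```php
<?php
// Sketch: one character viewed three ways.
$char = '日'; // assumes this file is saved as UTF-8

$bytes     = bin2hex($char);          // the raw UTF-8 bytes, in hex
$codePoint = mb_ord($char, 'UTF-8');  // the Unicode code point (PHP 7.2+)
$entity    = '&#' . $codePoint . ';'; // decimal HTML character reference

// Decoding the entity gives back the same UTF-8 character.
$decoded = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $bytes, "\n";     // e697a5
echo $codePoint, "\n"; // 26085
echo $entity, "\n";    // &#26085;
echo $decoded === $char ? "same character\n" : "different\n";
```

The entity is plain ASCII text that a browser turns into the character, while the three bytes are how UTF-8 stores that same character.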

mb_internal_encoding("UTF-8");

function short_text_jap($text, $length=300) {
    return mb_strlen($text) > $length ? mb_substr($text, 0, $length) : $text;
}

echo short_text_jap('日本語', 2); // outputs 日本
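
A probable reason the original function returns the text unchanged: strlen() counts bytes rather than characters, and the pattern relies on \b, which has little to match in Japanese text that contains no spaces or ASCII word characters. The sketch below shows the byte/character difference and a hypothetical variant, short_text_ellipsis (not part of the question), that also appends "..." as the original tried to do. It assumes the mbstring extension and a UTF-8 source file.

```php
<?php
mb_internal_encoding('UTF-8');

$text = '日本語'; // three characters, nine bytes in UTF-8

$byteLength = strlen($text);    // counts bytes
$charLength = mb_strlen($text); // counts characters

// Hypothetical helper: cut at $length characters and append "...".
function short_text_ellipsis($text, $length = 300) {
    if (mb_strlen($text) <= $length) {
        return $text; // already short enough
    }
    return mb_substr($text, 0, $length) . '...';
}

echo $byteLength, "\n";                   // 9
echo $charLength, "\n";                   // 3
echo short_text_ellipsis($text, 2), "\n"; // 日本...
```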
+1

If you do need to convert UTF-8 to ASCII, with the non-ASCII characters replaced by HTML entities, PHP can do it like this:

mb_substitute_character('entity');
$str = '日本語';  // UTF-8 encoded string
echo mb_convert_encoding($str, 'US-ASCII', 'UTF-8');

Output:

&#x65E5;&#x672C;&#x8A9E;
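
If decimal entities are preferred, as in the question title, one possible approach is mb_encode_numericentity(). This is a sketch that assumes a UTF-8 source file and the mbstring extension; the conversion map below is one common choice that covers every code point outside ASCII.

```php
<?php
$str = '日本語'; // assumes this file is saved as UTF-8

// Conversion map: [first code point, last code point, offset, mask].
// This one leaves ASCII alone and converts everything else.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$encoded = mb_encode_numericentity($str, $convmap, 'UTF-8');
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');

echo $encoded, "\n"; // &#26085;&#26412;&#35486;
echo $decoded, "\n"; // 日本語
```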
+4

Source: https://habr.com/ru/post/1791854/

