"set names" and mysqli_set_charset - except that they affect mysqli_escape_string, are they identical?

The common advice is to use mysql_set_charset / mysqli::set_charset instead of issuing the MySQL query SET NAMES directly.

The reason usually given is that SET NAMES is unsafe, because the encoding used by mysql_real_escape_string / mysqli::real_escape_string is only set by calling mysql_set_charset / mysqli::set_charset. (Another frequently cited reason is that the PHP docs call it "not recommended".)

However, is it safe to use a direct SET NAMES query if we use prepared statements, or escaping methods other than mysql_real_escape_string / mysqli::real_escape_string / mysqli_escape_string?

Apart from affecting the encoding used by mysql_real_escape_string / mysqli::real_escape_string / mysqli_escape_string, is there any difference between SET NAMES and mysql_set_charset / mysqli::set_charset?

+12
security php mysql encoding libmysql
Oct. 27 '14 at 20:30
4 answers

Calling SET NAMES on a connection is equivalent to calling set_charset, provided you never call get_charset or mysql_real_escape_string (and friends).

When you call set_charset, PHP does two things. First, it issues SET NAMES on the connection. Second, it remembers which charset you set. That state is later used only by the get_charset and mysql_real_escape_string (and friends) functions. Therefore, if you do not use these functions, you can consider the two equivalent.

Walking through the source code:

  • The userland functions mysql_set_charset and mysqli_set_charset call ...
  • The engine function mysql_set_character_set, which calls ...
  • The engine macro mysqlnd_set_character_set, which is defined as:

    #define mysqlnd_set_character_set(conn, cs) \
        ((conn)->data)->m->set_charset((conn)->data, (cs))

    and expanding to ...

  • MYSQLND_METHOD(mysqlnd_conn_data, set_charset) which contains the following code (numbered for discussion, these are not the actual line numbers of the source code):



     1  if (PASS == conn->m->local_tx_start(conn, this_func)) {
     2      char * query;
     3      size_t query_len = mnd_sprintf(&query, 0, "SET NAMES %s", csname);
     4
     5      if (FAIL == (ret = conn->m->query(conn, query, query_len))) {
     6          php_error_docref(NULL, E_WARNING, "Error executing query");
     7      } else if (conn->error_info->error_no) {
     8          ret = FAIL;
     9      } else {
    10          conn->charset = charset;
    11      }
    12      mnd_sprintf_free(query);
    13
    14      conn->m->local_tx_end(conn, this_func, ret);
    15  }



As you can see, PHP itself issues SET NAMES on the connection (line 3). PHP also records the charset it just set (line 10). The comments discuss further what happens with conn->charset, but suffice it to say that it ends up being used only by get_charset and mysql_real_escape_string (and friends).

So, if you do not care about this state, and you agree to use neither get_charset nor mysql_real_escape_string , you can call SET NAMES on the connection itself without any harmful effect.
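The difference in tracked state can be observed from userland. The following is a minimal sketch, not taken from the answer: the helper names are hypothetical, and it assumes you pass in an open mysqli connection of your own. It uses mysqli::character_set_name() to read the charset the client library remembers, and the server variable @@character_set_client to read what the server was told.

```php
<?php
// Hypothetical helper: the charset as remembered by the PHP client library.
function charset_seen_by_client(mysqli $db): string
{
    return $db->character_set_name();
}

// Hypothetical helper: the charset the MySQL server uses for this connection.
function charset_seen_by_server(mysqli $db): string
{
    $row = $db->query('SELECT @@character_set_client AS cs')->fetch_assoc();
    return $row['cs'];
}

// Hypothetical helper: report both views after a raw SET NAMES and again
// after set_charset(), so the two approaches can be compared side by side.
function compare_charset_state(mysqli $db, string $charset = 'utf8mb4'): array
{
    // $charset is interpolated into SQL here, so pass only a trusted,
    // hard-coded charset name.
    $db->query("SET NAMES {$charset}");
    $afterSetNames = [
        'client' => charset_seen_by_client($db),
        'server' => charset_seen_by_server($db),
    ];

    $db->set_charset($charset);
    $afterSetCharset = [
        'client' => charset_seen_by_client($db),
        'server' => charset_seen_by_server($db),
    ];

    return ['set_names' => $afterSetNames, 'set_charset' => $afterSetCharset];
}
```

Called as compare_charset_state(new mysqli($host, $user, $pass, $dbname)) with your own connection details, the 'set_names' entry should show the server already using the new charset while the client value may still be the connection default; after set_charset() both should agree.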

As an aside: I have never done this, but it seems that compiling PHP with -DPHP_DEBUG=1 enables substantial debug output through the various DBG macros. This can be useful for seeing how your code passes through this block.

+6
Jun 21 '16 at 14:51

Two things need to be done (in this area):

  • Escape quotation marks (and other characters) before placing them inside a quoted string. Otherwise, the quotation marks will give you syntax errors.
  • Set the byte encoding in the client. This tells INSERTs / SELECTs how to convert the bytes when writing / reading.

For the first, you need to escape the apostrophe and the double quote, since both are valid string delimiters in MySQL syntax. Then the escape character itself (the backslash) needs to be escaped. These 3 characters are sufficient for typical applications. However, if you are trying to escape a BLOB (e.g. a .jpg), various control characters can cause problems. You should probably convert to hex and then use UNHEX() to avoid them. Note that nothing has been said about character sets here. If you are not dealing with BLOBs, you can get away with PHP's addslashes().

The second is a way of saying: "This stream of bytes is encoded in this way (utf8 / latin1 / etc)." It is used only to convert between the CHARACTER SET of the column being stored / retrieved and the encoding required by your client (PHP, etc.). Each language handles this differently. For PHP:

  • mysql_* - Do not use this interface; it is outdated and will be removed soon.
  • mysqli_* - mysqli::set_charset(...)
  • PDO - new PDO('...;charset=UTF8', ...)

Does set_charset() do something with real_escape_string? I don't know. But that should not matter. SET NAMES clearly cannot, because it is a MySQL command and knows nothing about PHP.

htmlentities() is another PHP function in this area. It turns 8-bit codes into &-entities. It should not be used when going into MySQL; it would only mask other issues. Use it only in specific HTML-related situations, not for PHP or MySQL.

The only reasonable CHARACTER SETs to use today are ascii, latin1, utf8 and utf8mb4. They do not have "characters" in the "control" area. Sjis and several other character sets do. This confusion over control characters may be the reason real_escape_string exists.
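The concern about multi-byte character sets can be made concrete with a small byte-level sketch. It is an illustration rather than part of the original answer, and it uses GBK as the example because a backslash byte (0x5c) is a legal trailing byte there. addslashes() works on single bytes and knows nothing about the connection charset, so it inserts a backslash that a GBK-aware parser can absorb into the preceding character.

```php
<?php
// Two raw bytes: 0xbf followed by an apostrophe (0x27).
$input = "\xbf\x27";

// addslashes() is byte-oriented: it sees the apostrophe and prefixes a backslash.
$escaped = addslashes($input);

// Hex dump of the result. Read as GBK, the bytes bf 5c form one two-byte
// character, which leaves the trailing 27 (the apostrophe) unescaped.
echo bin2hex($escaped), "\n"; // prints "bf5c27"
```

With ascii, latin1, utf8 or utf8mb4 this situation does not arise, since 0x5c never appears as part of a longer character in those encodings. A charset-aware escaping function needs to know the connection charset to handle the other cases.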

Conclusion:

As I see it, you need two mechanisms: one for escaping and one for setting the encoding in the client. They are separate.

If the two are related, the PHP manual has not given any good reason to choose one method over the other.

+4
Jun 18 '16 at 17:58

mysql: the whole interface is deprecated, so don't use it at all (PHP 7 removes the interface).

With mysqli (and PDO) prepared statements, real_escape_string is not needed (and not wanted). So if you use only mysqli and only prepared statements, you do not need to worry about how you set the encoding.

Since you care about safety: I see no reason not to use prepared statements.

Once you use mysqli prepared statements, the only way forward is to use $mysqli->set_charset(), since you can no longer simply concatenate multiple SQL statements on one line.

So the question of what the difference is turns out to be mostly academic and does not matter in practice.

In short:

  • mysql: do not use at all.

  • mysqli: use prepared statements and therefore the set_charset() method
    Also: you no longer need real_escape_string after using prepared statements.

  • or - of course - use PDO and its methods.
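A minimal sketch of that recommendation with mysqli might look as follows. The function name, the users table and its columns are hypothetical and serve only as illustration; the mysqli calls themselves (set_charset, prepare, bind_param, execute, bind_result, fetch) are the standard ones.

```php
<?php
// Hypothetical lookup: find a user id by name using a prepared statement.
// Assumes a table `users` with columns `id` and `name` (illustrative only).
function find_user_id(mysqli $db, string $name): ?int
{
    // Let PHP both inform the server and remember the charset.
    $db->set_charset('utf8mb4');

    // The value travels separately from the SQL text, so no escaping is needed.
    $stmt = $db->prepare('SELECT id FROM users WHERE name = ?');
    if ($stmt === false) {
        return null;
    }
    $stmt->bind_param('s', $name);
    $stmt->execute();

    $id = null;
    $stmt->bind_result($id);
    $found = $stmt->fetch(); // true when a row was fetched
    $stmt->close();

    return $found === true ? (int) $id : null;
}
```

In real code you would normally call set_charset() once, right after connecting, rather than inside every query helper; it is placed here only to keep the sketch in one piece.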

+1
Jun 20 '16 at 23:57

SET NAMES ... is a convenience alias. From the MySQL manual:

A SET NAMES 'charset_name' statement is equivalent to these three statements:

    SET character_set_client = charset_name;
    SET character_set_results = charset_name;
    SET character_set_connection = charset_name;

Setting character_set_connection to charset_name also implicitly sets collation_connection to the default collation for charset_name.

... which provides MySQL Server with all the text encoding information needed for the current connection. So far so good.

But PHP is also involved, and it learns nothing from this, because to PHP it is just an arbitrary user query. There are two things PHP will not do, for obvious performance reasons:

  • Scan every user query sent to the server to detect SET NAMES calls.
  • Ask MySQL for the current values of the variables involved every time something needs them.

In short: this method informs the server but not the client. The dedicated PHP functions, by contrast, do both.

+1
Jun 21 '16 at 14:52