The approach shown in the question should (basically) work very well in MySQL for the following reasons:
Mapping (not to be confused with encoding) is a set or rules that determine how to sort and compare characters commonly used for replication at the database level of user expectations in terms of culture (if I'm looking for cafe I also expect to find a café ).
Sorting plays an important rule over unique constraints, because it establishes a definition of unique.
Binary mappings are specifically designed to ignore cultural rules and work at the byte level, so utf8mb4_bin is the right choice here.
MySQL allows you to establish a combination of coding and collation with column-level granularity.
If there is no sorting in the column definition, it will use a level one table.
If there is no sorting in the table definition, it will use database level one.
If there is no sorting in the database definition, it will use server level one.
It is also worth noting that MySQL will convert between encodings transparently if:
- Correct connection coding established.
- The conversion is physically possible (for example, all source characters also refer to target encoding).
For this last reason, VARBINARY may not be the best choice for a column that is still text because it opens the door to get the café stored from a connection configured to use ISO-8859-1 and cannot retrieve it correctly from the connection configured to use UTF-8.
Side Note: The table definition shown may cause the following error:
ERROR 1071 (42000): The specified key was too long; The maximum key length is 767 bytes.
Indexes can have a relatively small maximum size. From docs :
If innodb_large_prefix is enabled (by default), the index key prefix limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format. If innodb_large_prefix is disabled, the limit key index prefix is 767 bytes for tables of any row format.
innodb_large_prefix is deprecated and will be removed in a future release. innodb_large_prefix was introduced in MySQL 5.5 to disable large key-key prefixes for compatibility with earlier versions of InnoDB that do not support large-index prefixes.
The index key prefix length limit is 767 bytes for InnoDB tables that use the REDUNDANT or COMPACT format. For example, you can click this limit with a column prefix index of more than 255 characters per TEXT or VARCHAR, assuming a character set of utf8mb3 and a maximum of 3 bytes for each character.
Attempting to use an index key prefix length that exceeds the limit returns an error. To avoid such errors in replication configurations, do not enable innodb_large_prefix on the master device, if it also cannot be activated on slaves.
Since utf8_mb8 allocates 4 bytes per character, the 767 limit will be full of only 192 characters.
We have one more problem:
mysql> CREATE TABLE `dummy` ( -> `key` varchar(191) COLLATE utf8mb4_bin NOT NULL, -> UNIQUE KEY `key` (`key`) -> ) -> ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO `dummy` (`key`) VALUES ('one'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO `dummy` (`key`) VALUES ('one '); ERROR 1062 (23000): Duplicate entry 'one ' for key 'key'
Sorry?
mysql> INSERT INTO `dummy` (`key`) VALUES ('One'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO `dummy` (`key`) VALUES ('öne'); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM `dummy`; +-----+ | key | +-----+ | One | | one | | öne | +-----+ 3 rows in set (0.00 sec)
This last issue is the interesting subtlety of MySQL mappings. From docs :
All MySQL mappings are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without any trailing spaces . The “comparison” in this context does not include the LIKE-template operator for which trailing spaces are significant
[...] For cases where the characters of the closing pad are separated or comparisons ignore them, if the column has an index that requires a unique value, inserting into the column values that differ only in the number of characters at the end of the line will lead to an error with a duplicate key .
I would dare say that the VARBINARY type is the only way to overcome this ...