Ordering non-Latin characters in a database with ORDER BY

I just ran into some odd behavior with ORDER BY in a database. When comparing strings, I expected some characters, such as "[" and "_", to sort after Latin letters and digits, such as "I" or "2", given their positions in the ASCII table. However, the results of the ORDER BY clause are sorted differently from what I expected. Here is my test:

SQLite version 3.6.23
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table products (name varchar(10));
sqlite> insert into products values ('ipod');
sqlite> insert into products values ('iphone');
sqlite> insert into products values ('[apple]');
sqlite> insert into products values ('_ipad');
sqlite> select * from products order by name asc;
[apple]
_ipad
iphone
ipod

select * from products order by name asc;
name
...
[ B@
_ref
123
1ab
...

This behavior differs from string comparison in Java (which is why it took me some time to track this problem down). I can reproduce it in both SQLite 3.6.23 and Microsoft SQL Server 2005. I searched the Internet but could not find any related documentation. Could someone shed some light on this? Is this part of the SQL standard? Where can I find information about it? Thanks in advance.

+4
3 answers

The rules a database uses to compare and order characters are called a collation.

How strings are sorted depends on the collation, which is usually set in the properties of the server, client, or session.

In MySQL :

 SELECT *
 FROM (
     SELECT 'a' AS str
     UNION ALL
     SELECT 'A' AS str
     UNION ALL
     SELECT 'b' AS str
     UNION ALL
     SELECT 'B' AS str
 ) q
 ORDER BY str COLLATE UTF8_BIN
 -- 'A' 'B' 'a' 'b'

and

 SELECT *
 FROM (
     SELECT 'a' AS str
     UNION ALL
     SELECT 'A' AS str
     UNION ALL
     SELECT 'b' AS str
     UNION ALL
     SELECT 'B' AS str
 ) q
 ORDER BY str COLLATE UTF8_GENERAL_CI
 -- 'a' 'A' 'b' 'B'

UTF8_BIN sorts characters by their Unicode code points. Uppercase letters have lower code points than lowercase ones and therefore come first.

UTF8_GENERAL_CI sorts characters by their alphabetical position, ignoring case.
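Since the question is about SQLite, here is a rough sketch of the same contrast there, run through Python's built-in `sqlite3` module. It assumes SQLite's built-in `BINARY` and `NOCASE` collations as stand-ins for the two MySQL collations above (they are not exact equivalents, since `NOCASE` only folds ASCII letters). The table name `letters` is made up for the example, and `rowid` is added as a tie-breaker so that the case-insensitive order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (str TEXT)")  # hypothetical table for the demo
conn.executemany("INSERT INTO letters VALUES (?)",
                 [("a",), ("A",), ("b",), ("B",)])

# BINARY compares the raw bytes, so uppercase letters (lower code points) come first.
binary_order = [row[0] for row in conn.execute(
    "SELECT str FROM letters ORDER BY str COLLATE BINARY")]

# NOCASE treats 'a' and 'A' as equal; rowid breaks the tie in insertion order.
nocase_order = [row[0] for row in conn.execute(
    "SELECT str FROM letters ORDER BY str COLLATE NOCASE, rowid")]

print(binary_order)  # ['A', 'B', 'a', 'b']
print(nocase_order)  # ['a', 'A', 'b', 'B']
```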

Collation also matters for indexes, since indexes depend heavily on sorting and comparison rules.

+2

The important keyword here is "collation". I have no experience with SQLite, but you can expect it to be similar to other database engines, in that you can define the collation to be used for entire databases, for individual tables, for each connection, and so on.

Check your database's documentation for the options available to you.
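For SQLite specifically, as far as I can tell the collation is declared per column (or per expression), and custom collations are registered per connection. The sketch below uses Python's built-in `sqlite3` module to show both. The collation name `letters_first` and its ordering rule (strings starting with a letter or digit sort first, case ignored) are invented purely for illustration, so treat this as one possible approach rather than a recommended rule.

```python
import sqlite3

def letters_first(a: str, b: str) -> int:
    """Hypothetical rule: alphanumeric-leading strings first, then case-insensitive order."""
    key_a = (not a[:1].isalnum(), a.lower())
    key_b = (not b[:1].isalnum(), b.lower())
    return (key_a > key_b) - (key_a < key_b)  # negative, zero, or positive

conn = sqlite3.connect(":memory:")
# Custom collations are registered on the connection, under a name of our choosing.
conn.create_collation("letters_first", letters_first)

# A column-level default: ORDER BY name will use NOCASE unless overridden.
conn.execute("CREATE TABLE products (name TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("ipod",), ("IPHONE",), ("[apple]",), ("_ipad",)])

nocase_rows = [r[0] for r in conn.execute(
    "SELECT name FROM products ORDER BY name")]
custom_rows = [r[0] for r in conn.execute(
    "SELECT name FROM products ORDER BY name COLLATE letters_first")]

print(nocase_rows)  # ['[apple]', '_ipad', 'IPHONE', 'ipod']
print(custom_rows)  # ['IPHONE', 'ipod', '[apple]', '_ipad']
```

Note that a custom collation only exists on the connection that registered it, so every client that sorts with it has to register it again.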

+1

The ASCII codes for lowercase characters such as "i" are greater than those for "[" and "_":

 'i': 105
 '[': 91
 '_': 95

However, try inserting uppercase strings, for example "IPOD" or "IPHONE": with the default binary collation they will sort before "_" and "[".
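A quick way to check this is the sketch below, which uses Python's built-in `sqlite3` module. It assumes the `products` table from the question with two uppercase names added, and relies on SQLite's default `BINARY` collation. For ASCII-only strings that order should match Python's own code-point comparison, so `sorted()` is used as a cross-check.

```python
import sqlite3

# Code points of the characters under discussion.
code_points = {ch: ord(ch) for ch in "2I[_i"}
print(code_points)  # {'2': 50, 'I': 73, '[': 91, '_': 95, 'i': 105}

names = ["ipod", "iphone", "[apple]", "_ipad", "IPOD", "IPHONE"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name VARCHAR(10))")
conn.executemany("INSERT INTO products VALUES (?)", [(n,) for n in names])

# No COLLATE clause, so the column's default BINARY collation applies.
sqlite_order = [r[0] for r in conn.execute(
    "SELECT name FROM products ORDER BY name ASC")]

print(sqlite_order)
# ['IPHONE', 'IPOD', '[apple]', '_ipad', 'iphone', 'ipod']
print(sqlite_order == sorted(names))  # True
```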

0

Source: https://habr.com/ru/post/1305099/

