Postgresql debugging for where 'A' <'a'

In a simple benchmark test in postgres 9.1 and 8.4, the following strange results come up.

 postgres=# select 1 one where 'A' < 'a'; one ----- (0 rows) // ..... I would have expected 1 row postgres=# select 1 one where 'A' < 'b'; one ----- 1 (1 row) // ...... this looks OK postgres=# select 1 one where 'A' = 'a'; one ----- (0 rows) // ...... This also looks OK postgres=# select 1 one where 'A' > 'a'; one ----- 1 (1 row) // ...... This is inconsistent with the above results 

The ascii value of "A" is 0x41, and "a" is 0x61, so a direct comparison of ascii values ​​should mean that "A" is less than "a", or if some kind of random magic, then at least A> b and Alocale, but again - however, my local one is set to the standard us_EN.utf8 setting using the standard Centos5 and Fedora16 settings with the same results.

Attaching the debugger to the postgres process, I was able to track that the problem arises from this:

 strcoll("A","a") returns 6; 

Where

 strcoll("A","b") returns -1; 

However, this can only be demonstrated inside the postgres process (for example, when connecting gdb), and an external program like the one below gives completely reasonable results.

 main() { char *a="a"; char *b="b"; char *A="A"; printf("%s\n",setlocale(2,"us_ENG.utf8")); printf("%d\n",strcoll(A,a)); printf("%d\n",strcoll(A,b)); printf("%d\n",strcoll(a,a)); printf("%d\n",strcoll(b,b)); printf("%d\n",strcoll(a,A)); printf("%d\n",strcoll(b,A)); printf("%d\n",strcoll(b,a)); printf("%d\n",strcoll(A,A)); } 

Question: Someone might think about what would cause strcoll to return bad values ​​and some suggestion on how to fix it, so the SQL sample will work correctly.

Update : I tried to recreate the database as initdb --locale=C , and "A" and "a" give the expected results there - however this does not explain why this fails in a database created as UTF-8.

+2
source share
1 answer

The order depends on your locale database, not the system locale. (Although it should be noted that PostgreSQL relies on the OS to provide specifics. More on the Postgres Wiki. )
ASCII value applies only to non-local "C" .

Take a look at your current settings:

 SELECT * FROM pg_settings WHERE name ~~ 'lc%'; 

In particular, the setting for LC_COLLATE . You also can:

 SHOW lc_collate; 

In PostgreSQL 9.1, you can change the appropriate sorting for each statement. Try:

 SELECT 1 AS one WHERE 'A' < 'a' COLLATE "C"; 

In older versions, you (mostly) were stuck with the value for LC_COLLATE that you selected when creating the database cluster.

+4
source

Source: https://habr.com/ru/post/953521/


All Articles