In a simple benchmark test in postgres 9.1 and 8.4, the following strange results come up.
postgres=# select 1 one where 'A' < 'a'; one ----- (0 rows) // ..... I would have expected 1 row postgres=# select 1 one where 'A' < 'b'; one ----- 1 (1 row) // ...... this looks OK postgres=# select 1 one where 'A' = 'a'; one ----- (0 rows) // ...... This also looks OK postgres=# select 1 one where 'A' > 'a'; one ----- 1 (1 row) // ...... This is inconsistent with the above results
The ascii value of "A" is 0x41, and "a" is 0x61, so a direct comparison of ascii values ββshould mean that "A" is less than "a", or if some kind of random magic, then at least A> b and Alocale, but again - however, my local one is set to the standard us_EN.utf8 setting using the standard Centos5 and Fedora16 settings with the same results.
Attaching the debugger to the postgres process, I was able to track that the problem arises from this:
strcoll("A","a") returns 6;
Where
strcoll("A","b") returns -1;
However, this can only be demonstrated inside the postgres process (for example, when connecting gdb), and an external program like the one below gives completely reasonable results.
main() { char *a="a"; char *b="b"; char *A="A"; printf("%s\n",setlocale(2,"us_ENG.utf8")); printf("%d\n",strcoll(A,a)); printf("%d\n",strcoll(A,b)); printf("%d\n",strcoll(a,a)); printf("%d\n",strcoll(b,b)); printf("%d\n",strcoll(a,A)); printf("%d\n",strcoll(b,A)); printf("%d\n",strcoll(b,a)); printf("%d\n",strcoll(A,A)); }
Question: Someone might think about what would cause strcoll to return bad values ββand some suggestion on how to fix it, so the SQL sample will work correctly.
Update : I tried to recreate the database as initdb --locale=C , and "A" and "a" give the expected results there - however this does not explain why this fails in a database created as UTF-8.