Foreign / accented characters in SQL query

I am using Java and Spring's JdbcTemplate class to build SQL queries against a Postgres database. However, queries that contain foreign / accented characters do not work.

For example, this (abridged) code:

 JdbcTemplate select = new JdbcTemplate( postgresDatabase );
 String query = "SELECT id FROM province WHERE name = 'Ontario';";
 Integer id = select.queryForObject( query, Integer.class );

will retrieve the province id, but if I change the name to 'Québec', the query returns no results (the value is in the database, so the problem is not that it is missing).

I believe the source of the problem is that the database I have to use has its default client encoding set to SQL_ASCII, which according to this prevents automatic character set conversion. (The Java encoding is set to "UTF-8", while I am told that the database uses "LATIN1" / "ISO-8859-1".)

As a solution to an earlier, similar problem, where result sets contained values with foreign characters, I managed to specify the encoding manually.

Example:

 String provinceName = new String ( resultSet.getBytes( "name" ), "ISO-8859-1" ); 

But now that the foreign characters are part of the query itself, this approach has not worked. (I suppose that, since the query must be held in a String before it is executed anyway, breaking it into bytes and then changing the encoding only garbles the characters further.)
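The result-set workaround above can be reproduced without a database. The following is a minimal sketch, assuming the stored bytes really are ISO-8859-1; the class name `ResultDecodingSketch` and the helper `decodeLatin1` are hypothetical and only for illustration. It shows that the same bytes decode correctly as ISO-8859-1 but not as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ResultDecodingSketch {
    // Decode raw column bytes as ISO-8859-1, mirroring the getBytes(...) workaround.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumption: the name was stored by an ISO-8859-1 client, so the accented e is the single byte 0xE9.
        byte[] stored = {'Q', 'u', (byte) 0xE9, 'b', 'e', 'c'};

        // 0xE9 followed by 'b' is not a valid UTF-8 sequence, so this decoding is wrong.
        String wrong = new String(stored, StandardCharsets.UTF_8);
        String right = decodeLatin1(stored);

        System.out.println(wrong.equals("Qu\u00E9bec")); // false
        System.out.println(right.equals("Qu\u00E9bec")); // true
    }
}
```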

Is there a way around this without having to change the database properties or restore it?

P.S. I found this function on StackOverflow while writing the title. It didn't seem to work (maybe I didn't use it correctly), and even if it had worked, it does not seem like the best solution.

Edit: I have accepted my own answer, since that is what I'm using for now; however, as mentioned in a comment below, I would be happy to look at other suggestions that might be better, as long as I have access to the database.

+4
3 answers

Hm, well, after skimming the PostgreSQL documentation, I found a solution in the String Functions and Operators section.

I used the convert(string bytea, src_encoding name, dest_encoding name) function and was able to get the province id for Québec.

Example:

 String query = "SELECT id FROM province WHERE name = convert( 'Québec', 'UTF-8', 'ISO-8859-1' );"; 
+2

If you are connecting from Java with UTF-8 encoding and the database is ISO-8859-1, then you should run this SQL command immediately after the initial connection to the database:

 SET client_encoding = 'UTF8'; 

PostgreSQL then interprets all input as UTF-8 and converts it to ISO-8859-1 on the server side. You should not need to do anything else.

+3

In fact, if your database is encoded as "SQL_ASCII", it basically understands ASCII and nothing more. This means that the word "Québec" was stored "as provided", that is, as the raw bytes produced by whatever encoding the tool that ran the INSERT or UPDATE was using at the time. Therefore, when you try to select such values, you must use the same encoding, and you must know in advance what it is.

First, you need a way to express that your query should use this encoding.
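To make the "as provided in bytes" point concrete, here is a small sketch that prints the bytes Java produces for the same name under each encoding. The class name `EncodingBytesSketch` and the helper `hex` are hypothetical. If the database compares raw bytes without any conversion, a value stored as ISO-8859-1 cannot equal a literal sent as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesSketch {
    // Render the encoded bytes of a string as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "Qu\u00E9bec";
        // One byte for the accented e in ISO-8859-1, two bytes in UTF-8.
        System.out.println(hex(name, StandardCharsets.ISO_8859_1)); // 51 75 E9 62 65 63
        System.out.println(hex(name, StandardCharsets.UTF_8));      // 51 75 C3 A9 62 65 63
    }
}
```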

Let's say that it was saved with the encoding ISO-8859-1.

I'm not sure whether this will work, but I would try something like this:

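 String myReq = "SELECT id FROM province WHERE name = 'Québec';";
 // Encode the query text as ISO-8859-1 bytes, the encoding assumed for the stored data.
 byte[] iso8859sequence = myReq.getBytes("ISO-8859-1");
 // Caution: bytes above 0x7F (such as 0xE9 for 'é') are not valid US-ASCII, so this step may be lossy.
 String myReqAscii = new String(iso8859sequence, "US-ASCII");
 // Run the re-encoded query.
 Integer id = select.queryForObject( myReqAscii, Integer.class );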
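Before relying on this idea, it may be worth checking what the US-ASCII decoding step does to the accented character. The sketch below is only an isolated check of that one step, with no database involved; the class name `AsciiRoundTripSketch` and the helper `latin1BytesAsAscii` are hypothetical. It uses the `Charset` overload of the `String` constructor, which substitutes the replacement character for bytes the charset cannot decode.

```java
import java.nio.charset.StandardCharsets;

public class AsciiRoundTripSketch {
    // Re-decode the ISO-8859-1 bytes of a string as US-ASCII, as the snippet above attempts.
    static String latin1BytesAsAscii(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String result = latin1BytesAsAscii("Qu\u00E9bec");
        // 0xE9 is not a valid US-ASCII byte, so the decoder substitutes U+FFFD.
        System.out.println(result.equals("Qu\uFFFDbec")); // true
        System.out.println(result.indexOf('\u00E9'));      // -1
    }
}
```

In other words, the accented character does not survive this round trip, so the resulting query string would no longer contain the name being searched for.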
0

Source: https://habr.com/ru/post/1303722/

