UTF-8 characters are not persisted with Hibernate + MySQL

I am trying to save some values to a MySQL database using Hibernate, but most Lithuanian characters are not saved correctly: ąĄ čČ ęĘ ėĖ įĮ ųŲ ūŪ are stored as ?, whereas šŠ žŽ are saved correctly.

If I insert the values manually, they are saved correctly, so the problem is most likely in the Hibernate configuration.

What I have tried so far:

    hibernate.charset=UTF-8
    hibernate.character_encoding=UTF-8
    hibernate.use_unicode=true
    ---------
    properties.put(PROPERTY_NAME_HIBERNATE_USE_UNICODE,
            env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_USE_UNICODE));
    properties.put(PROPERTY_NAME_HIBERNATE_CHARSET,
            env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARSET));
    properties.put(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING,
            env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING));
    ---------
    private void registerCharachterEncodingFilter(ServletContext aContext) {
        CharacterEncodingFilter cef = new CharacterEncodingFilter();
        cef.setForceEncoding(true);
        cef.setEncoding("UTF-8");
        aContext.addFilter("charachterEncodingFilter", cef)
                .addMappingForUrlPatterns(null, true, "/*");
    }

As described here

I tried adding ?useUnicode=true&characterEncoding=utf-8 to the DB connection URL.

As described here

I made sure that my database is set to UTF-8 encoding (phpMyAdmin > information_schema > schemata):

 def db_name utf8 utf8_lithuanian_ci NULL 

This is how I save to the database:

    //Controller
    buildingService.addBuildings(schema.getBuildings());
    List<Building> buildings = buildingService.getBuildings();
    System.out.println("-----------");
    for (Building b : schema.getBuildings()) {
        System.out.println(b.toString());
    }
    System.out.println("-----------");
    for (Building b : buildings) {
        System.out.println(b.toString());
    }
    System.out.println("-----------");

    //Service:
    @Override
    public void addBuildings(List<Building> buildings) {
        for (Building b : buildings) {
            getCurrentSession().saveOrUpdate(b);
        }
    }

The first set of println output contains all the Lithuanian characters, while in the second most of them are replaced with ?.

EDIT: Added Information

    insert into buildings values (11,'ąĄčČęĘ', 'asda');

    select short, hex(short) from buildings;
    //Šalt. was inserted via hibernate
    //letters are properly displayed:
    ąĄčČęĘ    | C485C484C48DC48CC499C498
    MIF Šalt. | 4D494620C5A0616C742E

    select address, hex(address) from buildings;
    Šaltini? <...> | C5A0616C74696E693F20672E2031412C2056696C6E697573
    //should contain "ų"
    --------
    show create table buildings;
    buildings | CREATE TABLE `buildings` (
      `id` int(11) NOT NULL,
      `short` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
      `address` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_lithuanian_ci

EDIT: I did not find a proper solution, so I came up with a workaround. I ended up escaping/unescaping the characters, storing them like this: \uXXXX.
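For readers who want to see what such a workaround could look like, here is a minimal sketch. The question does not show the actual implementation, so the class name UnicodeEscapeSketch and both helper methods are hypothetical. It assumes the stored text never contains a literal backslash followed by u that is not an escape.

```java
public class UnicodeEscapeSketch {

    // Replace every non-ASCII UTF-16 code unit with a backslash-u escape
    // (four uppercase hex digits), leaving ASCII characters untouched.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Turn backslash-u escapes back into characters; all other text is copied as-is.
    // Assumption: every backslash-u in the input is followed by four hex digits.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && i + 6 <= text.length()) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The address from the question, with the letter that was lost written as an escape.
        String original = "\u0160altini\u0173 g. 1A, Vilnius";
        String stored = escape(original); // ASCII only, safe to pass through a lossy layer
        System.out.println(stored);
        System.out.println(unescape(stored).equals(original)); // true
    }
}
```

This only hides the symptom: the column then holds ASCII escapes, so sorting and searching by the Lithuanian collation no longer work on it.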

+6
3 answers

First check whether they were stored correctly. Run SELECT col, HEX(col) ... on some cell containing Lithuanian characters. A correctly stored ą shows C485. The other letters should show various hex values of the form C4xx or C5xx. 3F is ?.

More importantly, the four characters that do show up are a clue. Š must be C5A0 if it is stored correctly as utf8. However, I suspect you will see 8A, implying that the column in the table is really declared as CHARACTER SET latin1. (Those four characters appear in the first column of my charset blog.)

Run SHOW CREATE TABLE to find out how the column is defined. If it says latin1, then the problem is in the table definition, and you should probably start over.
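To know what hex to expect before looking at the database, the UTF-8 bytes can be computed on the Java side. This is a small sketch; the class name Utf8HexSketch and the helper utf8Hex are made up for illustration, and the output is only comparable to HEX(col) when the column really is utf8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8HexSketch {

    // Uppercase hex of the UTF-8 bytes of a string, in the same style as MySQL HEX().
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u0105")); // a with ogonek: C485
        System.out.println(utf8Hex("\u0160")); // S with caron: C5A0
        System.out.println(utf8Hex("?"));      // question mark: 3F
    }
}
```

The string literals use Unicode escapes so that the result does not depend on the encoding of the source file.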

+3

You must make sure that every component involved in writing the data uses UTF-8 explicitly.

  • If you enter values through a browser, make sure that the page displaying the results is served with the following header: Content-Type: text/html; charset=utf-8

  • The input form is defined as follows:

    <form action="submit" accept-charset="UTF-8">...</form>

  • If you are creating String objects from an array of bytes, make sure you explicitly specify a Charset in the constructor.

  • If your recording comes from a text file, this file must be UTF-8 encoded.

  • If it is hard-coded directly in your code, then the source file must be UTF-8 encoded.
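As an illustration of the byte-array point in the list above, here is a short sketch. The class name ExplicitCharsetSketch and the helper decodeUtf8 are invented for the example. It shows that the same two bytes decode to one character with the right charset and to two wrong characters with ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {

    // Decode bytes that are known to be UTF-8, independent of the platform default charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC4, (byte) 0x85}; // UTF-8 bytes of U+0105

        String ok = decodeUtf8(utf8);
        // Wrong charset on purpose: each byte becomes its own character.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.length());          // 1
        System.out.println(wrong.length());       // 2
        System.out.println(ok.equals("\u0105"));  // true
    }
}
```

Calling new String(bytes) without a charset uses the platform default, which is exactly the kind of implicit setting this answer warns about.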

0

The fact that your database contains the correct UTF-8 (two or more bytes for a special letter) is encouraging.

If you get one single ? for a special letter, an attempt was made to convert UTF-8 to some encoding that does not contain that letter. And that is what it looks like here: the letters that were converted correctly are in the range of ISO-8859-1 or Windows-1252, and the others are not. Now ISO-8859-1, a.k.a. Latin-1, is the default HTTP encoding in a Java EE server. You might want to do the following:

 response.setCharacterEncoding("UTF-8"); 
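The pattern from the question (š and ž survive, the other letters become ?) can be reproduced without a database or a server. The sketch below assumes that the JDK provides the windows-1252 charset, which common JDKs do but which is not one of the charsets Java guarantees. The class name LossyEncodingSketch and the helper survives are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class LossyEncodingSketch {

    // True if every character of the text can be represented in the given charset.
    static boolean survives(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Assumption: this charset is available in the running JDK.
        Charset cp1252 = Charset.forName("windows-1252");

        String kept = "\u0161\u0160\u017E\u017D"; // the four letters that were saved
        String lost = "\u0173";                   // u with ogonek, one of the lost letters

        System.out.println(survives(kept, cp1252)); // true
        System.out.println(survives(lost, cp1252)); // false

        // getBytes substitutes the charset's replacement byte for unmappable characters.
        byte[] bytes = lost.getBytes(cp1252);
        System.out.println(String.format("%02X", bytes[0])); // 3F, i.e. "?"
    }
}
```

So a single 3F in the stored data points at a conversion through a one-byte Western encoding somewhere between the Java string and the table.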

Now one problem with System.out.println is that it uses the default system encoding. It would be more telling to write to a file with a logger, or to debug and inspect the string and its char array.

That the schema seems to work may be because the schema's strings are built directly from Java source, and the encoding of the editor and that of the javac compiler differ. This can be checked by u-escaping the string literals in Java: "\u0105" instead of "ą".

Write a unit test that writes to the database and reads the values back.
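One way to inspect a string without depending on the console encoding is to print its code points in plain ASCII. This is only a sketch; the class name CodePointDumpSketch and the helper dump are made up for the example.

```java
public class CodePointDumpSketch {

    // Render each code point of the text as U+XXXX, separated by spaces, using ASCII only.
    static String dump(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // A real letter followed by a literal question mark.
        System.out.println(dump("\u0105?")); // U+0105 U+003F
    }
}
```

If a value read back from the database dumps as U+003F where a Lithuanian letter was written, the character was already lost before it reached println.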

0

Source: https://habr.com/ru/post/984390/

