Encoding issue with JDBC and MySQL

I collect data from RSS feeds, sanitize it, and save it in a database. I use Java, Tidy, MySQL and JDBC.

Steps:

  • I fetch the RSS feeds. This part works fine.
  • I sanitize the HTML with Tidy. Here is one transformation: Tidy automatically converts strings like "So it&#8217;s unlikely" to "So it’s unlikely".
  • I save this string in a table.

MySQL schema

CREATE TABLE IF NOT EXISTS `rss_item_safe_texts` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) NOT NULL,
  `link` varchar(255) NOT NULL,
  `description` mediumtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

JDBC Connection URL

 connUrl = "jdbc:mysql://" + host + "/" + database + "?user=" + username + "&password=" + password + "&useUnicode=true&characterEncoding=UTF-8"; 

Java code

 PreparedStatement updateSafeTextSt = conn.prepareStatement(
     "UPDATE `rss_item_safe_texts` SET `title` = ?, `link` = ?, `description` = ? WHERE `id` = ?");
 updateSafeTextSt.setString(1, EscapingUtils.escapeXssInjection(title));
 updateSafeTextSt.setString(2, link);
 updateSafeTextSt.setString(3, EscapingUtils.escapeXssInjection(description));
 updateSafeTextSt.setInt(4, itemId);
 updateSafeTextSt.execute();
 updateSafeTextSt.close();

As a result, I see broken characters in the database, such as "So it?s unlikely". I see the same thing later when the text is output on a web page (a UTF-8 page).

1 answer

Do not forget that there are many other places where the encoding can be set differently. Check, for example, that your database, table, and columns all have the correct encoding. Also, I usually set everything I can to utf8 in MySQL:

 mysql> show variables like '%char%';
 +--------------------------+----------------------------+
 | Variable_name            | Value                      |
 +--------------------------+----------------------------+
 | character_set_client     | utf8                       |
 | character_set_connection | utf8                       |
 | character_set_database   | utf8                       |
 | character_set_filesystem | binary                     |
 | character_set_results    | utf8                       |
 | character_set_server     | utf8                       |
 | character_set_system     | utf8                       |
 | character_sets_dir       | /usr/share/mysql/charsets/ |
 +--------------------------+----------------------------+
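
To see why a single non-UTF-8 hop is enough to produce the "?", here is a minimal sketch in plain Java with no database involved. The class name CharsetRoundTrip and the helper roundTrip are made up for illustration, and ISO-8859-1 is only a stand-in for "some charset that cannot represent the character"; it is not a claim about which setting is wrong in your setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode the text with the given charset and decode it again,
    // imitating one hop (client, connection, or column) that uses that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+2019 is the right single quotation mark that &#8217; refers to.
        String original = "So it\u2019s unlikely";

        // UTF-8 can represent U+2019, so the text survives unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));

        // ISO-8859-1 has no mapping for U+2019; getBytes substitutes '?'.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1));
    }
}
```

If the text comes back from the database looking like the second line, some step between the Java string and the stored column is probably not using UTF-8, which is why it is worth checking every variable in the list above.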

Source: https://habr.com/ru/post/1309750/

