Rails 3 - How to Handle a PG "Incomplete Multibyte Character" Error

In a Rails 3.2 application (Ruby 1.9.2), I get the following errors:

PGError occurred in mobile_users#update:

incomplete multibyte character

These are Postgres errors from production; I get a similar SQLite error in the development and test environments.

The parameters causing this error are (auth token intentionally omitted):

* Parameters: {"mobile_user"=>{"quiz_id"=>"1", "auth"=>"xxx", "name"=>"Joaqu\xEDn"}, "action"=>"update", "controller"=>"mobile_users", "id"=>"1", "format"=>"mobile"} 

This comes in as an HTTP Put JSON request, and the update action associated with it is as follows

  # PUT /mobile_users/1
  # PUT /mobile_users/1.xml
  def update
    @mobile_user = current_mobile_user
    @mobile_user.attributes = params[:mobile_user]

    respond_to do |format|
      if @mobile_user.save
        format.html { redirect_to(@mobile_user, :notice => 'Mobile user was successfully updated.') }
        format.json { head :ok }
        format.mobile { head :ok }
        format.xml { head :ok }
      else
        format.html { render :action => "edit" }
        format.json { render :json => @mobile_user.errors, :status => :unprocessable_entity }
        format.mobile { render :json => @mobile_user.errors, :status => :unprocessable_entity }
        format.xml { render :xml => @mobile_user.errors, :status => :unprocessable_entity }
      end
    end
  end

The offending value in the parameters above is "Joaqu\xEDn", which is perfectly legitimate input: I need to handle names in any language and character set.

I assume I will need to use the iconv library, but to do that I need to know the source character set to convert from into UTF-8, and I do not know how to determine it.

I also get an invalid byte sequence in UTF-8 error for "name"=>"p\xEDa".

2 answers

This:

 "Joaqu\xEDn" 

is "Joaquín" encoded as ISO-8859-1, so it is not valid UTF-8, and your databases are right to complain about it. If possible, fix your mobile clients to send UTF-8 in their JSON; if you cannot do that, you can correct the encoding as follows:

 params[:mobile_user][:name].force_encoding('iso-8859-1').encode!('utf-8') 

on the server. The problem with fixing it server-side is that you have to guess what the incoming encoding is, and your guess may be wrong. There is no way to reliably guess the encoding of an arbitrary string. There is rchardet, but it does not work with recent versions of Ruby and appears to be abandoned; you could patch the gem to work with a modern Ruby. There are a few other encoding-guessing libraries, but they all seem to be abandoned too.
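To make that one-liner a little safer, you can check the result before trusting it. The sketch below wraps the same `force_encoding`/`encode` idea in a hypothetical helper (`repair_utf8` is my name, not a Rails API), and it still embodies the guess this answer warns about: it assumes ISO-8859-1 is the fallback encoding.

```ruby
# Hypothetical helper: try to repair a string that is not valid UTF-8.
# ISO-8859-1 as the fallback is an assumption -- as noted above, there is
# no reliable way to detect the real source encoding.
def repair_utf8(str)
  # Already valid UTF-8? Leave it alone.
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Reinterpret the raw bytes as Latin-1 and transcode to UTF-8.
  # Every byte is valid in ISO-8859-1, so this cannot raise for bad input.
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

repair_utf8("Joaqu\xEDn")  # => "Joaquín"
repair_utf8("Joaquín")     # left untouched, already valid UTF-8
```

In a controller you would apply this to `params[:mobile_user][:name]` before assigning attributes.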

JSON text is always, by definition, Unicode, and UTF-8 encoded by default:

 3. Encoding JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. 

Any client that sends you JSON that is not UTF-8 is, IMO, broken, because almost everyone assumes JSON is UTF-8. Of course, there might be a charset header somewhere that says ISO-8859-1, or the headers might claim UTF-8 even though the body is actually ISO-8859-1.
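If you do want to consult that header before guessing, parsing the charset out of the Content-Type value is straightforward. This is a hypothetical sketch (the `json_charset` helper is mine, not a Rails method), and, as just noted, the header can lie, so treat the result as a hint rather than truth:

```ruby
# Hypothetical helper: extract the charset parameter from a Content-Type
# header value, defaulting to UTF-8 as the JSON spec prescribes.
def json_charset(content_type)
  return 'UTF-8' if content_type.nil?
  match = content_type.match(/charset=([\w-]+)/i)
  match ? match[1].upcase : 'UTF-8'
end

json_charset('application/json; charset=iso-8859-1')  # => "ISO-8859-1"
json_charset('application/json')                      # => "UTF-8"
```

In Rails you would feed it something like `request.headers['Content-Type']`.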


I had the same problem with user data when parsing files and resolved it like this:

  require 'iconv'
  # ...
  line = Iconv.conv('UTF-8//IGNORE', 'UTF-8', line)
  # now the variable line holds valid UTF-8 data

You can also override the name setter so that it strips non-UTF-8 characters:

  def name=(name)
    write_attribute(:name, Iconv.conv('UTF-8//IGNORE', 'UTF-8', name))
  end
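One caveat worth knowing: Iconv was deprecated in Ruby 1.9 and no longer ships with newer Rubies. On Ruby 2.1+, `String#scrub` does the same "drop the invalid bytes" job without Iconv. Below is a sketch of the same setter idea, with `write_attribute` replaced by a plain instance variable so the snippet runs outside ActiveRecord:

```ruby
# Sketch of the setter above using String#scrub (Ruby 2.1+) instead of Iconv.
# Plain Ruby class standing in for the ActiveRecord model.
class MobileUser
  attr_reader :name

  def name=(value)
    # scrub('') removes byte sequences that are invalid in the string's
    # encoding, mirroring Iconv's //IGNORE behavior.
    @name = value.to_s.scrub('')
  end
end

user = MobileUser.new
user.name = "p\xEDa"
user.name  # => "pa" -- the invalid byte is dropped
```

Note that, like `//IGNORE`, this silently discards data ("pía" becomes "pa"), so the `force_encoding` approach from the other answer preserves more of the user's input when the source encoding guess is right.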

Source: https://habr.com/ru/post/1395765/

