UTF-8 error in Ruby

I am scraping several websites, and I eventually hit a UTF-8 error that looks like this:

/usr/local/lib/ruby/gems/1.9.1/gems/dm-core-1.2.0/lib/dm-core/support/ext/blank.rb:19:in `=~': invalid byte sequence in UTF-8 (ArgumentError) 

Now, I don’t care about the websites being 100% accurate. Is there a way to take the page I get, strip out any problematic bytes, and then pass it on to my program?

I am using ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0] if that matters.

Update:

 def self.blank?(value)
   return value.blank? if value.respond_to?(:blank?)
   case value
   when ::NilClass, ::FalseClass
     true
   when ::TrueClass, ::Numeric
     false
   when ::Array, ::Hash
     value.empty?
   when ::String
     value !~ /\S/ # This is line 19, the one that has the issue.
   else
     value.nil? || (value.respond_to?(:empty?) && value.empty?)
   end
 end
 end

When I try to save the following line:

 What Happens in The Garage Tin Sign2. Γ―ΒΏΒ½ Γ―ΒΏΒ½ Newsletter Our monthly newsletter, 

Saving it raises the error. The line comes from the page http://www.stationbay.com/. The strange thing is that when I view the page source in my web browser, it does not show any funny characters.

What should I do next?

1 answer

The problem is that your string contains bytes that are not valid UTF-8, yet the string is tagged as UTF-8. The following short code demonstrates the problem:

 a = "\xff"
 a.force_encoding "utf-8"
 a.valid_encoding? # returns false
 a =~ /x/ # provokes ArgumentError: invalid byte sequence in UTF-8

The best way to fix this is to apply the correct encoding from the very beginning. If this is not an option, you can use String#encode :

 a = "\xff"
 a.force_encoding "utf-8"
 a.valid_encoding? # returns false
 a.encode!("utf-8", "utf-8", :invalid => :replace)
 a.valid_encoding? # returns true now
 a =~ /x/ # works now
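Both suggestions can be combined in one small helper. This is only a sketch: `to_clean_utf8`, its parameters, and the sample bytes are hypothetical names and values chosen for illustration, not part of dm-core or of the page in question. If the page's real charset is known, the helper transcodes from it. Otherwise it treats the bytes as UTF-8 and replaces whatever is invalid. The round trip through UTF-16BE is a precaution, since as far as I know some Ruby versions (1.9.3 among them) document a conversion from an encoding to itself as a no-op that leaves invalid bytes in place.

```ruby
# Hypothetical helper (illustrative name): returns a valid UTF-8 copy of raw.
# source_charset is the page's real charset if you know it (and it is not UTF-8).
def to_clean_utf8(raw, source_charset = nil, replacement = "?")
  text = raw.dup # never modify the caller's string

  if source_charset
    # Preferred path: interpret the bytes in their real encoding, then transcode.
    return text.force_encoding(source_charset)
               .encode("UTF-8", :invalid => :replace, :undef => :replace,
                                :replace => replacement)
  end

  text.force_encoding("UTF-8")
  return text if text.valid_encoding?

  # Fallback: go through UTF-16BE and back, so the conversion is a real
  # transcoding step and every invalid byte is swapped for the replacement.
  text.encode("UTF-16BE", :invalid => :replace, :undef => :replace,
                          :replace => replacement)
      .encode("UTF-8")
end

# Assumed sample input: \xff and \xfe stand in for the bad bytes in the page.
line = to_clean_utf8("Tin Sign2. \xff\xfe Newsletter")
puts line.valid_encoding?     # prints true
puts((line !~ /\S/).inspect)  # prints false, and no ArgumentError is raised
```

Passing an empty string as the replacement drops the bad bytes instead of marking them. On Ruby 2.1 or newer, `String#scrub` does the same cleanup in a single call.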

Source: https://habr.com/ru/post/1384484/

