Cleaning up strange encoding in Ruby

I am currently playing around a bit with CouchDB.
I am trying to transfer some blog data from Redis (key-value store) to CouchDB (key-value store).
Seeing how I have probably moved this data a gazillion times between various blogging engines (everyone should have a hobby :)), there seem to be some encoding snafus. I use CouchREST to access CouchDB from Ruby, and I get the following:

<JSON::GeneratorError: source sequence is illegal/malformed>

The problem seems to be in the body_html field:

<Post:0x00000000e9ee18 @body_html="[.....]Wie Sie bereits wissen, m\xF6chte EUserv k\xFCnftig seine  [...]

These are supposed to be umlauts ("möchte" and "künftig").

Any idea how to get rid of these problems? I tried some conversions using the Ruby 1.9 encoding functions or iconv first, but so far I have had no luck :(

If I try, for example, to convert this text to ISO-8859-1 using the Ruby 1.9 .encode() method, this is what happens (different text, same problem):

#<Encoding::UndefinedConversionError: "\xC6\x92" from UTF-8 to ISO-8859-1>
1 answer

"If I try, for example, to convert this text to ISO-8859-1"

Close. You actually want to do it the other way around: you have ISO-8859-1 (*), and you want UTF-8 (**). So str.encode('utf-8', 'iso-8859-1') is more likely to do the trick.
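
As a minimal sketch of that suggestion, assuming the stored bytes really are ISO-8859-1: the sample below uses a shortened string built from the bytes shown in the question, and the variable names are made up for illustration. The second argument to encode names the source encoding, so the bytes are read as ISO-8859-1 regardless of how Ruby currently has the string tagged.

```ruby
# Sample bytes modelled on the question: "möchte" with ö stored as the single byte 0xF6.
raw = "m\xF6chte"

# encode(destination, source): read the bytes as ISO-8859-1, produce UTF-8.
utf8 = raw.encode('utf-8', 'iso-8859-1')

puts utf8                  # the repaired text
puts utf8.encoding         # the encoding Ruby now reports for the string
puts utf8.valid_encoding?  # whether the bytes are well-formed for that encoding
```

If the data turns out not to be ISO-8859-1 after all, the result will still be valid UTF-8 but may contain the wrong characters, so it is worth checking a few converted posts by eye.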

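*: You may actually have Windows code page 1252, which is like ISO-8859-1 but has extra printable characters in the range 0x80-0x9F, where ISO-8859-1 has control codes. If so, use 'cp1252' instead.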

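**: UTF-8 is the encoding you want to end up with. If the bytes are really ISO-8859-1/cp1252 but Ruby has the string tagged as some other encoding, you can tell Ruby what the bytes actually are with str.force_encoding('iso-8859-1').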
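
Both footnotes can be illustrated with a short sketch. The sample strings and variable names are assumptions made for illustration; encode, force_encoding and valid_encoding? are standard String methods.

```ruby
# Footnote *: byte 0x83 is a control code in ISO-8859-1 but the letter "ƒ" (U+0192) in cp1252.
byte = "\x83"
as_latin1 = byte.encode('utf-8', 'iso-8859-1')
as_cp1252 = byte.encode('utf-8', 'cp1252')

puts as_latin1.ord.to_s(16)  # 83  (a C1 control character)
puts as_cp1252.ord.to_s(16)  # 192 (the letter "ƒ")
p as_cp1252.bytes            # [198, 146], i.e. "\xC6\x92" as in the question's second error

# Footnote **: a hypothetical string that Ruby has tagged as UTF-8
# although its bytes are ISO-8859-1.
mislabeled = "k\xFCnftig".dup.force_encoding('utf-8')
puts mislabeled.valid_encoding?  # false

# force_encoding only changes the label; the bytes themselves are untouched.
relabeled = mislabeled.dup.force_encoding('iso-8859-1')
puts relabeled.valid_encoding?   # true

# With the correct label in place, a plain encode transcodes to UTF-8.
fixed = relabeled.encode('utf-8')
puts fixed
```

The "\xC6\x92" in the question's UndefinedConversionError is the UTF-8 form of "ƒ", a character that exists in cp1252 but not in ISO-8859-1, which hints that at least some of the data passed through cp1252 at some point.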


Source: https://habr.com/ru/post/1725472/
