How to use internal/external encoding when loading a YAML file?

How do I load a YAML file regardless of its encoding?

My YAML file can be encoded in UTF-8 or ANSI (that is what Notepad++ calls it; I think it is Windows-1252):

:key1:
  :key2: "ä"

utf8.yml is encoded in UTF-8, ansi.yml is encoded in ANSI. I load the files as follows:

# encoding: utf-8

require 'yaml'

Encoding.default_internal = "utf-8"

utf8_load      = YAML::load(File.open('utf8.yml'))
utf8_load_file = YAML::load_file('utf8.yml')
ansi_load      = YAML::load(File.open('ansi.yml'))
ansi_load_file = YAML::load_file('ansi.yml')

Ruby seems to recognize the encoding incorrectly:

utf8_load      [:key1][:key2].encoding  #=> "UTF-8"
utf8_load_file [:key1][:key2].encoding  #=> "UTF-8"
ansi_load      [:key1][:key2].encoding  #=> "UTF-8"
ansi_load_file [:key1][:key2].encoding  #=> "UTF-8"

because the bytes do not match:

utf8_load      [:key1][:key2].bytes  #=> [195, 164]
utf8_load_file [:key1][:key2].bytes  #=> [195, 164]
ansi_load      [:key1][:key2].bytes  #=> [239, 191, 189]
ansi_load_file [:key1][:key2].bytes  #=> [239, 191, 189]

If I leave out Encoding.default_internal = "utf-8", the bytes differ as well:

utf8_load      [:key1][:key2].bytes  #=> [195, 131, 194, 164]
utf8_load_file [:key1][:key2].bytes  #=> [195, 164]
ansi_load      [:key1][:key2].bytes  #=> [195, 164]
ansi_load_file [:key1][:key2].bytes  #=> [239, 191, 189]
  • What happens when I do not set default_internal to UTF-8?
  • What encodings do the strings have in both examples?
  • How can I load a file even if I don't know its encoding?

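First of all, a YAML file is supposed to be encoded in UTF-8 (or another Unicode encoding such as UTF-16). An "ANSI" YAML file is not really valid YAML. Whatever else happens, YAML is meant to be Unicode.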

  • What happens when I do not set default_internal to UTF-8?

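Encoding.default_internal controls which encoding strings are transcoded to when they are read in, at least in the places that honor Encoding.default_internal, which is not everything. Rails, for example, sets it to UTF-8. So if you set Encoding.default_internal to UTF-8, strings read from those places are transcoded to UTF-8.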

By default, Encoding.default_internal is nil. When it is nil, no transcoding happens: strings stay in whatever encoding Ruby believes the source data is in.

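For example, say the file really is "WINDOWS-1252" and Ruby knows it is WINDOWS-1252 (because you passed that encoding to File.open). Whether YAML::load then hands back WINDOWS-1252 strings or converts them depends on the YAML library, and on top of that, Encoding.default_internal decides whether they get transcoded again.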
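A minimal sketch of the transcoding step, assuming a throwaway file that holds "key: ä" in Windows-1252. Instead of the global Encoding.default_internal it uses a per-file internal encoding in the mode string ("r:Windows-1252:UTF-8"), which avoids touching global state, and it reads plain text rather than going through YAML::load.

```ruby
require "tempfile"

# Hypothetical demo file: "key: ä" saved as Windows-1252, where "ä" is the single byte 0xE4.
ansi_file = Tempfile.new(["ansi_demo", ".yml"])
File.binwrite(ansi_file.path, "key: \xE4\n".b)

# External encoding only: the bytes are kept as they are and merely labelled Windows-1252.
tagged = File.open(ansi_file.path, "r:Windows-1252") { |f| f.read }

# External:internal pair: Ruby transcodes to UTF-8 while reading. A non-nil
# Encoding.default_internal does the same for files opened without an explicit internal encoding.
transcoded = File.open(ansi_file.path, "r:Windows-1252:UTF-8") { |f| f.read }

puts "#{tagged.encoding}: #{tagged.bytes.inspect}"
puts "#{transcoded.encoding}: #{transcoded.bytes.inspect}"
ansi_file.close!
```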

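Even the Ruby docs warn against setting it in code:

"You should not set Encoding::default_internal in ruby code as strings created before changing the value may have a different encoding from strings created after the change. Instead you should use ruby -E to invoke ruby with the correct default_internal."

From: http://ruby-doc.org/core-1.9.3/Encoding.html#method-c-default_internal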
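Following that advice, here is a small sketch that starts a child Ruby with -E EXTERNAL:INTERNAL and prints the defaults it ends up with. RbConfig.ruby is simply the path of the running interpreter; the Windows-1252:UTF-8 pair is just the example from this question.

```ruby
require "open3"
require "rbconfig"

# Launch a fresh interpreter with the encodings fixed on the command line,
# instead of assigning Encoding.default_internal from inside the program.
script = "puts Encoding.default_external.name, Encoding.default_internal.name"
out, status = Open3.capture2(RbConfig.ruby, "-E", "Windows-1252:UTF-8", "-e", script)

puts out
```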

  • What encodings do the strings have in both examples?

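Ruby tells you which encoding each string is tagged with, but it cannot tell you which encoding the bytes were really written in. Only whoever created the file knows that for sure.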

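For example, consider this string: "ÉGÉìÉRÅ[ÉfÉBÉìÉOÇÕìÔǵÇ≠ǻǢ". It looks like gibberish, but it is really the Japanese text "エンコーディングは難しくない" (roughly "encoding is not difficult"), encoded in Shift-JIS and then decoded as if it were Mac Roman. Every byte maps to some valid character, so nothing raises an error; a program that picks the wrong encoding just hands you "ÉGÉìÉRÅ[ÉfÉBÉìÉOÇÕìÔǵÇ≠ǻǢ", and nothing tells you it is wrong!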

That is why encoding detection can never be fully reliable.
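The question's own data shows the same effect on a small scale. Assuming (a guess about the asker's Windows setup) that File.open read utf8.yml as Windows-1252, the two UTF-8 bytes of "ä" decode without any error into two different characters:

```ruby
# The two UTF-8 bytes of "ä" from utf8.yml, with no meaningful encoding label yet.
raw = "ä".b

# Both labels are "valid" for these bytes, so Ruby raises no error either way.
as_utf8   = raw.dup.force_encoding("UTF-8")
as_cp1252 = raw.dup.force_encoding("Windows-1252")
p as_utf8.valid_encoding?    # => true
p as_cp1252.valid_encoding?  # => true

# Converting the wrong reading to UTF-8 produces mojibake ("Ã¤"), with exactly
# the bytes from the question's first result without default_internal.
mojibake = as_cp1252.encode("UTF-8")
p mojibake.bytes             # => [195, 131, 194, 164]
```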

Anyway...

  • How can I load a file even if I don't know its encoding?

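In general, you can't. You need to know the encoding. If you don't, the best you can do is guess, and a guess can be wrong.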

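There is software that guesses heuristically, often reasonably well, but never with certainty. For Ruby, there is the gem Charlock Holmes, which uses ICU for the detection (MRI only).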
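When the candidates are known in advance, as in the question (a file is either UTF-8 or "ANSI"), a much cruder stdlib-only heuristic may be enough. This is a hypothetical helper, not Charlock Holmes and not any library's API: valid UTF-8 is taken as UTF-8, and everything else is assumed to be Windows-1252.

```ruby
CP1252 = Encoding.find("Windows-1252")

# Hypothetical helper: choose between the two encodings from the question.
# Bytes that form valid UTF-8 are taken as UTF-8; anything else is assumed
# to be Windows-1252, which accepts almost any byte sequence.
def guess_utf8_or_cp1252(bytes)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? Encoding::UTF_8 : CP1252
end

# Decode raw bytes to a UTF-8 string using the guess above.
def decode_to_utf8(bytes)
  text = bytes.dup.force_encoding(guess_utf8_or_cp1252(bytes))
  # undef: :replace covers byte values that Windows-1252 leaves undefined.
  text.encode(Encoding::UTF_8, undef: :replace)
end

p guess_utf8_or_cp1252("\xC3\xA4".b)  # => #<Encoding:UTF-8>
p guess_utf8_or_cp1252("\xE4".b)      # => #<Encoding:Windows-1252>
p decode_to_utf8("\xE4".b).bytes      # => [195, 164]
```

Plain ASCII counts as UTF-8 here, which is harmless because both encodings agree on it. A Windows-1252 file whose bytes happen to form valid UTF-8 would be misdetected, which is exactly the weakness of guessing.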

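Also keep in mind that in Ruby, string.encoding is only a label: it says which encoding Ruby believes the bytes are in, not what they really are. If the label is wrong, Ruby happily carries on until something breaks later, for example when the string is transcoded or printed.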
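A short sketch of the label-versus-bytes distinction, using the single Windows-1252 byte for "ä" from ansi.yml. Replacing the invalid byte with U+FFFD yields the same bytes as the question's ansi results, although that is only consistent with, not proof of, what happened there.

```ruby
# The "ä" from ansi.yml: one byte, 0xE4, in Windows-1252.
raw = "\xE4".b

# force_encoding only changes the label; the bytes stay the same.
labelled = raw.dup.force_encoding("UTF-8")
p labelled.encoding         # => #<Encoding:UTF-8>
p labelled.bytes            # => [228]
p labelled.valid_encoding?  # => false (0xE4 alone is not a complete UTF-8 sequence)

# Replacing the invalid byte yields U+FFFD, the replacement character.
p labelled.scrub.bytes      # => [239, 191, 189]

# Only the correct label converts to the intended text.
p raw.dup.force_encoding("Windows-1252").encode("UTF-8").bytes  # => [195, 164]
```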

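Ruby has a couple of ways to be told what encoding a file is in. Globally, through Encoding.default_external (which Ruby derives from your OS and locale settings, so it may be UTF-8, ASCII-8BIT or something else), or per file, in File.open: File.open("something", "r:UTF-8"), or, equivalently, File.open("something", "r", :encoding => "UTF-8"). Either way, you have to supply that information yourself. Ruby does not verify it; it simply trusts what you tell it.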
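A minimal sketch of the two File.open forms, assuming a throwaway file that holds "key: ä" as UTF-8. It also shows that declaring the wrong encoding raises no error.

```ruby
require "tempfile"

# Hypothetical file containing "key: ä" as UTF-8 bytes.
utf8_file = Tempfile.new(["utf8_demo", ".yml"])
File.binwrite(utf8_file.path, "key: ä\n")

# Two equivalent ways of declaring the external encoding for one file.
via_mode    = File.open(utf8_file.path, "r:UTF-8") { |f| f.read }
via_keyword = File.open(utf8_file.path, "r", :encoding => "UTF-8") { |f| f.read }
p via_mode == via_keyword  # => true
p via_mode.encoding        # => #<Encoding:UTF-8>

# Declaring the wrong encoding is not an error: Ruby just trusts the label,
# and converting the result to UTF-8 turns "ä" into "Ã¤".
wrong = File.open(utf8_file.path, "r:Windows-1252") { |f| f.read }
p wrong.encoding                       # => #<Encoding:Windows-1252>
p wrong.encode("UTF-8").bytes.last(5)  # => [195, 131, 194, 164, 10]
utf8_file.close!
```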

Someone on reddit/r/ruby recommended this article on the subject:

http://kunststube.net/encoding/

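As for YAML: YAML files are supposed to be UTF-8 anyway. So the cleanest fix is on the side that produces the files: whoever or whatever writes them (Notepad++ in your case) should save them as UTF-8, and then there is nothing left to guess. In short: make sure your files are UTF-8, and read YAML as UTF-8.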
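A minimal sketch of that advice, assuming you already know the files are Windows-1252 (the question's "ANSI"). convert_to_utf8! is a hypothetical helper name, and the demo YAML uses string keys instead of the question's symbols because, as far as I know, Psych 4 and later refuse symbols in YAML.load by default.

```ruby
require "yaml"
require "tempfile"

# Hypothetical helper: rewrite a file whose encoding you already KNOW as UTF-8.
# The default source encoding (Windows-1252) is an assumption matching the question.
def convert_to_utf8!(path, from: "Windows-1252")
  text = File.read(path, encoding: from)  # bytes labelled with the known encoding
  File.write(path, text.encode("UTF-8"))  # stored as UTF-8 from now on
end

# Demo: an "ANSI" YAML file shaped like the question's ansi.yml.
ansi_yaml = Tempfile.new(["ansi", ".yml"])
File.binwrite(ansi_yaml.path, "key1:\n  key2: \"\xE4\"\n".b)

convert_to_utf8!(ansi_yaml.path)
data = YAML.load_file(ansi_yaml.path)
p data["key1"]["key2"].bytes  # => [195, 164]
ansi_yaml.close!
```

Converting once and keeping the files in UTF-8 means the loading code never has to guess again.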


From the YAML spec, section 5.1, "Character Set":

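"To ensure readability, YAML streams use only the printable subset of the Unicode character set. The allowed character range explicitly excludes the C0 control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."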

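So encodings such as Windows-1252 or ISO-8859-1 might look acceptable, as long as the characters used fall within the allowed ranges. However, Windows uses the "C1 control block #x80-#x9F" for printable characters, so a document that uses them would not be YAML, and a YAML parser should reject it. "ä" itself lies outside that block, though.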
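To see what that byte range means in practice, here is a small sketch that decodes the same bytes under both encodings. Note that the spec's ranges refer to Unicode code points, so what matters is which code point a byte ends up as; code_point is a hypothetical helper.

```ruby
# Decode one byte with a single-byte encoding and return the resulting Unicode code point.
def code_point(byte, encoding)
  byte.chr.force_encoding(encoding).encode("UTF-8").ord
end

# 0x80 sits in the C1 range: Windows-1252 maps it to a printable character (the euro sign),
# while ISO-8859-1 maps it to the C1 control U+0080, which YAML excludes.
printf("0x80: Windows-1252 U+%04X, ISO-8859-1 U+%04X\n",
       code_point(0x80, "Windows-1252"), code_point(0x80, "ISO-8859-1"))

# "ä" is byte 0xE4 in both encodings, i.e. U+00E4, well outside the C1 block.
printf("0xE4: Windows-1252 U+%04X, ISO-8859-1 U+%04X\n",
       code_point(0xE4, "Windows-1252"), code_point(0xE4, "ISO-8859-1"))
```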

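YAML is strict about these rules, and they are spelled out in the spec. A file that breaks them is not valid YAML, so a parser may reject it or, as in your case, misread it.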
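As a sketch of the section 5.1 rule, the character class below transcribes the spec's c-printable production. yaml_printable? is a hypothetical helper, and it only makes sense on a string that has already been decoded correctly, since the rule is about Unicode code points rather than raw bytes.

```ruby
# Any character OUTSIDE YAML 1.2's c-printable set:
# TAB, LF, CR, #x20-#x7E, NEL #x85, #xA0-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
NOT_PRINTABLE = /[^\t\n\r\x20-\x7E\u0085\u00A0-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: true if every character of a valid UTF-8 string is allowed in YAML.
def yaml_printable?(str)
  unless str.encoding == Encoding::UTF_8 && str.valid_encoding?
    raise ArgumentError, "expected a valid UTF-8 string"
  end
  !str.match?(NOT_PRINTABLE)
end

p yaml_printable?("key: ä")    # => true
p yaml_printable?("\u0085")    # => true  (NEL is explicitly allowed)
p yaml_printable?("\u0080")    # => false (C1 control block)
p yaml_printable?("a\u0007b")  # => false (C0 control, BEL)
```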

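Ruby generally defaults to UTF-8 these days, and YAML is not flexible about encodings either. From section 5.2, "Character Encodings":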

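"On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported.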

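If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (#x00) characters."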
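That detection rule is mechanical enough to sketch. The function below is my own illustration of the spec's table (a BOM decides; otherwise the null bytes around the first ASCII character reveal the width), not code taken from Psych or libyaml.

```ruby
# Guess a YAML stream's encoding from its first bytes, following YAML 1.2 section 5.2.
def detect_yaml_encoding(data)
  x0, x1, x2, x3 = data.b.bytes.first(4)  # missing bytes come back as nil

  # Byte order marks; UTF-32 first, because FF FE 00 00 also starts with FF FE.
  return "UTF-32BE" if [x0, x1, x2, x3] == [0x00, 0x00, 0xFE, 0xFF]
  return "UTF-32LE" if [x0, x1, x2, x3] == [0xFF, 0xFE, 0x00, 0x00]
  return "UTF-16BE" if [x0, x1] == [0xFE, 0xFF]
  return "UTF-16LE" if [x0, x1] == [0xFF, 0xFE]
  return "UTF-8"    if [x0, x1, x2] == [0xEF, 0xBB, 0xBF]

  # No BOM: the first character is ASCII, so its null padding gives the width.
  # (A present byte, even 0, is truthy in Ruby; only nil is falsy.)
  return "UTF-32BE" if x0 == 0 && x1 == 0 && x2 == 0 && x3
  return "UTF-32LE" if x0 && x1 == 0 && x2 == 0 && x3 == 0
  return "UTF-16BE" if x0 == 0 && x1
  return "UTF-16LE" if x0 && x1 == 0
  "UTF-8"  # the default
end

p detect_yaml_encoding("key: ä")                    # => "UTF-8"
p detect_yaml_encoding("key: ä".encode("UTF-16LE")) # => "UTF-16LE"
p detect_yaml_encoding("key: ä".encode("UTF-32BE")) # => "UTF-32BE"
```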

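So if your file is not UTF-8, UTF-16 or UTF-32, it is not valid YAML, and Ruby's YAML parser will most likely read it as UTF-8 anyway, which is why the "ä" from ansi.yml comes out as garbage. If you use UTF-16 or UTF-32, make sure the file begins with a BOM or an ASCII character, as the spec requires, so that Ruby's YAML parser can work out the encoding.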


Source: https://habr.com/ru/post/1608731/

