Convert file encoding with Rebol 3

I want to use Rebol 3 to read a file encoded in Latin-1 and convert it to UTF-8. Is there a built-in function I can use, or some kind of external library? Where can I find it?

+4
4 answers

Rebol has an invalid-utf? function, which scans a binary value for a byte that is not part of a valid UTF-8 sequence. We can loop until we have found and replaced every such byte, and then convert the binary value to a string:

latin1-to-utf8: function [binary [binary!]][
    mark: :binary
    while [mark: invalid-utf? mark][
        change/part mark to char! mark/1 1
    ]
    to string! binary
]
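
A quick way to try the function above, as a minimal sketch assuming a classic Rebol 3 interpreter in which change on a binary! encodes a char! as UTF-8. The sample bytes and the names data and text are made up for illustration. Because the function uses change/part on its argument, the input binary is rewritten in place, which the second probe makes visible:

```rebol
; Definition repeated from the answer above so the sketch runs on its own.
latin1-to-utf8: function [binary [binary!]][
    mark: :binary
    while [mark: invalid-utf? mark][
        change/part mark to char! mark/1 1
    ]
    to string! binary
]

data: #{636166E9}            ; "caf" followed by byte E9, which is "é" in Latin-1
text: latin1-to-utf8 data    ; decode to a Rebol string
probe text                   ; show the decoded string
probe data                   ; show the input binary after the in-place edits
```

One caveat with this approach: a run of Latin-1 bytes that happens to form a valid UTF-8 sequence is not flagged by invalid-utf? and so is left unconverted.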

This modifies the original binary. To leave it intact, we can build a new string instead:

latin1-to-utf8: function [binary [binary!]][
    mark: :binary
    to string! rejoin collect [
        while [mark: invalid-utf? binary][
            keep copy/part binary mark  ; keeps the portion up to the bad byte
            keep to char! mark/1        ; converts the bad byte to good bytes
            binary: next mark           ; set the series beyond the bad byte
        ]
        keep binary                     ; keep whatever is remaining
    ]
]

Bonus: here are wee Rebmu versions of the above. Run rebmu/args snippet #{DECAFBAD}, where snippet is:

; modifying
IUgetLOAD"invalid-utf?"MaWT[MiuM][MisMtcTKm]tsA

; copying
IUgetLOAD"invalid-utf?"MaTSrjCT[wt[MiuA][kp copy/partAmKPtcFm AnxM]kpA]
+4


latin1-to-utf8: func [
    "Transcodes a Latin-1 encoded string to UTF-8"
    bin [binary!] "Bytes of Latin-1 data"
] [
    to binary! head collect/into [
        foreach b bin [
            keep to char! b
        ]
    ] make string! length? bin
]

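Latin-1 code points are the same as the first 256 code points of Unicode, so each byte b can be converted directly to the corresponding character.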

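A note on the code: plain collect gathers values into a block, while collect/into appends them to the series you supply, here a string preallocated to the length of the input.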


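The UTF-8 encoding itself is handled by Rebol: to binary! applied to a string produces its UTF-8 bytes, and to char! applied to each byte value produces the character with that code point.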
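
The per-byte step can be looked at in isolation. This is a small sketch, again assuming a classic Rebol 3 interpreter; the names b, ch and utf8 are only for illustration. Byte E9 is "é" in Latin-1 and also code point U+00E9 in Unicode, whose UTF-8 encoding is the two bytes C3 A9:

```rebol
b: first #{E9}               ; picking from a binary! yields an integer
ch: to char! b               ; the character whose code point is that integer
utf8: to binary! ch          ; Rebol 3 encodes the character as UTF-8 bytes

probe b
probe ch
probe utf8
```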

+4

To convert Latin-1 to UTF-8 in Rebol 3, a function like this works:

latin1-to-utf8: func [
    "Transcodes a Latin-1 encoded string to UTF-8"
    bin [binary!] "Bytes of Latin-1 data"
] [
    to-binary collect [foreach b bin [keep to-char b]]
] 


Edit: Incorporated @BrianH's point about Latin-1 and Unicode.

+3
latin1-to-utf8: func [
    "Transcodes bin as a Latin-1 encoded string to UTF-8"
    bin [binary!] "Bytes of Latin-1 data"
    /local t
] [
    t: make string! length? bin
    foreach b bin [append t to char! b ]
    t
]
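
To tie this back to the original question about files, here is a minimal round-trip sketch. It assumes a classic Rebol 3 interpreter, where read on a file returns a binary!. The file names %latin1-sample.txt and %utf8-sample.txt are hypothetical, and the first write only creates a sample input so the sketch runs on its own. Since this version returns a string!, the string is passed through to binary! to get UTF-8 bytes before saving:

```rebol
; Definition repeated (without doc strings) so the sketch is self-contained.
latin1-to-utf8: func [bin [binary!] /local t][
    t: make string! length? bin
    foreach b bin [append t to char! b]
    t
]

; Hypothetical file names, used only for this illustration.
write %latin1-sample.txt #{636166E9}      ; create a small Latin-1 sample file
latin1-bytes: read %latin1-sample.txt     ; reading a file yields a binary!
text: latin1-to-utf8 latin1-bytes         ; decode the bytes to a Rebol string
write %utf8-sample.txt to binary! text    ; encode the string as UTF-8 and save it
```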
+1

Source: https://habr.com/ru/post/1526452/

