is_sane_utf8 does not do what you think it does. You are supposed to pass it strings you have already decoded. I am not sure what the point of it is, but it is not the right tool here. If you want to check whether a string is valid UTF-8, you can use:

    ok(eval { decode_utf8($string, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 }, '$string is valid UTF-8');
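As a rough sketch of the same check outside a test script, the snippet below wraps the decode call in a small helper. The name is_valid_utf8 is hypothetical and not part of Encode or Test::utf8; only decode_utf8, FB_CROAK and LEAVE_SRC come from Encode.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper (not part of any module): returns 1 if $bytes
# decodes cleanly as UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode_utf8 die on malformed input;
    # LEAVE_SRC keeps it from modifying $bytes in place.
    return eval { decode_utf8($bytes, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 } ? 1 : 0;
}

print is_valid_utf8("\xC2\xAB") ? "valid\n" : "invalid\n";   # C2 AB is the encoding of U+00AB
print is_valid_utf8("\xAB")     ? "valid\n" : "invalid\n";   # lone continuation byte
```

The helper takes bytes, not decoded characters, which is the opposite of what is_sane_utf8 expects.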
To show that JSON::XS is correct, look at the sequence is_sane_utf8 flagged.
              +--------------------- Start of two byte sequence
              |    +---------------- Not zero (good)
              |    |     +---------- Continuation byte indicator (good)
              |    |     |
              v    v     v
    C2 AB = [110]00010 [10]101011

    00010 101011 = 000 1010 1011 = U+00AB = «
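The same bit arithmetic can be written out in a few lines of Perl. This is only a sketch for the two-byte case; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my ($lead, $cont) = (0xC2, 0xAB);

# 110xxxxx marks the start of a two-byte sequence.
die "not a two-byte lead byte" unless ($lead & 0xE0) == 0xC0;
# 10xxxxxx marks a continuation byte.
die "not a continuation byte" unless ($cont & 0xC0) == 0x80;

# Five payload bits from the lead byte, six from the continuation byte.
my $code_point = (($lead & 0x1F) << 6) | ($cont & 0x3F);
printf "U+%04X\n", $code_point;   # prints U+00AB
```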
The following shows that JSON::XS produces the same output as Encode.pm:
    use utf8;
    use 5.18.0;
    use JSON::XS;
    use Encode;

    foreach my $string ('Deliver «French Bread»', '日本国') {
        my $hashref = { value => $string };

        # Code points of the decoded input string
        say(sprintf("Input: U+%v04X", $string));
        # Bytes produced by Encode
        say(sprintf("UTF-8 of input: %v02X", encode_utf8($string)));

        # Bytes produced by JSON::XS
        my $json = encode_json($hashref);
        say(sprintf("JSON: %v02X", $json));
        say("");
    }
Output (with spaces added):
    Input:          U+0044.0065.006C.0069.0076.0065.0072.0020.00AB.0046.0072.0065.006E.0063.0068.0020.0042.0072.0065.0061.0064.00BB
    UTF-8 of input:                               44.65.6C.69.76.65.72.20.C2.AB.46.72.65.6E.63.68.20.42.72.65.61.64.C2.BB
    JSON:           7B.22.76.61.6C.75.65.22.3A.22.44.65.6C.69.76.65.72.20.C2.AB.46.72.65.6E.63.68.20.42.72.65.61.64.C2.BB.22.7D

    Input:          U+65E5.672C.56FD
    UTF-8 of input:                               E6.97.A5.E6.9C.AC.E5.9B.BD
    JSON:           7B.22.76.61.6C.75.65.22.3A.22.E6.97.A5.E6.9C.AC.E5.9B.BD.22.7D
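If you would rather have the comparison checked by code than by eye, something along these lines should work. It is a sketch that assumes JSON::XS is installed and reuses one of the sample strings from above.

```perl
use utf8;
use strict;
use warnings;
use JSON::XS;
use Encode qw(encode_utf8);

my $string = 'Deliver «French Bread»';
my $json   = encode_json({ value => $string });

# encode_json returns UTF-8 bytes, so the bytes Encode produces for the
# string should appear verbatim inside the JSON text.
print index($json, encode_utf8($string)) >= 0 ? "contains UTF-8 bytes\n" : "mismatch\n";

# decode_json takes UTF-8 bytes and returns decoded characters again.
print decode_json($json)->{value} eq $string ? "round trip ok\n" : "round trip failed\n";
```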