XML Encoding Fix

I have XML whose encoding declaration says 'utf-8', but the content is actually ISO-8859-1.

How can I detect this programmatically in Perl and Python, and how do I decode the data with a different encoding?

In Perl I tried:

$xml = decode('iso-8859-1', $file);

but it does not work.

+3
3 answers

Detecting an encoding reliably is difficult, since arbitrary binary data is often a valid string in many encodings.

In Perl, the easiest approach is to try decoding the data as UTF-8 and check for failure. (This only works in that order: a Western-language document encoded as UTF-8 is almost always also a valid ISO-8859-1 document, so trying ISO-8859-1 first would not reveal anything.)

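use Encode qw(decode decode_utf8 FB_CROAK LEAVE_SRC);
# Strict decode croaks on malformed UTF-8; LEAVE_SRC keeps $file intact.
my $xml = eval { decode_utf8( $file, FB_CROAK | LEAVE_SRC ) };
$xml = decode( 'iso-8859-1', $file ) if $@;   # probably ISO-8859-1 instead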
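
Since the question also asks about Python, here is a minimal sketch of the same try-UTF-8-first idea. The function name decode_xml_bytes and the sample bytes are made up for illustration; only the built-in bytes.decode is used, which is strict by default and raises UnicodeDecodeError on malformed input.

```python
from typing import Tuple


def decode_xml_bytes(raw_bytes: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used), trying strict UTF-8 before ISO-8859-1."""
    try:
        # Strict decoding raises UnicodeDecodeError on malformed UTF-8.
        return raw_bytes.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is valid ISO-8859-1, so this fallback cannot fail.
        return raw_bytes.decode("iso-8859-1"), "iso-8859-1"


# Hypothetical sample: 0xE9 is "é" in ISO-8859-1 and is not valid UTF-8 here.
text, used = decode_xml_bytes(b"<a>caf\xe9</a>")
print(used, text)
```

The order matters for the reason given above: the ISO-8859-1 branch never fails, so it can only serve as the fallback, never as the test.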

XML MIME-, Perl, , , .

XML, , , XML, , .

# assuming the XML declaration is on line 1:
$contents =~ s/.*/<?xml version="1.0" encoding="ISO-8859-1"?>/;
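
A rough Python counterpart of this substitution, as a sketch only. The names XML_DECL, NEW_DECL and force_latin1_declaration are invented for this example. Unlike the Perl one-liner, which replaces the whole first line unconditionally, this version only replaces a declaration that actually starts the data, and prepends one if none is found.

```python
import re

# An XML declaration, if present, must be the very first thing in the data.
XML_DECL = re.compile(rb"\A<\?xml[^>]*\?>")
NEW_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def force_latin1_declaration(contents: bytes) -> bytes:
    """Replace a leading XML declaration, or prepend one if none is present."""
    if XML_DECL.match(contents):
        return XML_DECL.sub(NEW_DECL, contents, count=1)
    return NEW_DECL + b"\n" + contents


# Hypothetical sample: declared as UTF-8, actually ISO-8859-1.
fixed = force_latin1_declaration(
    b'<?xml version="1.0" encoding="utf-8"?>\n<a>caf\xe9</a>'
)
print(fixed)
```

Working on bytes rather than text avoids having to decode the document before its real encoding has been settled.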
+4

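Try decoding raw_bytes as UTF-8; if that succeeds, the data is almost certainly UTF-8.

If decoding fails, the data is probably ISO-8859-1; if it succeeds, treat it as UTF-8 (or, possibly, plain ASCII, which is a subset of both ISO-8859-1 and UTF-8).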

Then fix the XML itself so that the declaration names the actual encoding:

<?xml version="1.0" encoding="ISO-8859-1"?>
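
If editing the declaration is not convenient, Python's standard xml.etree.ElementTree parser can be told to ignore it. This is a sketch with made-up sample data: the encoding argument of XMLParser overrides the encoding named in the document. It assumes the override is one of the few encodings the underlying Expat parser supports natively, which include UTF-8 and ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declared as UTF-8, but 0xE9 is "é" in ISO-8859-1.
data = b'<?xml version="1.0" encoding="utf-8"?><a>caf\xe9</a>'

# The encoding argument overrides whatever the declaration claims.
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(data, parser=parser)
print(root.tag, root.text)
```

Note that the override applies to every document parsed this way, so it only makes sense after a check has shown the data is not valid UTF-8.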

Also, are you sure it is ISO-8859-1 and not CP1252 (the Windows variant)?
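
The two encodings differ only in the byte range 0x80 to 0x9F: ISO-8859-1 maps those bytes to C1 control codes, while CP1252 maps most of them to printable characters such as curly quotes and the euro sign. A small Python sketch of a heuristic built on that difference follows. The helper name looks_like_cp1252 is invented here, and the check is only a hint, meant to be used after UTF-8 has already been ruled out.

```python
def looks_like_cp1252(raw_bytes: bytes) -> bool:
    """Heuristic: bytes 0x80-0x9F are control codes in ISO-8859-1, text in CP1252."""
    return any(0x80 <= b <= 0x9F for b in raw_bytes)


# Hypothetical sample: curly quotes as written by many Windows programs.
sample = b"\x93quoted\x94"
print(looks_like_cp1252(sample))
print(repr(sample.decode("iso-8859-1")))  # C1 control characters U+0093 and U+0094
print(repr(sample.decode("cp1252")))      # curly quotes U+201C and U+201D
```

A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in Python's cp1252 codec and raise UnicodeDecodeError, so a fallback to ISO-8859-1 is still worth keeping.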

+1

, , XML, XML. . , 1 , , UTF-8; 2 - XML . .

+1

Source: https://habr.com/ru/post/1785081/