Read XML from file to file while preserving formatting

I use this Perl code to read XML from a file and then write it to another file (my full script also has code for adding attributes):

    #!/usr/bin/perl -w
    use strict;
    use XML::DOM;
    use XML::Simple;

    my $num_args = $#ARGV + 1;
    if ($num_args != 2) {
        print "\nUsage: ModifyXML.pl inputXML outputXML\n";
        exit;
    }

    my $inputPath  = $ARGV[0];
    my $outputPath = $ARGV[1];

    open(inputXML, "$inputPath") || die "Cannot open $inputPath \n";

    my $parser = XML::DOM::Parser->new();
    my $data   = $parser->parsefile($inputPath) || die "Error parsing XML File";

    open my $fh, '>:utf8', "$outputPath" or die "Can't open $outputPath for writing: $!\n";
    $data->printToFileHandle($fh);

    close(inputXML);

However, this does not preserve character references such as the encoded line breaks. For example, this XML:

    <?xml version="1.0" encoding="utf-8"?>
    <Test>
      <Notification Content="test1 testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;" Type="Test1234">
      </Notification>
    </Test>

becomes the following:

    <?xml version="1.0" encoding="utf-8"?>
    <Test>
      <Notification Content="test1 testx test2 test3 " Type="Test1234">
      </Notification>
    </Test>

I suspect that I am not writing the output correctly.

+6
2 answers

Use XML::LibXML, for example. The main modules involved are XML::LibXML::Parser and XML::LibXML::DOM (among others). The object returned is usually an XML::LibXML::Document.

    use warnings 'all';
    use strict;
    use XML::LibXML;

    my $inputPath  = 'with_encodings.xml';
    my $outputPath = 'keep_encodings.xml';

    my $reader = XML::LibXML->new();
    my $doc    = $reader->load_xml(location => $inputPath, no_blanks => 1);

    print $doc->toString();

    my $state = $doc->toFile($outputPath);

We don't need to create an object first; we can call XML::LibXML->load_xml directly. I create one here as an example, since you can then use methods on $reader to configure the parser (encodings, for example) before parsing, outside the constructor.
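The direct call can be sketched as follows. This is a minimal example, not the answerer's code: the sample XML is a shortened version of the one in the question, and the round-trip check is added only for illustration. Note that the serializer may write the references in a different but equivalent form (for example decimal instead of hexadecimal), so the check compares parsed attribute values rather than raw text.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input modeled on the question's XML (shortened for illustration).
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Test>
  <Notification Content="test1&#xD;&#xA;test2" Type="Test1234"/>
</Test>
XML

# Class-method call: no separate parser object is created first.
my $doc = XML::LibXML->load_xml(string => $xml);

# After parsing, the attribute value holds a real CR LF pair.
my ($node)  = $doc->findnodes('/Test/Notification');
my $content = $node->getAttribute('Content');

# Serialize, parse the result again, and read the same attribute back.
my $again    = XML::LibXML->load_xml(string => $doc->toString());
my ($node2)  = $again->findnodes('/Test/Notification');
my $content2 = $node2->getAttribute('Content');

print $content eq $content2
    ? "line breaks survived the round trip\n"
    : "round trip changed the attribute value\n";
```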

This module is also much more convenient for processing.

XML::Twig should also leave the encoded characters intact, and it is also much better for processing.

+4

FYI, I was able to do this by switching to another XML parser. I now use XML::LibXML.

The syntax is similar, except for "parse_file" instead of "parsefile", and instead of "printToFileHandle" you use "toFile" with the file name.
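A minimal sketch of what the switched script might look like, assuming the same two command-line arguments as in the question. The helper name copy_xml is hypothetical and introduced only for this example; parse_file and toFile are the XML::LibXML calls named above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse one XML file and write it to another path.
sub copy_xml {
    my ($input_path, $output_path) = @_;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file($input_path);   # dies on a parse error

    # Any attribute changes from the full script would go here.

    return $doc->toFile($output_path);                # takes a file name, not a handle
}

if (@ARGV == 2) {
    copy_xml(@ARGV);
}
else {
    print STDERR "Usage: ModifyXML.pl inputXML outputXML\n";
}
```

Unlike printToFileHandle, toFile opens and writes the output file itself, so the explicit open on the output path from the original script is no longer needed.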

-1

Source: https://habr.com/ru/post/1012001/

