How to remove comments from XML using the XML::Twig module

I am using the XML::Twig module to remove all comments from an XML file. A sample file:

    <?xml version="1.0" encoding="UTF-8"?>
    <Node_A>
      node A content 1
      <!-- One Line Comment A1-->
      <![CDATA[this portion within the two comments is being REMOVED which is not the intention]]>
      <!-- Two Line Comment Two Line Comment-->
      node A content 3
      <!-- Two Line Comment Two Line Comment-->
      <![CDATA[this portion within the two comments is being REMOVED which is not the intention]]>
      <!-- Two Line Comment Two Line Comment-->
      <![CDATA[ this portion is fine]]>
      <Node_B>
        node B content
        <Node_C> node c content </Node_C>
        <!-- One Line Comment -->
        some data one
        <!-- Multi Line Comment Line 3Comment 1Line Comment 2Line Comment Line 5Comment Line Comment-->
        some data again two
        <!-- Multi Line Comment Line 3Comment Line 5Comment Line Comment-->
        few more
      </Node_B>
    </Node_A>

I used this script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $infile = 'demo.xml';
    my $twig = XML::Twig->new(comments => 'drop', pretty_print => 'indented')->parsefile($infile);
    $twig->print();

The script also removes the CDATA sections that sit between two comments, which is not what I intended. The output is:

    <?xml version="1.0" encoding="UTF-8"?>
    <Node_A> node A content 1 <![CDATA[ this portion is fine]]><Node_B> node B content <Node_C> node c content </Node_C> some data one some data again two few more </Node_B></Node_A>
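To check mechanically whether an installation loses CDATA sections, one option is to run the same `comments => 'drop'` setting on a small string and compare how many CDATA sections go in and come out. This is a sketch: the sample string is a shortened, hypothetical version of demo.xml, and `count_marker` is a helper written for this example, not part of XML::Twig.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: count how often a literal marker occurs in a string.
sub count_marker {
    my ($text, $marker) = @_;
    my $n = () = $text =~ /\Q$marker\E/g;
    return $n;
}

# Shortened sample: one CDATA section between two comments, one after them.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Node_A> node A content 1 <!-- One Line Comment A1-->
<![CDATA[between two comments]]> <!-- Two Line Comment-->
<![CDATA[ this portion is fine]]> </Node_A>
XML

# Same comment option as the question's script (without pretty printing),
# but parsing a string and capturing the result with sprint.
my $twig = XML::Twig->new(comments => 'drop');
$twig->parse($xml);
my $out = $twig->sprint;

printf "CDATA sections: %d in, %d out; comments left: %d\n",
    count_marker($xml, '<![CDATA['),
    count_marker($out, '<![CDATA['),
    count_marker($out, '<!--');
```

If the "out" count is lower than the "in" count, the installed modules show the behaviour described above.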

What should I add so that all CDATA sections and everything else are kept as they are, and only the comments are deleted?

Thanks in advance.
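One possible workaround, sketched here rather than taken from the thread, is to load the comments into the tree and delete them yourself instead of relying on `comments => 'drop'`. It assumes, as the XML::Twig documentation describes for `comments => 'process'`, that each comment becomes a node whose tag is `#COMMENT`; `strip_comments` and the sample string are hypothetical. Comments outside the root element (before or after it) are not handled by this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: remove comments from an XML string while leaving
# text and CDATA sections in place.
sub strip_comments {
    my ($xml) = @_;
    # 'process' (assumption per the docs) loads each comment as a node tagged '#COMMENT'.
    my $twig = XML::Twig->new(comments => 'process');
    $twig->parse($xml);
    # Walk every node under the root and delete only the comment nodes.
    $_->delete for grep { $_->tag eq '#COMMENT' } $twig->root->descendants;
    return $twig->sprint;
}

# A shortened version of the question's demo.xml.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Node_A> node A content 1 <!-- One Line Comment A1-->
<![CDATA[between two comments]]> <!-- Two Line Comment Two Line Comment--> node A content 3
<Node_B> node B content <!-- One Line Comment --> few more </Node_B>
</Node_A>
XML

print strip_comments($xml), "\n";
```

Pretty printing is left out so the original layout of the mixed content is preserved; `pretty_print => 'indented'` can be added to the constructor if you want it back.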

1 answer

When I run your script on the demo.xml file you posted, I get this output:

    <?xml version="1.0" encoding="UTF-8"?>
    <Node_A> node A content 1 <![CDATA[this portion within the two comments is being REMOVED which is not the intention]]> node A content 3 <![CDATA[this portion within the two comments is being REMOVED which is not the intention]]><![CDATA[ this portion is fine]]><Node_B> node B content <Node_C> node c content </Node_C> some data one some data again two few more </Node_B></Node_A>

That looks fine to me. I suspect you have a buggy version of XML::Twig (or of XML::Parser, on which it depends). I am using Perl 5.14.2, XML::Twig 3.35 and XML::Parser 2.41.
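To compare your setup with the versions above, a short script like the following prints what is installed. It is only a convenience sketch; the 3.35 and 2.41 figures are simply the versions that worked for the answerer, not a documented minimum.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;    # XML::Twig is built on XML::Parser
use XML::Parser;

# Known-good combination from the answer: Perl 5.14.2, XML::Twig 3.35, XML::Parser 2.41.
printf "Perl        %vd\n", $^V;
print  "XML::Twig   $XML::Twig::VERSION\n";
print  "XML::Parser $XML::Parser::VERSION\n";
```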


Source: https://habr.com/ru/post/1381243/

