Clear the contents of an XML element in all files in a Windows directory tree using Strawberry Perl and XML::Twig

I want to clear all content inside <loot> </loot> elements in the XML files of a directory tree. I am using 64-bit Strawberry Perl for Windows.

For example, this XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <monster name="Dragon"/>
    <health="10000"/>
    <immunities>
        <immunity fire="1"/>
    </immunities>
    <loot>
        <item id="1"/>
        <item id="3"/>
        <inside>
            <item id="6"/>
        </inside>
        </item>
    </loot>

The modified file should look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <monster name="Dragon"/>
    <health="10000"/>
    <immunities>
        <immunity fire="1"/>
    </immunities>
    <loot>
    </loot>

I have this code:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Find::Rule;
    use XML::Twig;

    sub delete_loot {
        my ( $twig, $loot ) = @_;
        foreach my $loot_entry ( $loot -> children ) {
            $loot_entry -> delete;
        }
        $twig -> flush;
    }

    my $twig = XML::Twig -> new (
        pretty_print  => 'indented',
        twig_handlers => { 'loot' => \&delete_loot }
    );

    foreach my $file ( File::Find::Rule -> file() -> name ( '*.xml' ) -> in ( 'C:\Users\PIO\Documents\serv\monsters' ) ) {
        print "Processing $file\n";
        $twig -> parsefile_inplace($file);
    }

But it only edits the first file it encounters correctly; the rest of the files end up empty (0 KB).

2 answers

The XML::Twig documentation says that "multiple twigs are not well supported."

If you inspect the state of the twig object (for example, with Data::Dumper), you will see a big difference between the first and the subsequent runs. The twig seems to consider itself already completely flushed (which is true, since there was a full flush during the first run). There is probably nothing left to print for the subsequent files, so each of them ends up empty.

Recreating the twig object in each loop iteration worked for me:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Find::Rule;
    use XML::Twig;

    sub delete_loot {
        my ( $twig, $loot ) = @_;
        foreach my $loot_entry ( $loot -> children ) {
            $loot_entry -> delete;
        }
    }

    foreach my $file ( File::Find::Rule -> file() -> name ( '*.xml' ) -> in ( '/home/dabi/tmp' ) ) {
        print "Processing $file\n";

        # A fresh twig for every file, so no state is carried over.
        my $twig = XML::Twig -> new (
            pretty_print  => 'indented',
            twig_handlers => { loot => \&delete_loot, }
        );
        $twig -> parsefile($file);
        $twig -> print_to_file($file);
    }

In addition, I had to change the structure of the XML file to process it:

    <?xml version="1.0" encoding="UTF-8"?>
    <monster name="Dragon">
        <health value="10000"/>
        <immunities>
            <immunity fire="1"/>
        </immunities>
        <loot>
            <item id="1"/>
            <item id="3">
                <inside>
                    <item id="6"/>
                </inside>
            </item>
        </loot>
    </monster>
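
To see why the sample from the question had to be restructured, here is a minimal sketch that checks both shapes for well-formedness. It uses XML::Twig's safe_parse, which wraps parsing in an eval and returns a false value on failure. The helper name is_well_formed and the shortened in-memory samples are assumptions made for this illustration only.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened sample in the shape used by the question: the health tag has no
# attribute name, and there is more than one top-level element.
my $broken = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
XML

# Corrected shape: a single root element and a named attribute.
my $fixed = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
  <health value="10000"/>
</monster>
XML

# is_well_formed() is a hypothetical helper name for this sketch.
sub is_well_formed {
    my ($xml) = @_;
    # safe_parse returns the twig on success and a false value on failure;
    # the parser's error message is left in $@.
    return XML::Twig->new->safe_parse($xml) ? 1 : 0;
}

printf "question sample:  %s\n", is_well_formed($broken) ? 'ok' : 'not well-formed';
printf "corrected sample: %s\n", is_well_formed($fixed)  ? 'ok' : 'not well-formed';
```

Running a check like this over the directory tree before modifying anything is one way to find files that the parser would reject.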

Note: when flush is changed to print, the code in the question works for me (with valid XML).

However, I still recommend either of the versions below. Both were tested with two sets of valid XML files.


When XML::Twig->new(...) is called first and the files are then processed in a loop, I get the same behavior: the first file is processed correctly and the rest are completely blanked.

Edit: when flush is replaced with print, the code shown actually works (with valid XML files). However, I still suggest one of the versions below, since XML::Twig simply does not support multiple files.

The reason may be related to new being a class method. However, I do not see why that should affect the processing of multiple files. The callback is set outside of the loop, but I tested re-setting it for each file, and that does not help.

Finally, flush-ing is not needed, and it clearly hurts here by clearing the state (which was created by the class method new). This does not affect the code below, but flush is still replaced with print there.

So just do everything inside the loop. A simple version:

    use strict;
    use warnings;
    use File::Find::Rule;
    use XML::Twig;

    my @files = File::Find::Rule->file->name('*.xml')->in('...');

    foreach my $file (@files) {
        print "Processing $file\n";
        my $t = XML::Twig->new(
            pretty_print  => 'indented',
            twig_handlers => { loot => \&clear_elt },
        );
        $t->parsefile_inplace($file)->print;
    }

    sub clear_elt {
        my ($t, $elt) = @_;
        my $elt_name = $elt->name;               # get the name
        my $parent   = $elt->parent;             # fetch the parent
        $elt->delete;                            # remove altogether
        # add it back empty; with no position argument, insert_new_elt
        # pastes the new element as the parent's first child
        $parent->insert_new_elt($elt_name, '');
    }

The callback is simplified to remove the whole element and then add it back empty. Note that the sub does not need the element name hardcoded, so it can be used to clear any element.
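
As a variation on the same idea, here is a minimal sketch that empties the element where it stands instead of deleting and re-inserting it, so it keeps its position among its siblings and any attributes it has. It relies on the cut_children method of XML::Twig elements. The handler name empty_children and the in-memory sample document are assumptions made for this illustration, not part of the original answer.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample; loot sits between two siblings on purpose, so that
# its position after processing can be checked.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
  <health value="10000"/>
  <loot>
    <item id="1"/>
    <item id="3"><inside><item id="6"/></inside></item>
  </loot>
  <immunities><immunity fire="1"/></immunities>
</monster>
XML

# empty_children() is a hypothetical handler name for this sketch.
sub empty_children {
    my ($twig, $elt) = @_;
    $elt->cut_children;    # detach every child; the element itself stays put
    return 1;
}

# One twig per document, as recommended above.
my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { loot => \&empty_children },
);
$twig->parse($xml);

my $result = $twig->sprint;    # serialize to a string instead of a file
print $result;
```

Like clear_elt, the handler never mentions the element name, so it can be registered for any element. For files on disk, parsefile followed by print_to_file can take the place of parse and sprint.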

We can avoid calling new inside the loop by using another class method, nparse.

    my $t = XML::Twig->new( pretty_print => 'indented' );

    foreach my $file (@files) {
        print "Processing $file\n";
        my $tobj = XML::Twig->nparse(
            twig_handlers => { loot => \&clear_elt },
            $file
        );
        $tobj->parsefile_inplace($file)->print;
    }

    # the sub clear_elt() same as above

We must first call the constructor new, even if it is not used directly in the loop.


Note that calling new before the loop without twig_handlers, and then setting the handlers inside the loop with

    $t->setTwigHandlers(loot => sub { ... });

does not help. We still get only the first file processed correctly.


Source: https://habr.com/ru/post/1262172/