How can I parse incomplete XML fragments using Perl's XML::Twig?

I am trying to extract data from log files in XML format. Because they are huge, I use XML::Twig to extract the relevant data from a buffer rather than from the entire file(s).

Because this data is concatenated from STDIN, the XML is far from well-formed, so the parser often stops with an error. How can I make the XML parser ignore errors and retrieve only the tags I am interested in? Should I go back to plain regex matching (start tag to end tag)?


If all you need are the <message> elements, cut them out of the stream first and then run a Perl XML parser on each of them.
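A minimal sketch of this approach with XML::Twig itself, since that is the module the question uses: each already-extracted <message> fragment is parsed on its own with XML::Twig's safe_parse method, which wraps the parse in an eval and returns false (leaving the error in $@) instead of dying. A malformed fragment is therefore reported and skipped rather than stopping the run. The @fragments list, including the deliberately broken second entry, is hypothetical sample data, and the sender/content fields assume elements shaped like <message sender="1">Hi</message>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical sample data: fragments already cut out of the log stream.
my @fragments = (
    '<message sender="1">Hi</message>',
    '<message sender="2">Hi <b>yourself</message>',    # malformed: <b> never closed
    '<message sender="3">Bye</message>',
);

my @parsed;
for my $fragment (@fragments) {
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once a <message> element has been completely parsed.
            message => sub {
                my ( $t, $elt ) = @_;
                push @parsed, { sender => $elt->att('sender'), content => $elt->text };
                $t->purge;    # free elements that have been handled
            },
        },
    );

    # safe_parse returns false and sets $@ on a parse error instead of dying.
    unless ( $twig->safe_parse($fragment) ) {
        warn "Skipping malformed fragment: $@";
    }
}

printf "%s: %s\n", $_->{sender}, $_->{content} for @parsed;
```

With this input, the first and third fragments should be collected and the second reported on STDERR. A fresh twig per fragment keeps one bad fragment from leaving parser state behind for the next one.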


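Assuming each <message> element begins at the start of a line and its </message> closes at the end of a line, you can collect just those lines, ignore everything else, and parse each collected chunk separately: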

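#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

my $message = '';

LOGENTRY:
while ( my $line = <DATA> ) {
    # The flip-flop (..) is true from a line starting with <message
    # through the next line ending with </message>, inclusive.
    if ( $line =~ /^<message/ .. $line =~ m{</message>$} ) {
        $message .= $line;
        next LOGENTRY;
    }
    # First line after a complete element: parse what was collected.
    if ( $message ) {
        process_message($message);
        $message = '';
    }
}

# Don't lose a message that ends on the last line of input.
process_message($message) if $message;

sub process_message {
    my ($message) = @_;

    # XMLin dies on malformed XML; skip the fragment instead of aborting.
    my $xml = eval { XMLin( $message, ForceArray => 1 ) };
    if ( $@ ) {
        warn "Skipping malformed fragment: $@";
        return;
    }
    print Dumper $xml;
}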

__DATA__
ldksj
lskdfj
lksd

sdfk

<message sender="1">Hi</message>

sdk
dkj

<message sender="2">Hi yourself!</message>

sd

Output:

$VAR1 = {
          'sender' => '1',
          'content' => 'Hi'
        };
$VAR1 = {
          'sender' => '2',
          'content' => 'Hi yourself!'
        };
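The line-based flip-flop above only works when each <message> starts at the beginning of a line and ends at the end of one. If the input instead arrives in arbitrary chunks (for example, read from STDIN into a buffer, as the question describes), a regex-only sketch like the following can pull out complete elements and keep any partial tail for the next read. It is not a general XML tokenizer: extract_complete is a hypothetical helper, <message> elements are assumed not to nest, and "</message>" is assumed never to appear inside CDATA or comments. Each fragment it returns can then be parsed with XMLin as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: removes every complete <message ...>...</message>
# element from the buffer (passed by reference) and returns them in order.
# The tempered pattern (?:(?!<message\b).)*? refuses to run across another
# <message start, so a truncated element is dropped instead of being glued
# to the next one. Assumes <message> elements never nest.
sub extract_complete {
    my ($buf_ref) = @_;
    my @found;
    my $consumed = 0;
    while ( $$buf_ref =~ m{(<message\b(?:(?!<message\b).)*?</message>)}gs ) {
        push @found, $1;
        $consumed = pos( $$buf_ref );    # end of the last complete element
    }
    substr( $$buf_ref, 0, $consumed, '' );

    # Drop junk before the next element start; if there is none, keep only
    # a 7-character tail in case "<message" was split across two reads.
    my $start = index( $$buf_ref, '<message' );
    if ( $start >= 0 ) {
        substr( $$buf_ref, 0, $start, '' );
    }
    elsif ( length $$buf_ref > 7 ) {
        $$buf_ref = substr( $$buf_ref, -7 );
    }
    return @found;
}

# Simulated input split at awkward places; with real data you would append
# each chunk read from STDIN instead.
my @chunks = (
    'junk <message sender="1">Hi</mes',
    'sage> more junk <message sender="2">Hi',
    ' yourself!</message> <mess',
);

my $buffer = '';
my @messages;
for my $chunk (@chunks) {
    $buffer .= $chunk;
    push @messages, extract_complete( \$buffer );
}

print "$_\n" for @messages;
```

With the chunks shown, both messages are printed and $buffer is left holding only the partial " <mess", which is completed (or discarded) once more data arrives.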

Source: https://habr.com/ru/post/1768610/

