Perl: delete multiple repeat lines where a specific criterion is met

I have data that looks like the sample below; the actual file runs to thousands of lines.

    Event_time                 Cease_time                 Object_of_reference
    -------------------------- -------------------------- ----------------------------------------------------------------------------------
    Apr 5 2010 5:54PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
    Apr 5 2010 5:55PM          Apr 5 2010 6:43PM
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
    Apr 5 2010 5:58PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA
    Apr 5 2010 5:58PM          Apr 5 2010 6:01PM
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA
    Apr 5 2010 6:01PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA
    Apr 5 2010 6:03PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
    Apr 5 2010 6:03PM          Apr 5 2010 6:04PM
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
    Apr 5 2010 6:04PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
    Apr 5 2010 6:03PM          Apr 5 2010 6:03PM
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA
    Apr 5 2010 6:03PM          NULL
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA
    Apr 5 2010 6:03PM          Apr 5 2010 7:01PM
    SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
    BSS_ManagedFunction,BtsSiteMgr=BULAGA

As you can see, each file has a header describing what the various fields mean (the time the event started, the time the event ceased, and the affected element). The header is followed by rows of dashes.

My problem is that the data contains several records whose cease time is NULL, meaning the event is still active. All such entries should go, i.e. for each element whose cease time is NULL, the event time, the cease time (NULL in this case) and the element itself must be deleted from the file.

In the remaining records, all the text from the word SubNetwork up to BtsSiteMgr= should also be removed, along with the headers and dashes.

The final output should look like this:

    Apr 5 2010 5:55PM          Apr 5 2010 6:43PM          LUGALAMBO_900
    Apr 5 2010 5:58PM          Apr 5 2010 6:01PM          BULAGA
    Apr 5 2010 6:03PM          Apr 5 2010 6:04PM          KAPKWAI_900
    Apr 5 2010 6:03PM          Apr 5 2010 6:03PM          BULAGA
    Apr 5 2010 6:03PM          Apr 5 2010 7:01PM          BULAGA

Below is the Perl script I wrote. It takes care of the headers, the dashes and the NULL entries, but I have not been able to remove the lines that follow a NULL in order to get the output above.

    #!/usr/bin/perl
    use strict;
    use warnings;

    $^I = ".bak";    # Backup the file before messing it up.

    open(DATAIN,  "<george_perl.txt") || die("can't open datafile: $!");   # Read in the data
    open(DATAOUT, ">gen_results.txt") || die("can't open datafile: $!");   # Prepare for the writing

    while (<DATAIN>) {
        # Preceding 4 statements are for cleaning out the headers
        s/Event_time//g;
        s/Cease_time//g;
        s/Object_of_reference//g;
        s/\-//g;
        my $theline = $_;
        if ($theline =~ /NULL/) {
            next;
            next if $theline =~ /SubN/;
        }
        else {
            print DATAOUT $theline;
        }
    }
    close DATAIN;
    close DATAOUT;

Please help me see what changes I need to make to the script to achieve this.

regex perl
Apr 21 '10 at 18:13
3 answers

Your data comes in sets of 3 lines, so one approach is to organize the parsing around that:

    use strict;
    use warnings;

    # Ignore header junk.
    while (<>) {
        last unless /\S/;
    }

    until (eof) {
        # Read in a set of 3 lines.
        my @lines;
        push @lines, scalar <> for 1 .. 3;

        # Filter and clean.
        next if $lines[0] =~ /\sNULL\s/;
        $lines[2] =~ s/.+BtsSiteMgr=//;
        print @lines[0,2];
    }
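If you want to try the grouping-and-filtering logic without the real file, here is a self-contained sketch (the sample records are invented for illustration) that applies the same three-line grouping to an in-memory filehandle:

```perl
use strict;
use warnings;

# Two invented three-line records in the same shape as the data above:
# one still-active (NULL) record and one completed record.
my $data = join '',
    "Apr  5 2010  5:54PM   NULL\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
    "Apr  5 2010  5:55PM   Apr  5 2010  6:43PM\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n";

# Read from the string as if it were a file.
open my $fh, '<', \$data or die "can't open in-memory handle: $!";

my $out = '';
until (eof $fh) {
    # Read a set of 3 lines.
    my @lines;
    push @lines, scalar <$fh> for 1 .. 3;

    # Skip still-active records, strip the object prefix, keep the rest.
    next if $lines[0] =~ /\sNULL\s/;
    $lines[2] =~ s/.+BtsSiteMgr=//;
    $out .= $lines[0] . $lines[2];
}
print $out;
```

The NULL record is dropped whole, and only the time line plus the bare site name of the completed record survive.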
Apr 21 '10 at 20:46

Sounds like a good candidate for playing with the input record separator ( $/ ). The idea is to manipulate it so that you process one record at a time, rather than the default one line at a time.

    use strict;
    use warnings;

    $^I = '.bak';

    open my $dataIn,  '<', 'george_perl.txt' or die "Can't open data file: $!";
    open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";

    {
        local $/ = "\n\t";    # Records have leading tabs

        while ( my $record = <$dataIn> ) {
            # Skip header & records that contain 'NULL'
            next if $record =~ /NULL|Event_time/;

            # Strip out the unwanted yik-yak
            $record =~ s/SubNetwork.*BtsSiteMgr=//s;

            # Print record to output file
            print $dataOut $record;
        }
    }

    close $dataIn;
    close $dataOut;

Please note the following:

  • using the safer three-argument form of open (your script uses the two-argument form)
  • using lexical scalar variables rather than bareword filehandles
  • using the local keyword and an extra pair of curly braces so the change to $/ applies only where it is needed
  • the trailing /s in s/SubNetwork.*BtsSiteMgr=//s lets . match newlines, so the pattern can span multiple lines
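The effect of that /s modifier can be seen in a tiny sketch (the strings here are invented for illustration):

```perl
use strict;
use warnings;

# A two-line object reference like the ones in the data.
my $text = "SubNetwork=ONRM_RootMo,BssFunction=\nBSS_ManagedFunction,BtsSiteMgr=BULAGA";

# Without /s, "." will not match the embedded newline, so nothing is removed.
(my $no_s = $text) =~ s/SubNetwork.*BtsSiteMgr=//;

# With /s, "." matches the newline too, and the whole prefix is stripped.
(my $with_s = $text) =~ s/SubNetwork.*BtsSiteMgr=//s;

print "$no_s\n";     # unchanged
print "$with_s\n";   # BULAGA
```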
Apr 21 '10 at 19:41
 s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg; 

filters out each line ending in NULL, plus the two lines that follow it.
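For that substitution to work, the whole input has to be in one string, since the pattern spans three lines. A self-contained sketch (the sample text is invented for illustration):

```perl
use strict;
use warnings;

# Sample input in the shape of the question's data: a still-active (NULL)
# record followed by a completed record, three lines each.
my $text = <<'END';
Apr  5 2010  5:54PM   NULL
SubNetwork=ONRM_RootMo,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr  5 2010  5:55PM   Apr  5 2010  6:43PM
SubNetwork=ONRM_RootMo,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
END

# Delete each line containing NULL together with the two lines after it.
$text =~ s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;

print $text;
```

On the real file you could get the same effect from the command line with slurp mode, e.g. `perl -0777 -i.bak -pe 's/^.*NULL\r?\n.*\r?\n.*\r?\n//mg' george_perl.txt` ( -0777 sets $/ to undef so the whole file is read at once).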

Apr 21 '10 at 19:40


