I modified my code to build a hash as suggested. I have not yet incorporated the binary output, due to time limitations. In addition, I need to figure out how to reference the hash in order to get the data out and pack it into binary files. I don't think this part should be complicated ... I hope.
On the actual data file (~350 MB and 2.0 million lines), the following code takes about 3 minutes to build the hash. CPU usage was 100% on one of my cores (nil on the other three), and Perl's memory usage peaked at about 325 MB ... until it dumped millions of lines to the terminal. However, the Dumper print will be replaced with a binary pack.
Please let me know if I am making a rookie mistake.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $lineArg1 = $ARGV[0];
    open(INFILE, $lineArg1);

    my $line;
    my @param_names;
    my @template;
    while ($line = <INFILE>) {
        chomp $line; #Remove New Line
        if ($line =~ s/\s+filter = ALL_VALUES//) { #Find parameters and build a list
            push @param_names, trim($line);
        }
        elsif ($line =~ /^----/) {
            @template = map {'A'.length} $line =~ /(\S+\s*)/g; #Make template for unpack
            $template[-1] = 'A*';
            my $data_start_pos = tell INFILE;
            last; #Reached start of data exit loop
        }
    }

    my $size = $#param_names+1;
    my @getType = ((1) x $size);
    my $template = "@template";
    my @lineData;
    my %dataHash;
    my $lineCount = 0;
    while ($line = <INFILE>) {
        if ($lineCount % 100000 == 0){
            print "On Line: ".$lineCount."\n";
        }
        if ($line =~ /^\d/) {
            chomp($line);
            @lineData = unpack $template, $line;
            my ($inHeader, $headerIndex) = findStr($lineData[1], @param_names);
            if ($inHeader) {
                push @{$dataHash{$lineData[1]}{time} }, $lineData[0];
                push @{$dataHash{$lineData[1]}{data} }, $lineData[3];
                if ($getType[$headerIndex]){ # Things that only need written once
                    $dataHash{$lineData[1]}{type} = $lineData[2];
                    $getType[$headerIndex] = 0;
                }
            }
        }
        $lineCount ++;
    } # END WHILE <INFILE>
    close(INFILE);

    print Dumper \%dataHash;

    #WRITE BINARY FILE and TOC FILE
    my %convert = (TXT=>sub{pack 'A*', join "\n", @_},
                   D=>sub{pack 'd*', @_},
                   UI=>sub{pack 'L*', @_});

    open my $binfile, '>:raw', $lineArg1.'.bin';
    open my $tocfile, '>', $lineArg1.'.toc';

    for my $param (@param_names){
        my $data = $dataHash{$param};
        my @toc_line = ($param, $data->{type}, tell $binfile );
        print {$binfile} $convert{D}->(@{$data->{time}});
        push @toc_line, tell $binfile;
        print {$binfile} $convert{$data->{type}}->(@{$data->{data}});
        push @toc_line, tell $binfile;
        print {$tocfile} join(',',@toc_line,''),"\n";
    }

    sub trim { #Trim leading and trailing white space
        my (@strings) = @_;
        foreach my $string (@strings) {
            $string =~ s/^\s+//;
            $string =~ s/\s+$//;
            chomp ($string);
        }
        return wantarray ? @strings : $strings[0];
    } # END SUB

    sub findStr { #Return TRUE if string is contained in array.
        my $searchStr = shift;
        my $i = 0;
        foreach ( @_ ) {
            if ($_ eq $searchStr){
                return (1,$i);
            }
            $i ++;
        }
        return (0,-1);
    } # END SUB
The output is as follows:
    $VAR1 = {
      'Param 1' => {
        'time' => [ '1.1', '3.2', '5.3' ],
        'type' => 'UI',
        'data' => [ '5', '10', '15' ]
      },
      'Param 2' => {
        'time' => [ '4.5', '6.121' ],
        'type' => 'D',
        'data' => [ '2.1234', '3.1234' ]
      },
      'Param 3' => {
        'time' => [ '2.23', '7.56' ],
        'type' => 'TXT',
        'data' => [ 'Some Text 1', 'Some Text 2' ]
      }
    };
Here is the TOC output file:
    Param 1,UI,0,24,36,
    Param 2,D,36,52,68,
    Param 3,TXT,68,84,107,
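For reading the files back, here is a minimal sketch of how the TOC offsets might be used. It assumes each TOC line has the form name,type,time start,data start,end (as in the TOC output above), that parameter names contain no commas, and that TXT values contain no newlines. The names `%unpack_for`, `read_param`, and `read_bytes` are hypothetical helpers, not part of the script above. Since `d` and `L` are native pack formats, it also assumes the reader runs on the same kind of machine as the writer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inverse of the %convert table used for writing (hypothetical name).
my %unpack_for = (
    D   => sub { unpack 'd*', $_[0] },   # native doubles
    UI  => sub { unpack 'L*', $_[0] },   # native unsigned 32-bit integers
    TXT => sub { split /\n/, $_[0] },    # text items were joined with "\n"
);

# Read exactly $length bytes starting at byte $offset of the filehandle.
sub read_bytes {
    my ($fh, $offset, $length) = @_;
    seek($fh, $offset, 0) or die "seek to $offset failed: $!";
    my $buf = '';
    my $got = read($fh, $buf, $length);
    die "short read at offset $offset"
        unless defined $got && $got == $length;
    return $buf;
}

# Decode one parameter, given its TOC line and an open binary filehandle.
sub read_param {
    my ($toc_line, $bin_fh) = @_;
    chomp $toc_line;
    my ($name, $type, $time_start, $data_start, $end) = split /,/, $toc_line;
    die "unknown type '$type'" unless exists $unpack_for{$type};

    # Time values are always stored as doubles; data depends on the type.
    my @time = $unpack_for{D}->(
        read_bytes($bin_fh, $time_start, $data_start - $time_start));
    my @data = $unpack_for{$type}->(
        read_bytes($bin_fh, $data_start, $end - $data_start));

    return { name => $name, type => $type, time => \@time, data => \@data };
}

# Usage: pass the same base file name that was given to the writer script.
if (@ARGV) {
    my $base = $ARGV[0];
    open my $toc, '<',     "$base.toc" or die "Cannot open $base.toc: $!";
    open my $bin, '<:raw', "$base.bin" or die "Cannot open $base.bin: $!";
    while (my $toc_line = <$toc>) {
        my $param = read_param($toc_line, $bin);
        printf "%s (%s): %d samples\n",
            $param->{name}, $param->{type}, scalar @{ $param->{time} };
    }
}
```

Because each parameter is located by its byte offsets, a single parameter can be read without loading the rest of the binary file.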
Thank you all for your help! This is a great resource!
EDIT: Added code to write Binary and TOC files.