Deleting duplicate data using Perl invoked through a batch file on Windows

On Windows, a batch file opens a DOS window and calls a Perl script, which does the actual work. The script removes duplicate lines from a data file, and it works correctly as long as the data file is not too big. The problem I need to resolve is with large data files (2 GB or more): at that size the script hits a memory error, because it tries to load the complete file into an array before deleting the duplicate data. The memory error occurs in a subroutine, at this line:
@contents_of_the_file = <INFILE>;
(A completely different method is acceptable if it solves this problem; please suggest one.) The subroutine:
sub remove_duplicate_data_and_file
{
    open(INFILE, "<" . $output_working_directory . $output_working_filename)
        or dienice("Can't open $output_working_filename : INFILE : $!");
    if ($test ne "YES")
    {
        flock(INFILE, 1);    # 1 = LOCK_SH (shared lock)
    }
    @contents_of_the_file = <INFILE>;    # memory error here on large files
    if ($test ne "YES")
    {
        flock(INFILE, 8);    # 8 = LOCK_UN (release the lock)
    }
    close(INFILE);

    # Keep only the first occurrence of each line; the hash counts how many
    # times each line has been seen so far.
    @unique_contents_of_the_file =
        grep(!$unique_contents_of_the_file{$_}++, @contents_of_the_file);

    open(OUTFILE, ">" . $output_restore_split_filename)
        or dienice("Can't open $output_restore_split_filename : OUTFILE : $!");
    if ($test ne "YES")
    {
        flock(OUTFILE, 2);    # 2 = LOCK_EX (the original 1/LOCK_SH is too weak for writing)
    }
    for ($element_number = 0; $element_number <= $#unique_contents_of_the_file; $element_number++)
    {
        # Lines read from <INFILE> already end in "\n", so printing them with
        # an extra "\n" (as the original code did) would double-space the output.
        print OUTFILE $unique_contents_of_the_file[$element_number];
    }
    if ($test ne "YES")
    {
        flock(OUTFILE, 8);    # LOCK_UN
    }
    close(OUTFILE);    # was missing; flush and close the output file
}
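
For illustration, here is a minimal sketch of the kind of line-by-line approach that would avoid loading the whole file at once. It is untested; the subroutine name and its two path parameters are placeholders, and it assumes duplicates are byte-identical lines and that the first occurrence of each line should be kept:

use strict;
use warnings;
use Fcntl qw(:flock);    # symbolic names for the flock() constants

sub remove_duplicate_lines_streaming    # hypothetical name
{
    my ($input_path, $output_path) = @_;    # placeholder parameters

    # dienice() is the error helper the script already uses.
    open(my $in,  '<', $input_path)  or dienice("Can't open $input_path : $!");
    open(my $out, '>', $output_path) or dienice("Can't open $output_path : $!");
    flock($in,  LOCK_SH);
    flock($out, LOCK_EX);

    # %seen remembers every line already written, so memory grows with the
    # number of unique lines rather than with the total file size.
    my %seen;
    while (my $line = <$in>)
    {
        print $out $line unless $seen{$line}++;    # keep first occurrence only
    }

    close($in);
    close($out);
}

If even the unique lines are too many to fit in %seen, the next step would be an external method, e.g. sorting the file on disk first and then dropping adjacent duplicates in a single pass.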