What is the fastest way to "print" to a file in Perl?

Question


I have been writing output from perl scripts to files for some time using the following code:

    open( OUTPUT, ">:utf8", $output_file ) or die "Can't write new file: $!";
    print OUTPUT "First line I want printed\n";
    print OUTPUT "Another line I want printing\n";
    close(OUTPUT);

It works faster than my original approach, which used say instead of print (thanks to NYTProf for enlightening me!).

However, my current script iterates through hundreds of thousands of lines and takes many hours to run with this method, and NYTProf points the finger at my thousands of print statements. So the question is: is there a faster way to do this?

Other information that may be relevant: Perl version 5.14.2 (on Ubuntu).

Script background: rows from pipe ('|') delimited flat files are read into hashes; each file has primary keys that map records from one file to another. I manipulate this data and then combine it into one file for import into another system.
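For readers unfamiliar with this kind of setup, here is a minimal sketch of loading a '|'-delimited file into a hash keyed on a primary key and joining a second file against it. The file layouts (key in the first column), the helper name `load_by_key` and the sample data are all hypothetical; this is not the asker's actual code. Note that `split` takes a regex, so the pipe must be escaped as `/\|/`, and a limit of `-1` keeps trailing empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper: read a '|'-delimited filehandle into a hash of
# array refs, keyed on the first column (assumed to be the primary key).
sub load_by_key {
    my ($fh) = @_;
    my %by_key;
    while (my $line = <$fh>) {
        chomp $line;
        # '|' is a regex metacharacter, so it must be escaped;
        # the -1 limit preserves trailing empty fields.
        my ($key, @fields) = split /\|/, $line, -1;
        $by_key{$key} = \@fields;
    }
    return \%by_key;
}

# Sample data held in in-memory files (hypothetical layouts).
my $people_data = "1|Alice|London\n2|Bob|\n";
my $orders_data = "1|book\n2|pen\n";
open my $people_fh, '<', \$people_data or die "Can't open people: $!";
open my $orders_fh, '<', \$orders_data or die "Can't open orders: $!";

my $people = load_by_key($people_fh);
my $orders = load_by_key($orders_fh);

# Combine records from both files that share the same key.
my @combined;
for my $key (sort keys %$people) {
    next unless exists $orders->{$key};
    push @combined, join('|', $key, @{ $people->{$key} }, @{ $orders->{$key} });
}
print "$_\n" for @combined;    # 1|Alice|London|book and 2|Bob||pen
```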

The output file is about 3 million lines, and the program starts to slow down noticeably after writing about 30,000 lines to the output file. (A little reading suggested that in other languages this can happen when the write buffer is exhausted, but I could not find anything about this for Perl.)

EDIT: I have now tried adding the line below, right after the open() statement, to disable output buffering, but the program still slows down at around the 30,000th line.

 OUTPUT->autoflush(1);
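For context, `autoflush(1)` does not enlarge the buffer: it makes Perl flush the handle after every print, so each print becomes its own write to the file. That cannot cure a slowdown, and for many small prints it usually makes things slower. Below is a rough timing sketch to compare the two modes; the helper name `time_writes` and the line count are hypothetical choices, it uses core modules (IO::Handle, Time::HiRes, File::Temp) and a throwaway temp file, and since timings depend on the machine, no particular numbers are claimed.

```perl
use strict;
use warnings;
use IO::Handle;                    # provides autoflush() on filehandles
use Time::HiRes qw(time);          # sub-second timing
use File::Temp qw(tempfile);

# Hypothetical helper: write $n short lines to a fresh temp file,
# optionally with autoflush enabled; returns (elapsed seconds, filename).
sub time_writes {
    my ($n, $autoflush) = @_;
    my ($fh, $filename) = tempfile(UNLINK => 1);   # removed at program exit
    binmode $fh, ':utf8';
    $fh->autoflush($autoflush);    # 1 = flush after every print (no buffering)
    my $start = time();
    print {$fh} "Line $_\n" for 1 .. $n;
    close $fh or die "Can't close $filename: $!";
    return (time() - $start, $filename);
}

my ($buffered,   $file_a) = time_writes(100_000, 0);
my ($unbuffered, $file_b) = time_writes(100_000, 1);
printf "buffered:  %.3fs\n", $buffered;
printf "autoflush: %.3fs\n", $unbuffered;
```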


Tags: perl, buffer

Ashimema Mar 10 '12 at 20:34


2 answers

Have you tried building all the output for one input line into a single scalar and then printing that scalar in one go? I have a script that outputs an average of 20 lines of text for each line of input. With separate print statements, even sending the output to /dev/null was slow. But then I put all the output (for one line of input) together using things like:

$output .= "...";

$output .= sprintf("%s...", $var);

Then, just before leaving the line-processing routine, I print $output, writing all of those lines at once. The number of print calls went from ~7.7M to about 386K, equal to the number of lines in the input data file. This shaved about 10% off my total run time.
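A minimal sketch of this technique, assuming a hypothetical `process_record` routine that turns one '|'-delimited input record into several output lines. Output is accumulated with `.=` and `sprintf`, and `print` is called once per input record rather than once per output line. The record layout and output format are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical per-record routine: builds every output line for one
# input record in a single string, then prints it with one call.
sub process_record {
    my ($out_fh, $record) = @_;
    my ($id, @values) = split /\|/, $record, -1;
    my $output = '';
    for my $i (0 .. $#values) {
        $output .= sprintf("%s|%d|%s\n", $id, $i, $values[$i]);
    }
    print {$out_fh} $output;       # one print per input record
    return;
}

# Write to an in-memory buffer here; a real script would open a file.
my $result = '';
open my $out_fh, '>', \$result or die "Can't open buffer: $!";
process_record($out_fh, $_) for ("a|x|y", "b|z");
close $out_fh;
print $result;                     # a|0|x, a|1|y, b|0|z (one per line)
```

The data still passes through the same PerlIO buffer either way, so the saving presumably comes from making fewer calls into the I/O layer; it would not by itself explain or fix a slowdown that grows over time.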


Jimb Mar 01 '13 at 0:27


Borodin · Accepted Answer · 2012-03-10T21:11:57+0000

I think you need to rethink the algorithm your program uses. File output speed does not depend on the amount of data already written, and it is much more likely that your program is reading and processing data but never releasing it.

Check the amount of memory used by your process to see whether it is growing steadily.
Beware of for (<$filehandle>) loops, which read the entire file into memory at once.
As I said in my comment, comment out the relevant print statements to see how performance changes.
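To illustrate the first two points: `for (<$fh>)` evaluates the readline operator in list context, so the whole file becomes a list in memory before the first iteration, whereas `while (<$fh>)` reads one line at a time. The sketch below streams an in-memory sample file with `while` and reads the process's resident memory from `/proc/self/status`, which assumes Linux (the asker is on Ubuntu). The helper name `rss_kb` and the sample data are hypothetical; calling such a helper every few thousand records would show whether memory grows steadily.

```perl
use strict;
use warnings;

# Hypothetical helper: current resident set size in kB, read from
# /proc/self/status (Linux-only; returns undef elsewhere).
sub rss_kb {
    open my $status, '<', '/proc/self/status' or return;
    while (my $line = <$status>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Sample input held in memory (hypothetical layout).
my $data = '';
$data .= "row $_|value\n" for 1 .. 1000;
open my $fh, '<', \$data or die "Can't open data: $!";

# Streams one line at a time: memory use does not depend on file size.
my $count = 0;
while (my $line = <$fh>) {
    $count++;
    # ... process $line here; it is released when it goes out of scope ...
}
close $fh;

# By contrast, `for my $line (<$fh>) { ... }` would first build a list
# holding every line of the file before the loop body runs.

my $rss = rss_kb();
print defined $rss ? "VmRSS: $rss kB\n" : "VmRSS not available\n";
print "Read $count lines\n";
```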

