Faster way to merge multiple files

I have a large number of small files on Linux (about 70,000 of them). I want to append each file's name to the end of every line of that file, and then combine them all into one file.

I am using this script:

    for fn in *.sms.txt; do
        sed 's/$/'$fn'/' $fn >> sms.txt
        rm -f $fn
    done

Is there a faster way to do this?

+4
4 answers

I tried with these files:

    for ((i=1; i<70000; ++i)); do
        printf -v fn 'file%.5d.sms.txt' $i
        echo -e "HAHA\nLOL\nBye" > "$fn"
    done

I tried your solution, and it took about 4 minutes (real) to complete. The problem with your solution is that it spawns sed 70,000 times, and spawning a process is rather slow.
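A rough way to see that per-process overhead, not sed itself, is the bottleneck (a sketch; exact numbers will vary by machine):

    # Time 1,000 no-op sed invocations: almost all of it is fork/exec overhead.
    time for i in {1..1000}; do sed '' </dev/null >/dev/null; done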

    #!/bin/bash

    filename="sms.txt"

    # Create file "$filename", or empty it if it already existed
    > "$filename"

    # Start editing with ed, the standard text editor
    ed -s "$filename" < <(
        # Go into insert mode:
        echo i
        # Loop through files
        for fn in *.sms.txt; do
            # Loop through lines of file "$fn"
            while read l; do
                # Insert line "$l" with "$fn" appended
                echo "$l$fn"
            done < "$fn"
        done
        # Tell ed to quit insert mode (.), then to save (w) and quit (q)
        echo -e ".\nwq"
    )

This solution took approximately 6 seconds.

Remember: ed is the standard text editor, so don't forget about it! And if you liked ed, you will also like ex!
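For instance, here is a minimal ex sketch of the same per-file edit, done in place (shown for a single hypothetical file name; ex reads its commands from standard input in silent mode):

    # Append the file's own name to every line, then save and quit.
    # Note: characters special to the replacement (&, \, /) would need escaping.
    printf '%%s/$/%s/\nwq\n' "file00001.sms.txt" | ex -s "file00001.sms.txt"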

Hooray!

+6

Almost the same as gniourf_gniourf's solution, but without ed:

    for i in *.sms.txt; do
        while read line; do
            echo $line $i
        done < $i
    done > sms.txt
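If the input lines may contain backslashes or leading whitespace, a slightly more careful variant (a sketch; same structure otherwise) would be:

    for i in *.sms.txt; do
        # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
        while IFS= read -r line; do
            printf '%s %s\n' "$line" "$i"
        done < "$i"
    done > sms.txt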
+2

This Perl script appends the current file name to the end of each line:

    #!/usr/bin/perl
    use strict;
    while (<>) {
        chomp;
        print $_, $ARGV, "\n";
    }

Invoke it like this:

 scriptname *.sms.txt > sms.txt 

Since only a single process is spawned, and no regular-expression processing is required, it should be reasonably fast.
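If you'd rather not keep a separate script file around, an equivalent one-liner (a sketch using Perl's -n and -l switches, which provide the input loop and handle the chomp and trailing newline) should behave the same way:

    perl -nle 'print $_, $ARGV' *.sms.txt > sms.txt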

+1

What, no love for awk?

 awk '{print $0" "FILENAME}' *.sms.txt >sms.txt 

With gawk, this took 1-2 seconds on gniourf_gniourf's sample files on my machine (according to time).

mawk was about 0.2 seconds faster than gawk here.
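Note that, unlike the original script, this does not delete the input files. If you want that too, you could chain the removal (a sketch; the && ensures the inputs are only removed if awk succeeded, and the output name sms.txt does not match the *.sms.txt glob, so it will not be swallowed back in):

    awk '{print $0" "FILENAME}' *.sms.txt > sms.txt && rm -f *.sms.txt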

+1
