Faster way to merge multiple files

I have a large number of small files on Linux (about 70,000 of them). I want to append each file's name to the end of every line of that file, and then combine them all into one file.

I am using this script:

    for fn in *.sms.txt; do
        sed 's/$/'$fn'/' $fn >> sms.txt
        rm -f $fn
    done

Is there a faster way to do this?

+4
4 answers

I tried with these files:

    for ((i=1; i<70000; ++i)); do
        printf -v fn 'file%.5d.sms.txt' $i
        echo -e "HAHA\nLOL\nBye" > "$fn"
    done

I tried your solution, and it took about 4 minutes (real) to complete. The problem with your solution is that it spawns sed 70,000 times, and spawning a process is rather slow.
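A rough way to see that per-process overhead, not sed itself, is the bottleneck (a sketch; exact numbers will vary by machine):

    # Time 1,000 no-op sed invocations: almost all of it is fork/exec overhead.
    time for i in {1..1000}; do sed '' </dev/null >/dev/null; done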

    #!/bin/bash

    filename="sms.txt"

    # Create file "$filename", or empty it if it already existed
    > "$filename"

    # Start editing with ed, the standard text editor
    ed -s "$filename" < <(
        # Go into insert mode:
        echo i
        # Loop through files
        for fn in *.sms.txt; do
            # Loop through lines of file "$fn"
            while read l; do
                # Insert line "$l" with "$fn" appended
                echo "$l$fn"
            done < "$fn"
        done
        # Tell ed to quit insert mode (.), then to save (w) and quit (q)
        echo -e ".\nwq"
    )

This solution took approximately 6 seconds.

Remember: ed is the standard text editor, so don't forget about it! And if you liked ed, you will also like ex!
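For instance, here is a minimal ex sketch of the same per-file edit, done in place (shown for a single hypothetical file name; ex reads its commands from standard input in silent mode):

    # Append the file's own name to every line, then save and quit.
    # Note: characters special to the replacement (&, \, /) would need escaping.
    printf '%%s/$/%s/\nwq\n' "file00001.sms.txt" | ex -s "file00001.sms.txt"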

Hooray!

+6

Almost the same as gniourf_gniourf's solution, but without ed:

    for i in *.sms.txt; do
        while read line; do
            echo $line $i
        done < $i
    done > sms.txt
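If the input lines may contain backslashes or leading whitespace, a slightly more careful variant (a sketch; same structure otherwise) would be:

    for i in *.sms.txt; do
        # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
        while IFS= read -r line; do
            printf '%s %s\n' "$line" "$i"
        done < "$i"
    done > sms.txt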
+2

This Perl script appends the current file name to the end of each line:

    #!/usr/bin/perl
    use strict;
    while (<>) {
        chomp;
        print $_, $ARGV, "\n";
    }

Invoke it like this:

 scriptname *.sms.txt > sms.txt 

Since only a single process is spawned, and no regular-expression processing is required, it should be reasonably fast.
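If you'd rather not keep a separate script file around, an equivalent one-liner (a sketch using Perl's -n and -l switches, which provide the input loop and handle the chomp and trailing newline) should behave the same way:

    perl -nle 'print $_, $ARGV' *.sms.txt > sms.txt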

+1

What, no love for awk?

 awk '{print $0" "FILENAME}' *.sms.txt >sms.txt 

With gawk, this took 1-2 seconds on gniourf_gniourf's sample files on my machine (according to time).

mawk was about 0.2 seconds faster than gawk here.
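Note that, unlike the original script, this does not delete the input files. If you want that too, you could chain the removal (a sketch; the && ensures the inputs are only removed if awk succeeded, and the output name sms.txt does not match the *.sms.txt glob, so it will not be swallowed back in):

    awk '{print $0" "FILENAME}' *.sms.txt > sms.txt && rm -f *.sms.txt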

+1
