Insert header into file

I would like advice on how to insert header lines (all the lines of one file) at the top of another, much larger file (several GB). I would prefer to do this with Unix tools such as awk or sed.

The header lines I need to insert are in a file named "header":

    ##fileformat=VCFv4.0
    ##fileDate=20090805
    ##source=myImputationProgramV3.1
    ##reference=1000GenomesPilot-NCBI36
    ##phasing=partial
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
    ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
    ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
    ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
    ##FILTER=<ID=q10,Description="Quality below 10">
    ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
    #CHROM POS ID REF ALT QUAL FILTER INFO
+6
2 answers
    header="/name/of/file/containing/header"
    for file in "$@"
    do
        cat "$header" "$file" > /tmp/xx.$$
        mv /tmp/xx.$$ "$file"
    done

You may want to put the temporary file on the same file system as the file you are editing, but anything that inserts data at the beginning of a file will end up working very much like this. If you are going to do this all day, every day, you could put together something a little slicker, but the savings are likely to be negligible (fractions of a second per file).
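The point about keeping the temporary file on the same file system can be sketched as follows. This is only an illustration, not part of the original answer: the function name `prepend_header` and the temporary file naming are hypothetical. Because the temporary file is created next to the target, the final `mv` is a rename within one file system rather than a second copy of several GB.

```shell
#!/bin/sh
# prepend_header HEADER FILE...
# Hypothetical helper: writes HEADER followed by FILE to a temporary file
# placed next to FILE, then renames the temporary file over FILE.
prepend_header() {
    header=$1
    shift
    for file in "$@"
    do
        # Same directory as the target, so mv does not cross file systems
        tmp="$file.$$.tmp"
        if cat -- "$header" "$file" > "$tmp"
        then
            mv -- "$tmp" "$file"
        else
            # Leave the original untouched if the copy failed (e.g. disk full)
            rm -f -- "$tmp"
            return 1
        fi
    done
}
```

It could be called as `prepend_header header big1.vcf big2.vcf` (file names made up). Note that the rewritten file is a new file, so it gets default ownership and permissions rather than those of the original.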

If you really must use sed, then I suppose you could use:

    header="/name/of/file/containing/header"
    for file in "$@"
    do
        sed -e "0r $header" "$file" > /tmp/xx.$$
        mv /tmp/xx.$$ "$file"
    done

The command reads the contents of the header "after" line 0 (that is, before line 1), and everything else is passed through unchanged. Address 0 with the r command is not standard sed, so some implementations reject it. It is also not as fast as cat.

A similar construction using awk :

    header="/name/of/file/containing/header"
    for file in "$@"
    do
        awk '{print}' "$header" "$file" > /tmp/xx.$$
        mv /tmp/xx.$$ "$file"
    done

It simply copies each line of input to the output; again, it is not as fast as cat.

Another advantage of cat over sed or awk: cat works even if the large files are mostly binary data, because it pays no attention to the contents of the files. Both sed and awk are designed to process data split into lines; modern versions will probably handle even binary data fairly well, but that is not what they are designed for.
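The claim that cat ignores file contents can be checked directly. The sketch below is an illustration rather than part of the original answer: the file names are made up, and it assumes `mktemp -d`, `tail -c` and `cmp` behave as they do on typical Linux or BSD systems. It builds a payload containing NUL bytes and no trailing newline, prepends a header with cat, and compares everything after the header with the original payload.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical file names throughout)
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '##fileformat=VCFv4.0\n' > header
# Payload with NUL bytes and no trailing newline
printf 'abc\000\001\002def\000xyz' > payload.bin

cat header payload.bin > combined.bin

# Skip the header bytes and compare the remainder with the original payload
header_size=$(wc -c < header)
tail -c +"$((header_size + 1))" combined.bin > recovered.bin

if cmp -s payload.bin recovered.bin
then
    result=identical
else
    result=different
fi
echo "payload after prepending: $result"
```

Running the same comparison with sed or awk in place of cat is a quick way to find out how a particular implementation treats binary input.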

+12

I did all this with a Perl script because I had to traverse a directory tree and handle different types of files differently. The main script was:

    #!perl -w

    process_directory(".");

    sub process_directory {
        my $dir = shift;
        opendir DIR, $dir or die "$dir: not a directory\n";
        my @files = readdir DIR;
        closedir DIR;
        foreach (@files) {
            next if (/^\./ or /bin/ or /obj/);  # ignore some directories
            if (-d "$dir/$_") {
                process_directory("$dir/$_");
            } else {
                fix_file("$dir/$_");
            }
        }
    }

    sub fix_file {
        my $file = shift;
        open SRC, $file or die "Can't open $file\n";
        my $fix = "$file-f";  # temporary output file
        open FIX, ">$fix" or die "Can't open $fix\n";
        print FIX <<EOT;
    -- Text to insert
    EOT
        while (<SRC>) {
            print FIX;
        }
        close SRC;
        close FIX;
        # Keep the original as name-old.ext, then move the fixed copy into place
        my $oldFile = $file;
        $oldFile =~ s/(.*)\.(\w+)$/$1-old.$2/;
        if (rename $file, $oldFile) {
            rename $fix, $file;
        }
    }

Share and enjoy! Or not: I'm not a Perl hacker, so this is probably double-plus-unoptimal Perl code. Still, it worked for me!

+1

Source: https://habr.com/ru/post/887940/

