sed: converting a multi-line block to a single line (example: FASTA to PHYLIP format)

Question

In short: how can I convert from FASTA to a PHYLIP-like format (without the sequence and residue counts at the top of the file) using sed?

The fasta format is as follows:

>sequence1
AATCG
GG-AT
>sequence2
AGTCG
GGGAT

The number of lines in a sequence may vary.

I want to convert it to this:

sequence1 AATCG GG-AT
sequence2 AGTCG GGGAT

My question seems simple, but I lack a real understanding of the more advanced sed commands: the multiline commands and the commands that use the hold buffer.

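Here is the implementation idea I had: fill the pattern space with a sequence and only print it when a new sequence label is encountered. For this, I would:

- for each line that does not match ^>, append it to the pattern space and go to the next line;
- for each line that matches ^>, print the pattern space and start a new one.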

After reading the manual, I am still unsure about several points, including the difference between P and p.
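As a small illustration of the difference between p and P (a sketch, not taken from the thread): N appends the next input line to the pattern space, so the pattern space then holds two lines separated by a newline. p prints all of it, while P prints only the part up to the first newline. The sample words "first" and "second" are made up for the demonstration.

```shell
# -n turns off automatic printing, so only p / P produce output.

# p prints the whole pattern space: both lines.
printf 'first\nsecond\n' | sed -n 'N;p'

# P prints the pattern space only up to the first newline: "first".
printf 'first\nsecond\n' | sed -n 'N;P'
```

The hold-space commands follow the same lowercase/uppercase pattern: h copies the pattern space into the hold space, H appends it after a newline, and x swaps the two buffers.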

I know this can be done with Python, Perl, or awk, but I would like to do it with sed.

So far, I have a script that works, but only for the six-line example above, because the line numbers are hard-coded:

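#!/bin/sed -nf
1h                      # line 1 (label): copy it to the hold space
2,3H                    # lines 2-3 (sequence): append them to the hold space
4{x; s/\n/ /g; p}       # line 4 (next label): swap in the held record, join its lines, print
5H                      # line 5 (sequence): append it to the hold space
6{H;x; s/\n/ /g; p}     # line 6 (last line): append, swap, join, print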

I run it with: sed -nf fa2phy.sed my.fasta

Tags: sed, fasta

PlasmaBinturong · Question · 2017-10-24 10:40

ctac_ · Answer 1 · 2017-10-24T12:38:07+0000

With sed:

sed '/>/N;:A;/\n>/!{s/\n/ /;N;bA};h;s/\(.*\)\n.*/\1/p;x;s/.*\n//;bA' infile
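The one-liner is dense, so here is a sketch of the same commands spread over several lines with comments. The function name fasta_join is hypothetical, and the sample input is the one from the question. It assumes GNU sed, which prints the pattern space when N runs out of input; a strictly POSIX sed would exit without printing the last record. Note that the output keeps the leading ">" of each label, so the usage line strips it in a second step.

```shell
fasta_join() {
  sed '
    # On a label line, pull in the first sequence line.
    />/N
    :A
    # No new label after the newline yet: join the two lines, read one more.
    /\n>/!{
      s/\n/ /
      N
      bA
    }
    # The pattern space is now "finished record\n>next label". Save a copy,
    h
    # print everything before the newline (the finished record),
    s/\(.*\)\n.*/\1/p
    # then restore the copy, keep only the next label, and loop again.
    x
    s/.*\n//
    bA
  ' "$@"
}

# Usage: join the records, then drop the leading ">" of each label.
printf '>sequence1\nAATCG\nGG-AT\n>sequence2\nAGTCG\nGGGAT\n' | fasta_join | sed 's/^>//'
```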

RavinderSingh13 · Answer 2 · 2017-10-24T10:47:44+0000

Some awk solutions:

1st:

awk '/^>/{sub(/>/,"");if(val){print val, val2};val=$0;val2="";next} {val2=val2?val2 FS $0:$0} END{print val, val2}'  Input_file

2nd:

awk -v RS=">" -v FS="\n" '{for(i=1;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":" ")}}'   Input_file

3rd:

awk -v RS=">" '{gsub(/\n/," ");} NF'   Input_file
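The second and third commands work by setting the record separator RS to ">", so that each FASTA entry (label plus sequence lines) becomes one awk record; the empty record before the first ">" is skipped because it has no fields. Both appear to leave a trailing space at the end of each output line, since every record ends with a newline. The sketch below is a variation on the same idea, not one of the answer's commands: it keeps the default field separator, which also splits on newlines, and assigns $1 to itself so that awk rebuilds the record with single spaces. The function name join_records is hypothetical.

```shell
join_records() {
  # RS=">" : one record per FASTA entry.
  # NF     : skip the empty record that precedes the first ">".
  # $1=$1  : force awk to rebuild $0 from its fields, joined by single spaces.
  awk -v RS='>' 'NF { $1 = $1; print }' "$@"
}

printf '>sequence1\nAATCG\nGG-AT\n>sequence2\nAGTCG\nGGGAT\n' | join_records
```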

PlasmaBinturong · Answer 3 · 2017-10-24T11:56:54+0000

Here is the script I ended up with, saved as fa2phy.sed:

#!/bin/sed -nf

:readseq
${H;b out}              # if last line, append to hold, and goto 'out'
1{h;n;b readseq}        # if first, overwrite hold, and start again at 'readseq'
/^>/!{H; n; b readseq}  # if not a sequence label, append to hold, read next line, start again at 'readseq'. Else, it continues to 'out'

:out
x;         # exchange hold content with pattern content
s/^>//;    # substitute the starting '>'
s/\n/  /g; # substitute each newline with 2 spaces
p;         # print pattern buffer
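As a usage sketch, the script can be tried on the sample from the question. The fa2phy wrapper function and the temporary file are hypothetical helpers, added only to make the example self-contained; the sed commands are the ones from the script above with the comments dropped. It assumes GNU sed, because a label followed directly by "}" (as in "b out}") is not portable to every sed.

```shell
fa2phy() {
  # Write the sed script to a temporary file so it can be passed with -f.
  script=$(mktemp) || return 1
  cat > "$script" <<'EOF'
:readseq
${H;b out}
1{h;n;b readseq}
/^>/!{H; n; b readseq}
:out
x
s/^>//
s/\n/  /g
p
EOF
  # -n suppresses automatic printing; only the final p writes a record.
  sed -n -f "$script" "$@"
  rm -f "$script"
}

printf '>sequence1\nAATCG\nGG-AT\n>sequence2\nAGTCG\nGGGAT\n' | fa2phy
```

Each record comes out on one line with two spaces between the label and the sequence chunks, as set by the s/\n/  /g command.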
