Printing a sequence from a fasta file

I often need to find a specific sequence in a fasta file and print it. For those who don’t know, fasta is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple: there is a line with the name of the sequence preceded by ">", and all the lines that follow, up to the next ">", are the sequence itself. For instance:

>sequence1
ACTGACTGACTGACTG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
ACTGACTGACTGACTG

Currently I get the sequence I need by using grep with -A, so I run

grep -A 10 sequence_name filename.fa

and then, if I don’t see the beginning of the next sequence in the output, I change 10 to 20 and repeat until I’m sure that I have the whole sequence.

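I'm guessing there is a better way to do this. For instance, can I ask it to print everything up to the next '>' character?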


Using > as the record separator:

awk -v seq="sequence2" -v RS='>' '$1 == seq {print RS $0}' file
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG

Like so:

awk '/>sequence1/{p++;print;next} /^>/{p=0} p' file

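First, if a line matches >sequence1, a flag (p) is set, the line is printed, and awk moves on to the next line. Then, if a line begins with >, p is reset to 0. Finally, the line is printed if p is set.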
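A minimal sketch of the same flag idea wrapped in a shell function, assuming you want an exact match on the name so that asking for sequence1 does not also select sequence10. The function name print_fasta_seq and its two arguments are made up for illustration; the name is taken to be the first word of the header line.

```shell
# Hypothetical helper: print the FASTA record whose name is exactly NAME.
# Usage: print_fasta_seq NAME FILE
print_fasta_seq() {
    awk -v name="$1" '
        # On every header line, set the flag p only if the first word is ">NAME".
        /^>/ { p = ($1 == (">" name)) }
        # Print the current line while the flag is set.
        p
    ' "$2"
}
```

Called as print_fasta_seq sequence2 file, it should print the header line of sequence2 and every sequence line up to, but not including, the next header.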

Or, with grep, as you were doing, using -A (after):

grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'

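Here, up to 999999 lines following the sequence1 match are passed to awk. Awk exits if the line number is > 1 and the line begins with >. Otherwise, the 1 makes awk print the line.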


Use sed only:

sed -n '/>sequence3/,/>/ p' file | sed '${/>/d}'
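As a variation, the trailing header can probably be dropped inside the same sed call instead of piping to a second one. This is only a sketch: the function name print_fasta_seq_sed is hypothetical, and it assumes the header line is exactly ">NAME" and that NAME contains no regex metacharacters.

```shell
# Hypothetical helper: print one FASTA record with a single sed invocation.
# Usage: print_fasta_seq_sed NAME FILE
print_fasta_seq_sed() {
    # Select the range from the ">NAME" header to the next header (or end of file).
    # Inside the range: print the wanted header, drop any other header line,
    # and print every remaining (sequence) line.
    sed -n "/^>$1\$/,/^>/{
        /^>$1\$/p
        /^>/d
        p
    }" "$2"
}
```

Because the header pattern is anchored at both ends, asking for sequence1 should not match a header such as >sequence10.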
$ perl -0076 -lane 'print join("\n",@F) if $F[0]=~/sequence2/' file

Source: https://habr.com/ru/post/1695975/

