Printing a sequence from a fasta file

Question

I often need to find a specific sequence in a fasta file and print it. For those who don’t know, fasta is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple: a header line holds the name of the sequence preceded by ">", and all the lines that follow until the next ">" are the sequence itself. For instance:

>sequence1
ACTGACTGACTGACTG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
ACTGACTGACTGACTG

The way I am now getting the sequence I need is to use grep with -A, so I will do

grep -A 10 sequence_name filename.fa

and then, if I don’t see the beginning of the next sequence in the file, I will change 10 to 20 and repeat until I’m sure that I get the whole sequence.

It seems there should be a better way to do this. For instance, can I print everything up to the next '>'?

+2

bash grep fasta

Colin 01 Oct '14 at 15:17

4

One option is awk with a print flag:

awk '/>sequence1/{p++;print;next} /^>/{p=0} p' file

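If a line matches >sequence1, the flag (p) is set, the line is printed, and awk moves on to the next line. On any later line that starts with >, p is reset to 0, so printing stops. The final bare p prints the current line whenever the flag is set.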
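
One thing to watch with the flag approach: />sequence1/ is an unanchored regex, so it also matches a header such as >sequence10. A minimal sketch of the same idea with an exact name comparison follows. The function name fasta_get and the sample data are hypothetical, and it assumes the sequence name is the first whitespace-delimited word on the header line.

```shell
# Hypothetical helper (not from the original answer):
# print the one FASTA record whose header name equals $1 exactly.
fasta_get() {
    awk -v name="$1" '
        /^>/ { p = (substr($1, 2) == name) }  # header line: set the flag only on an exact name match
        p                                     # print every line while the flag is set
    ' "$2"
}

# Demo on a small made-up file
tmp=$(mktemp)
printf '>sequence1\nACTG\n>sequence10\nGGGG\nCCCC\n>sequence2\nTTTT\n' > "$tmp"
fasta_get sequence1 "$tmp"    # prints >sequence1 and ACTG, but not the sequence10 record
rm -f "$tmp"
```

Because the flag is recomputed on every header line, there is no need for a separate rule to switch printing off.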

Or, to stay with grep, ask for a very large -A (after) context and cut it off at the next header:

grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'

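Here grep prints the matching line plus up to 999999 lines after sequence1 and pipes them into awk. Awk exits as soon as it sees a line starting with > after line 1, so the next header and everything after it are dropped. Until then, the 1 makes awk perform its default action, which is to print the current line.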

+2

Mark Setchell 01 Oct '14 at 15:24

Use sed only:

sed -n '/>sequence3/,/>/ p' file | sed '${/>/d}'

+1

Vytenis bivainis 01 Oct '14 at 20:47

$ perl -0076 -lane 'print join("\n",@F) if $F[0]=~/sequence2/' file

0

dawg 01 Oct '14 at 16:48

glenn jackman · Accepted Answer · 2014-10-01T15:39:54+0000

Using > as the record separator:

awk -v seq="sequence2" -v RS='>' '$1 == seq {print RS $0}' file

>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
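
With RS='>' each record already ends with the newline of its last sequence line, and print appends another one, so the output is followed by an empty line. A small sketch that wraps the same command in a function and uses printf instead is shown below. The name fasta_record and the sample file are hypothetical, and it assumes no header or description line contains a second > character.

```shell
# Hypothetical wrapper around the record-separator approach.
# $1 = sequence name (first word of the header), $2 = FASTA file.
fasta_record() {
    # printf adds no extra newline; the record keeps its own trailing one
    awk -v seq="$1" -v RS='>' '$1 == seq { printf ">%s", $0 }' "$2"
}

# Demo on a small made-up file
tmp=$(mktemp)
printf '>sequence1\nACTG\n>sequence2\nAAAA\nCCCC\n>sequence3\nTTTT\n' > "$tmp"
fasta_record sequence2 "$tmp"    # prints the header plus both sequence lines
rm -f "$tmp"
```

Since $1 == seq is a string comparison and not a regex match, sequence2 does not also select a record named sequence20.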
