I often need to find a specific sequence in a FASTA file and print it. For those who don’t know, FASTA is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple: one line holds the name of the sequence, preceded by ">", and all the lines that follow, up to the next ">", are the sequence itself. For instance:
>sequence1
ACTGACTGACTGACTG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
ACTGACTGACTGACTG
The way I currently get the sequence I need is to use grep with -A, so I run
grep -A 10 sequence_name filename.fa
and then, if I don’t see the beginning of the next sequence in the output, I change 10 to 20 and repeat until I’m sure that I have the whole sequence.
Is there a better way, for example one that prints everything from the matching name up to the next '>'?
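One possible approach, sketched here with awk rather than grep, is to switch printing on at the header that contains the name and off again at the next header, so the whole record comes out no matter how many lines it spans. The function name fasta_get is a hypothetical helper made up for this illustration, and the match is a plain substring match on the header line (like the grep command above), so "sequence1" would also select "sequence10".

```shell
# Print one FASTA record: the header containing NAME plus all of its sequence lines.
# fasta_get is a hypothetical helper name for this sketch, not a standard tool.
# Usage: fasta_get sequence_name filename.fa
fasta_get() {
    awk -v name="$1" '
        /^>/ { keep = (index($0, name) > 0) }  # at each header, decide whether this record is wanted
        keep                                   # print every line while inside a wanted record
    ' "$2"
}
```

Because the flag is re-evaluated at every ">" line, there is no need to guess a line count as with -A. If several headers contain the name, all of their records are printed.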