Perl program to simulate RNA synthesis

I am looking for suggestions on how to approach my Perl programming homework: writing an RNA synthesis program. I have summarized and outlined the program below. In particular, I am looking for feedback on the blocks below (I have listed them separately for easy reference). I have read through chapter 6 of "Elements of Programming with Perl" by Andrew Johnson (great book). I have also read the perlfunc and perlop pod pages, but nothing jumped out at me as a place to start.

Description of the program: the program should read an input file named on the command line, transcribe it into RNA, and then translate the RNA into a sequence of uppercase one-letter amino acid names.

  • Accept a file named on the command line

    here I will use the <> operator

  • Make sure the file contains only acgt or die

    if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }  
    
  • Transcribe DNA into RNA (each A is replaced by U, T is replaced by A, C is replaced by G, G is replaced by C)

    not sure how to do it

  • Take this transcription and break it into 3-character "codons", starting at the first occurrence of "AUG"

    not sure, but I'm thinking this is where I will start using %hash variables?

  • Take the 3-character "codons" and give each one a single-letter symbol (an uppercase one-letter amino acid name)

    Assign a value to each key (there are 70 possibilities here, so I'm not sure where to store them or how to access them)

  • If a gap is encountered, a new line is started and the process is repeated

    not sure, but we can assume that gaps are multiples of three.

  • Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?



1. here I will use the <> operator

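That works: <> reads the files named on the command line, line by line. Remember to chomp each line so that the trailing newline does not end up in your sequence.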
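As a minimal sketch of step 1, the loop below reads lines from a filehandle, chomps each one, and joins them into a single sequence. The helper name read_sequence and the in-memory test data are assumptions made for illustration; a real program would loop over <> instead.

```perl
use strict;
use warnings;

# Read every line from a filehandle and return the sequence as one string.
# read_sequence is a hypothetical helper name, not part of the original answer.
sub read_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while (my $line = <$fh>) {
        chomp $line;          # drop the trailing newline
        $sequence .= $line;
    }
    return $sequence;
}

# Demonstration with an in-memory file so the example needs no input file.
my $data = "acgt\nttga\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $sequence = read_sequence($fh);
print "$sequence\n";          # prints acgtttga
```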


2. Check to make sure the file only contains acgt or die

if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }

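In a while loop, <> assigns each line to $_, or you can assign it explicitly (my $line = <>).

In the if statement, ne compares strings, not patterns. Use !~ instead (or =~ with a negated character class, [^acgt]). If the check should be case-insensitive, add the i flag.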
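A small sketch of the check in step 2, using a negated character class with the i flag. The sub name is_nucleotide_line is hypothetical, and treating an empty line as valid is an assumption of this sketch.

```perl
use strict;
use warnings;

# Return true if the line holds only the letters a, c, g, t (either case).
# is_nucleotide_line is a hypothetical name used for this illustration.
sub is_nucleotide_line {
    my ($line) = @_;
    chomp $line;
    return $line !~ /[^acgt]/i;   # false as soon as any other character appears
}

for my $line ("acgtACGT\n", "acgu\n") {
    print is_nucleotide_line($line) ? "ok\n" : "bad line: $line";
}
```

In the real program the failing branch would call die with the usage message instead of printing.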


3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).

As GWW said, transcription is just T -> U. The tr (transliteration) operator will help here.
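For illustration, here is how tr could express both readings of step 3: the plain T -> U replacement, and the four-way mapping listed in the question. The sub names are hypothetical, and which mapping the assignment actually wants is for the asker to confirm.

```perl
use strict;
use warnings;

# T -> U only (either case).
sub transcribe_simple {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

# The mapping listed in the question: A->U, T->A, C->G, G->C.
# tr replaces all characters in one pass, so a C that becomes G
# is not turned back into C.
sub transcribe_complement {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/ATCGatcg/UAGCuagc/;
    return $rna;
}

print transcribe_simple("ATGGCCTAA"), "\n";    # prints AUGGCCUAA
print transcribe_complement("ATGC"), "\n";     # prints UACG
```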


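4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

not sure but I'm thinking this is where I will start a %hash variables?

You will probably want to buffer the input inside the while(<>) loop. Use index to find "AUG". If it is not there, keep the last two characters (substr $line, -2, 2) and append (.=) the next line, because "AUG" may be split across a line break.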
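The buffering idea in step 4 can be sketched as follows. To keep the example self-contained the lines are passed in as a list; the sub name from_start_codon and that calling convention are assumptions, and the same loop body would work inside while (<>).

```perl
use strict;
use warnings;

# Return the RNA from the first "AUG" onward, or undef if there is none.
# from_start_codon is a hypothetical name; lines arrive as a list here.
sub from_start_codon {
    my @lines  = @_;
    my $buffer = '';
    while (defined(my $line = shift @lines)) {
        chomp $line;
        $buffer .= $line;
        my $pos = index($buffer, 'AUG');
        if ($pos >= 0) {
            # Found: keep everything from AUG on, plus the unread lines.
            chomp @lines;
            return join '', substr($buffer, $pos), @lines;
        }
        # Not found: only the last two characters can still begin an "AUG".
        $buffer = substr($buffer, -2, 2) if length($buffer) > 2;
    }
    return;
}

# "AUG" is split across the first two lines.
print from_start_codon("CCGA\n", "UGGCC\n", "UAA\n"), "\n";   # prints AUGGCCUAA
```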


5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

Assign a key a value (there are 70 possibilities here so I'm not sure where to store or how to access)

Again, as GWW said, build a hash table:

%codons = ( AUG => 'M', ...).

Then you can (e.g.) split the sequence into codons and look each one up in the hash.
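A minimal sketch of the lookup in step 5. The table is deliberately partial (three codons only), the sub name translate_rna is hypothetical, and the '?' placeholder for unknown codons is an assumption. The chunking uses a global regex match instead of split.

```perl
use strict;
use warnings;

# Partial table for illustration only; a full table has one entry per codon.
my %codons = (
    AUG => 'M',
    GCC => 'A',
    UGG => 'W',
);

# translate_rna is a hypothetical name. Codons missing from the table
# are shown as '?' here.
sub translate_rna {
    my ($rna) = @_;
    my @triplets = $rna =~ /(...)/g;    # complete 3-character chunks only
    return join '', map { $codons{$_} // '?' } @triplets;
}

print translate_rna("AUGGCCUGG"), "\n";   # prints MAW
```

The // (defined-or) operator needs Perl 5.10 or later, the same version the state feature requires.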


6. If a gap is encountered a new line is started and process is repeated

not sure but we can assume that gaps are multiples of threes.

See above. You can test for a gap with exists $codons{$current_codon}.
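One way the exists test in step 6 might be used is sketched below. Treating any codon missing from the table as a gap, collapsing a run of gaps into one line break, and the sub name translate_with_gaps are all assumptions of this example; the table is again partial.

```perl
use strict;
use warnings;

# Partial, illustrative table.
my %codons = ( AUG => 'M', GCC => 'A', UGG => 'W' );

# Translate a list of codons. Any codon missing from the table is treated
# as a gap and starts a new output line (a run of gaps starts only one).
sub translate_with_gaps {
    my @current_codons = @_;
    my $out = '';
    for my $current_codon (@current_codons) {
        if (exists $codons{$current_codon}) {
            $out .= $codons{$current_codon};
        }
        elsif ($out ne '' && $out !~ /\n\z/) {
            $out .= "\n";
        }
    }
    return $out;
}

print translate_with_gaps(qw(AUG GCC --- --- AUG UGG)), "\n";
```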


7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

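Yes, broadly. I would split the work into two subs, read_codon and translate: one reads the input a codon at a time, the other turns codons into amino acids.

Something like this: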

use warnings; use strict;
use feature 'state';


# read_codon works by using the new state feature in Perl 5.10.
# Both @buffer and $handle represent 'state' for this function;
# together they separate reading codons from processing the file
# line by line.
# Once read_codon is called for the first time, both are initialized.
# Since $handle is a state variable that is opened only once, the current
# file handle position is never reset. Similarly, @buffer always holds
# whatever was left from the previous call.
# The base case is that @buffer contains fewer than 3 bp, in which case
# we need to read a new line, remove the "\n" character,
# split it and push the resulting list onto the end of @buffer.
# If we encounter EOF on $handle, then we have exhausted the file,
# and @buffer as well, so we return undef.
# Otherwise we take the first 3 bp of @buffer, join them into a string,
# transcribe it and return it.

sub read_codon {
    my ($file) = @_;

    state @buffer;
    # A state initializer runs only once, so the file is opened a single time
    state $handle = do {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        $fh;
    };

    # Keep reading until a whole codon is buffered (lines may be short or empty)
    while (@buffer < 3) {
        defined(my $new_line = <$handle>) or return;
        chomp $new_line;
        push @buffer, split //, $new_line;
    }

    return transcribe(join '', splice @buffer, 0, 3);
}

# transcribe turns a DNA codon into RNA: every T becomes U (either case).
sub transcribe {
    my ($codon) = @_;
    $codon =~ tr/Tt/Uu/;
    return $codon;
}


# translate also relies on the state feature in Perl 5.10.
# The $TRANSLATE state is initialized to 0, and only once
# (the first time the sub is called).
# As codons are passed to it,
# the sub updates the state according to start and stop codons.
# If the current state is 'translating',
# then the sub returns the appropriate amino acid from the %codes table, if any.
# This gives the caller a simple way to decide whether it should print
# an amino acid or not: if not, the sub returns undef.
# %codes could also be a state variable, but since it is not actually 'state',
# it is initialized once, in a code block visible from the sub
# but separate from the rest of the program, since it is 'private' to the sub.
# The block must run before translate is first called.

{
    my %codes = (
        AUG => 'M',
        # ...
    );

    sub translate {
        my ($codon) = @_;
        return unless defined $codon;

        state $TRANSLATE = 0;

        $TRANSLATE = 1 if $codon =~ m/AUG/i;
        $TRANSLATE = 0 if $codon =~ m/U(AA|GA|AG)/i;

        return $codes{uc $codon} if $TRANSLATE;
        return;    # not translating: return undef, as described above
    }
}
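Both subs above depend on a state variable being initialized once and then keeping its value between calls. A tiny standalone illustration of that behavior, with a hypothetical counter sub that is not part of the answer:

```perl
use strict;
use warnings;
use feature 'state';

# next_id is a hypothetical example sub.
sub next_id {
    state $count = 0;    # runs on the first call only
    return ++$count;     # later calls continue from the saved value
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";          # prints 1 2 3
```

With my instead of state, $count would be reset on every call and the sub would always return 1.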


Take a look at BioPerl and look at the source modules for indicators on how to do this.


Source: https://habr.com/ru/post/1773360/

