How to combine two sequences using arrays in perl

Question

How to combine two sequences using arrays in perl

When navigating through two arrays, I am confused about how to move the pointer through one loop, but keeping it constant in the other. For example:

Array 1: ATCGTCGAGCG
Array 2: ACGTCCTGTCG

So, A in the first array corresponds to A in the second array, so we move on to the next elements. But since T does not match C in the second index, I want the program to compare this T with the next G in array 2, and so on, until it finds the corresponding T.

 my ($array1ref, $array2ref) = @_; my @array1 = @$array1ref; my @array2= @$array2ref; my $count = 0; foreach my $element (@array1) { foreach my $element2 (@array2) { if ($element eq $element2) { $count++; }else { ??????????? }

+4

perl

user2344516 May 02, '13 at 19:35

source share

7 answers

Andomar · Answer 1 · 2013-05-02T20:02:43+0000

You can use the while to find matches. If you find a match, advance in both arrays. If you do not, advance the second array. At the end, you can print the remaining unsurpassed characters from the first array:

 # [1, 2, 3] is a reference to an anonymous array (1, 2, 3) # qw(1, 2, 3) is shorthand quoted-word for ('1', '2', '3') my $arr1 = [qw(ATCGTCGAGCG)]; my $arr2 = [qw(ACGTCCTGTCG)]; my $idx1 = 0; my $idx2 = 0; # Find matched characters # @$arr_ref is the size of the array referenced by $arr_ref while ($idx1 < @$arr1 && $idx2 < @$arr2) { my $char1 = $arr1->[$idx1]; my $char2 = $arr2->[$idx2]; if ($char1 eq $char2) { # Matched character, advance arr1 and arr2 printf("%s %s -- arr1[%d] matches arr2[%d]\n", $char1, $char2, $idx1, $idx2); ++$idx1; ++$idx2; } else { # Unmatched character, advance arr2 printf(". %s -- skipping arr2[%d]\n", $char2, $idx2); ++$idx2; } } # Remaining unmatched characters while ($idx1 < @$arr1) { my $char1 = $arr1->[$idx1]; printf("%s . -- arr1[%d] is beyond the end of arr2\n", $char1, $idx1); $idx1++; }

script prints:

 AA -- arr1[0] matches arr2[0] . C -- skipping arr2[1] . G -- skipping arr2[2] TT -- arr1[1] matches arr2[3] CC -- arr1[2] matches arr2[4] . C -- skipping arr2[5] . T -- skipping arr2[6] GG -- arr1[3] matches arr2[7] TT -- arr1[4] matches arr2[8] CC -- arr1[5] matches arr2[9] GG -- arr1[6] matches arr2[10] A . -- arr1[7] is beyond the end of arr2 G . -- arr1[8] is beyond the end of arr2 C . -- arr1[9] is beyond the end of arr2 G . -- arr1[10] is beyond the end of arr2

ikegami · Answer 2 · 2013-05-02T19:50:36+0000

Nested loops do not make sense. You do not want to iterate over more than once.

You did not specify what you would like to do after resynchronization, so you will want to start with the following and adapt it to your needs.

 my ($array1, $array2) = @_; my $idx1 = 0; my $idx2 = 0; while ($idx1 < @$array1 && $idx2 < @$array2) { if ($array1->[$idx1] eq $array2->[$idx2]) { ++$idx1; ++$idx2; } else { ++$idx2; } } ...

As in the case above, the fragment above will leave $idx1 in the last index, which it could not (eventually) repeat. If instead you want to stop, as soon as you first re-sync, you want

 my ($array1, $array2) = @_; my $idx1 = 0; my $idx2 = 0; my $mismatch = 0; while ($idx1 < @$array1 && $idx2 < @$array2) { if ($array1->[$idx1] eq $array2->[$idx2]) { last if $mismatched; ++$idx1; ++$idx2; } else { ++$mismatched; ++$idx2; } } ...

amon · Answer 3 · 2013-05-02T19:51:08+0000

Foreach loops will not cut: we will either want to go in cycles while there are available elements in both arrays, or iterate over all indexes, which we can increase as needed:

 EL1: while (defined(my $el1 = shift @array1) and @array2) { EL2: while(defined(my $el2 = shift @array2)) { ++$count and next EL1 if $el1 eq $el2; # break out of inner loop } }

or

 my $j = 0; # index of @array2 for (my $i = 0; $i <= $#array1; $i++) { $j++ until $j > $#array or $array1[$i] eq $array2[$j]; last if $j > $#array; $count++; }

or any combination.

user1937198 · Answer 4 · 2013-05-02T19:51:36+0000

This is a difficult condition for using loops when using loops instead

 my ($array1ref, $array2ref) = @_; my @array1 = @$array1ref; my @array2= @$array2ref; my $count = 0; my ($index, $index2) = (0,0); #loop while indexs are in arrays while($index <= @#array1 && $index2 <= @#array2) { if($array1[$index] eq $array2[$index2]) { $index++; $index2++; } else { #increment index until we find a match $index2++ until $array1[$index] eq $array2[$index2]; } }

Björn labitzke · Answer 5 · 2013-05-02T19:54:56+0000

Here is one opportunity. It will use indexes to traverse both lists.

 my @array1 = qw(ATCGTCGAGCG); my @array2 = qw(ACGTCCTGTCG); my $count = 0; my $idx1 = 0; my $idx2 = 0; while(($idx1 < scalar @array1) && ($idx2 < scalar @array2)) { if($array1[$idx1] eq $array2[$idx2]) { print "Match of $array1[$idx1] array1 \@ $idx1 and array2 \@ $idx2\n"; $idx1++; $idx2++; $count++; } else { $idx2++; } } print "Count = $count\n";

Joe z · Answer 6 · 2013-05-06T09:45:39+0000

It seems you could do this quite easily with grep if you are guaranteed that array2 will always be longer or larger than array1. Something like that:

 sub align { my ($array1, $array2) = @_; my $index = 0; return grep { $array1->[$index] eq $array2->[$_] ? ++$index : 0 } 0 .. scalar( @$array2 ) - 1; }

Basically, grep says: "Return me a list of incrementing indices in array2, which correspond to adjacent elements from the array."

If you run the above using this test code, you will see that it returns the expected alignment array:

 my @array1 = qw(ATCGTCGAGCG); my @array2 = qw(ACGTCCTGTCG); say join ",", align \@array1, \@array2;

This displays the expected display: 0.3,4,7,8,9,10. This list means that @array1[0 .. 6] matches @array2[0,3,4,7,8,9,10] .

(Note: you need to use Modern::Perl or similar to use say .)

Now you really haven't said what you need to output the operation. I suggested that you need this array of mappings. If you just need to count the number of elements skipped in @array2 when aligning with @array1 , you can still use the grep above, but instead of a list, just return scalar(@$array2) - $index at the end.

Ken williams · Answer 7 · 2013-05-09T22:26:17+0000

As you know, your problem is called alignment sequence . There are well-developed algorithms for this, and one such Algorithm :: NeedlemanWunsch module is available in CPAN. Here's how you can apply it to your problem.

 #!/usr/bin/perl use Algorithm::NeedlemanWunsch; my $arr1 = [qw(ATCGTCGAGCG)]; my $arr2 = [qw(ACGTCCTGTCG)]; my $matcher = Algorithm::NeedlemanWunsch->new(sub {@_==0 ? -1 : $_[0] eq $_[1] ? 1 : -2}); my (@align1, @align2); my $result = $matcher->align($arr1, $arr2, { align => sub {unshift @align1, $arr1->[shift]; unshift @align2, $arr2->[shift]}, shift_a => sub {unshift @align1, $arr1->[shift]; unshift @align2, '.'}, shift_b => sub {unshift @align1, '.'; unshift @align2, $arr1->[shift]}, }); print join("", @align1), "\n"; print join("", @align2), "\n";

This displays the optimal solution in terms of costs specified in the constructor:

 ATCGT.C.GAGCG A.CGTTCGG.TCG

A completely different method from the one that was in your original question, but I think it is worth knowing.

How to combine two sequences using arrays in perl

More articles: