Most frequent words in Python

How can I write code to find the most frequent 2-mer of "GATCCAGATCCCCATAC"? I wrote this code, but it seems to be wrong. Please help me fix it.

def PatternCount(Pattern, Text):
    count = 0
    for i in range(len(Text)-len(Pattern)+1):
        if Text[i:i+len(Pattern)] == Pattern:
            count = count+1
    return count

This code counts the occurrences of a k-mer in a string, but it does not give me the most frequent 2-mer in a given string.

3 answers

You can first define a function to get all k-mers in your string:

def get_all_k_mer(string, k=1):
    # Collect every substring of length k (overlapping windows).
    length = len(string)
    return [string[i:i + k] for i in range(length - k + 1)]

Then you can use collections.Counter to count the repetitions of each k-mer:

>>> from collections import Counter
>>> s = 'GATCCAGATCCCCATAC'
>>> Counter(get_all_k_mer(s, k=2))

Output:

Counter({'AC': 1,
         'AG': 1,
         'AT': 3,
         'CA': 2,
         'CC': 4,
         'GA': 2,
         'TA': 1,
         'TC': 2})

Another example:

>>> s = "AAAAAA"
>>> Counter(get_all_k_mer(s, k=3))

Output:

Counter({'AAA': 4})
# Indeed : AAAAAA
           ^^^     -> 1st time
            ^^^    -> 2nd time
             ^^^   -> 3rd time
              ^^^  -> 4th time
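
The counts above can be turned into a direct answer to the question. The following is only a minimal sketch: the function name most_frequent_kmers is hypothetical (it does not appear in the answers), it assumes k >= 1, and it returns every k-mer tied for the highest count rather than a single one.

```python
from collections import Counter

def most_frequent_kmers(text, k):
    """Return (kmers, count) for every k-mer tied for the highest count.

    Hypothetical helper; assumes k >= 1.
    """
    # Count each overlapping substring of length k.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # The text is shorter than k, so there are no k-mers at all.
        return [], 0
    best = max(counts.values())
    # Sort the winners so that ties come back in a stable order.
    return sorted(kmer for kmer, n in counts.items() if n == best), best

print(most_frequent_kmers("GATCCAGATCCCCATAC", 2))  # (['CC'], 4)
```

Returning all tied k-mers avoids silently dropping one when two k-mers are equally frequent.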

In general, when I want to count things in Python, I use a Counter:

from itertools import tee
from collections import Counter

dna = "GATCCAGATCCCCATAC"
a, b = tee(iter(dna), 2)
_ = next(b)
c = Counter(''.join(l) for l in zip(a,b))
print(c.most_common(1))

This outputs [('CC', 4)], a list holding the 1 most common 2-mer in a tuple with its count in the string.

In fact, this generalizes to finding the most common n-mer for a given n:

from itertools import tee, islice
from collections import Counter

def nmer(dna, n):
    iters = tee(iter(dna), n)
    iters = [islice(it, i, None) for i, it in enumerate(iters)]
    c = Counter(''.join(l) for l in zip(*iters))
    return c.most_common(1)
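
To see what the staggered iterators produce before they are counted, here is a small sketch. The helper name windows is hypothetical; it repeats the tee/islice construction and returns the joined groups as a list instead of feeding them to a Counter.

```python
from itertools import islice, tee

def windows(seq, n):
    """Return all overlapping length-n groups of seq as strings (hypothetical helper)."""
    # Make n independent iterators over the same sequence.
    iters = tee(iter(seq), n)
    # Advance the i-th iterator by i items so the iterators are staggered.
    staggered = [islice(it, i, None) for i, it in enumerate(iters)]
    # zip stops at the shortest iterator, so only full-length groups remain.
    return [''.join(group) for group in zip(*staggered)]

print(windows("GATCC", 2))  # ['GA', 'AT', 'TC', 'CC']
print(windows("GATCC", 3))  # ['GAT', 'ATC', 'TCC']
```

Because zip stops at the shortest iterator, a sequence shorter than n simply yields no groups.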

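For completeness, here is a third-party option: more_itertools, a library that provides a sliding-window tool among others. To install it, run pip install more_itertools.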

>>> from collections import Counter
>>> import more_itertools

>>> s = "GATCCAGATCCCCATAC"
>>> Counter(more_itertools.windowed(s, 2))
Counter({('A', 'C'): 1,
         ('A', 'G'): 1,
         ('A', 'T'): 3,
         ('C', 'A'): 2,
         ('C', 'C'): 4,
         ('G', 'A'): 2,
         ('T', 'A'): 1,
         ('T', 'C'): 2})
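
If installing a third-party package is not an option, the same sliding-window idea can be sketched with the standard library alone. This is an illustration, not the more_itertools implementation: the generator name sliding_windows is hypothetical, and it relies on collections.deque with maxlen to drop the oldest item automatically.

```python
from collections import Counter, deque

def sliding_windows(seq, k):
    """Yield each overlapping length-k window of seq as a tuple (hypothetical helper)."""
    window = deque(maxlen=k)  # the oldest item falls out once k items are held
    for item in seq:
        window.append(item)
        if len(window) == k:
            yield tuple(window)

counts = Counter(sliding_windows("GATCCAGATCCCCATAC", 2))
print(counts[('C', 'C')])  # 4
```

This sketch yields nothing when the sequence is shorter than k, whereas more_itertools.windowed pads such a window with a fill value.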

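This example shows how little is needed to compute the result: windowed and Counter.

A "window" of length k=2 slides across the sequence one position at a time (i.e., step=1). Each new window is counted as a key in the Counter, and its tally is incremented on every occurrence. The final Counter reports all tallies.

Finally, if string output matters, converting the tuples to strings is simple. Here is a function that does so and generalizes to k-mers: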

>>> from collections import Counter
>>> import more_itertools

>>> def count_mers(seq, k=1):
...     """Return a counter of adjacent mers."""
...     return Counter(("".join(mers) for mers in more_itertools.windowed(seq, k)))

>>> s = "GATCCAGATCCCCATAC"
>>> count_mers(s, k=2)
Counter({'AC': 1,
         'AG': 1,
         'AT': 3,
         'CA': 2,
         'CC': 4,
         'GA': 2,
         'TA': 1,
         'TC': 2})

Source: https://habr.com/ru/post/1663762/

