Python performance: why is Counter(r) not 100 times faster than {c: r.count(c) for c in set(r)}?

Let r be a string, and suppose we want to count how many times each character occurs in r. A quick back-of-the-envelope argument suggests that

Counter(r)

should be about 100 times faster than

{c:r.count(c) for c in set(r)}

After all, plain text contains about 100 distinct characters (uppercase and lowercase letters, punctuation, digits, ...), so .count will scan the entire string r about 100 times, whereas Counter passes over it only once.
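A minimal sketch of the two approaches being compared, assuming r is any str. The function names and the sample sentence are hypothetical stand-ins, not the book text. Both produce the same character frequencies; they differ only in how many times the string is traversed.

```python
from collections import Counter

def counts_with_counter(r):
    # A single pass over r; a dict entry is updated for every character.
    return dict(Counter(r))

def counts_with_str_count(r):
    # One full scan of r per distinct character (about 100 for typical prose).
    return {c: r.count(c) for c in set(r)}

sample = "One Ring to rule them all, One Ring to find them."
print(counts_with_counter(sample) == counts_with_str_count(sample))  # True
```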

However, the timings do not match this reasoning (here r is the full text of all the "The Lord of the Rings" books):

In [71]: %timeit d = collections.Counter(r)
10 loops, best of 3: 98.8 ms per loop

In [72]: %timeit d = {c:r.count(c) for c in set(r)}
10 loops, best of 3: 114 ms per loop

In [73]: len(r)
Out[73]: 972550

Even if we make the string longer, the ratio stays the same:

In [74]: r = r*100

In [79]: %time d = collections.Counter(r)
CPU times: user 9.9 s, sys: 12 ms, total: 9.91 s
Wall time: 9.93 s

In [81]: %time d = {c:r.count(c) for c in set(r)}
CPU times: user 11.5 s, sys: 0 ns, total: 11.5 s
Wall time: 11.6 s
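%timeit and %time are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. This assumes a hypothetical synthetic text (make_text) of the same length as r, built from about 95 printable symbols, because the book text itself is not available here; absolute numbers will depend on the machine and Python version.

```python
import random
import string
import timeit
from collections import Counter

def make_text(n, seed=0):
    # Hypothetical stand-in for the book text: n random characters
    # drawn from about 95 distinct printable symbols.
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    rng = random.Random(seed)
    return "".join(rng.choices(alphabet, k=n))

def benchmark(r, number=3):
    # Best of 3 repetitions, reported as seconds per call (similar in spirit to %timeit).
    t_counter = min(timeit.repeat(lambda: Counter(r), number=number, repeat=3)) / number
    t_count = min(timeit.repeat(lambda: {c: r.count(c) for c in set(r)},
                                number=number, repeat=3)) / number
    return t_counter, t_count

if __name__ == "__main__":
    r = make_text(972550)
    t1, t2 = benchmark(r)
    print(f"Counter: {t1 * 1e3:.1f} ms, dict + str.count: {t2 * 1e3:.1f} ms")
```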

For reference, the .count implementation is here: https://hg.python.org/cpython-fullhistory/file/tip/Objects/stringlib/fastsearch.h. What explains this?

EDIT: tested with Python 3.4.3 (… 26 2015, 22:07:01) on Lubuntu 15.04.


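Counter is implemented in Python, while string.count() is implemented in C. The roughly 100-fold advantage in the number of passes is eaten up by the speed difference between Python and C. Here is the relevant part of Counter: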

def update(*args, **kwds):
    ...
    for elem in iterable:
        self[elem] = self_get(elem, 0) + 1

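That is, for every single character Counter goes through Python-level dictionary operations (self.__setitem__ and self.get), each involving a hash lookup and an integer update, which is slow. (Since Python 3.2, CPython can run this loop in a C helper, _count_elements, but each iteration still performs a dictionary lookup and store on Python objects, so the per-character cost remains high.)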
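To make the per-character work visible, here is a minimal pure-Python sketch of the same update loop. count_elements_py is a hypothetical name for illustration, not part of the collections module; it mirrors the quoted code, including binding the get method once up front.

```python
from collections import Counter

def count_elements_py(iterable):
    # Hypothetical pure-Python equivalent of the update loop quoted above.
    counts = {}
    counts_get = counts.get  # bind the method once, as the quoted code does with self_get
    for elem in iterable:
        # Every iteration does a hash lookup (get), an integer addition,
        # and a hash store (__setitem__), all on Python objects.
        counts[elem] = counts_get(elem, 0) + 1
    return counts

text = "mellon"
print(count_elements_py(text))  # {'m': 1, 'e': 1, 'l': 2, 'o': 1, 'n': 1}
```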

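string.count, on the other hand, is written in C: it calls stringlib_count, which in turn calls fastsearch. For a one-character pattern, fastsearch boils down to this loop: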

for (i = 0; i < n; i++) {
    if (s[i] == p[0]) {
        count++;
        if (count == maxcount)
            return maxcount;
    }
}
return count;

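As you can see, this is a very simple loop that does a single comparison per character, so it is extremely fast, even when it has to be run about 100 times.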
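One way to check this explanation is to vary the number of distinct characters while keeping the length fixed. If it holds, one would expect the Counter time to stay roughly flat while the dict + str.count time grows roughly in proportion to the alphabet size. The sketch below uses hypothetical random text (time_both, ALPHABET and the sizes chosen are illustrative assumptions); results will vary by machine and Python version.

```python
import random
import string
import timeit
from collections import Counter

ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "  # 95 symbols

def time_both(n, alphabet_size, seed=0):
    # Hypothetical experiment: n random characters drawn from the first
    # alphabet_size symbols of ALPHABET; returns best-of-3 seconds per call.
    rng = random.Random(seed)
    r = "".join(rng.choices(ALPHABET[:alphabet_size], k=n))
    t_counter = min(timeit.repeat(lambda: Counter(r), number=1, repeat=3))
    t_count = min(timeit.repeat(lambda: {c: r.count(c) for c in set(r)},
                                number=1, repeat=3))
    return t_counter, t_count

if __name__ == "__main__":
    for size in (2, 10, 50, 95):
        t1, t2 = time_both(500_000, size)
        print(f"distinct={size:2d}  Counter: {t1 * 1e3:6.1f} ms   str.count: {t2 * 1e3:6.1f} ms")
```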


Source: https://habr.com/ru/post/1628396/