Quite a simple problem, really: you have one billion (1e+9) unsigned 32-bit integers stored as decimal ASCII strings in a TSV (tab-separated values) file. Converting them with int() is terribly slow compared to other tools working on the same dataset. Why? And, more importantly: how can it be done faster?
So the question is: what is the fastest way to convert a string to an integer in Python?
What I am really wondering about is some secondary Python functionality that could be (ab)used for this purpose, not unlike Guido's use of array.array in his "Optimization Anecdote".
Sample data (with tabs expanded to spaces):
38262904 "pfv" 2002-11-15T00:37:20+00:00
12311231 "tnealzref" 2008-01-21T20:46:51+00:00
26783384 "hayb" 2004-02-14T20:43:45+00:00
812874 "qevzasdfvnp" 2005-01-11T00:29:46+00:00
22312733 "bdumtddyasb" 2009-01-17T20:41:04+00:00
The time needed to read the data does not matter here; processing the data is the bottleneck.
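As a minimal sketch of the baseline being measured (assuming the integer is the first tab-separated field of each row; the helper name is made up for illustration):

```python
# Baseline sketch: take the first tab-separated field of each row
# and convert it with int() -- the conversion this question is about.
def parse_first_column(lines):
    """Convert the first TSV field of each line to an int."""
    return [int(line.split("\t", 1)[0]) for line in lines]

rows = [
    '38262904\t"pfv"\t2002-11-15T00:37:20+00:00',
    '12311231\t"tnealzref"\t2008-01-21T20:46:51+00:00',
]
print(parse_first_column(rows))  # [38262904, 12311231]
```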
Microbenchmarks
All of the following are interpreted languages. The host machine runs 64-bit Linux.
Python 2.6.2 with IPython 0.9.1, ~2,137k conversions per second (100%):
In [1]: strings = map(str, range(int(1e7)))
In [2]: %timeit map(int, strings);
10 loops, best of 3: 4.68 s per loop
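The same measurement can be reproduced without IPython using the standard timeit module (a sketch; the string count is scaled down here so it runs quickly, bump N back to int(1e7) to match the run above):

```python
import timeit

# Scaled-down reproduction of the %timeit run above.
N = int(1e5)
strings = list(map(str, range(N)))

# Best of 3 runs, one full pass over the strings per run.
seconds = min(timeit.repeat(lambda: list(map(int, strings)), number=1, repeat=3))
print("%.0f kcps" % (N / seconds / 1000))
```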
REBOL 3.0, version 2.100.76.4.2, ~2,310 kcps (108%):
>> strings: array n: to-integer 1e7 repeat i n [poke strings i mold (i - 1)]
== "9999999"
>> delta-time [map str strings [to integer! str]]
== 0:00:04.328675
REBOL 2.7.6.4.2 (15 March 2008), ~5,227 kcps (261%):
As John noted in the comments, this version does not build a list of the converted integers, so the speed ratio given is relative to Python's 4.99 s runtime for the loop-only variant for str in strings: int(str).
>> delta-time: func [c /local t] [t: now/time/precise do c now/time/precise - t]
>> strings: array n: to-integer 1e7 repeat i n [poke strings i mold (i - 1)]
== "9999999"
>> delta-time [foreach str strings [to integer! str]]
== 0:00:01.913193
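For comparison, the list-free Python variant that the 4.99 s figure refers to can be timed the same way (a sketch, scaled down like the earlier one):

```python
import timeit

N = int(1e5)  # use int(1e7) to match the original run
strings = list(map(str, range(N)))

def convert_only():
    # Mirrors `for str in strings: int(str)` -- no result list is built.
    for s in strings:
        int(s)

seconds = min(timeit.repeat(convert_only, number=1, repeat=3))
print("%.3f s for %d conversions" % (seconds, N))
```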
KDB+ 2.6t 2009.04.15, ~20,161 kcps (944%):
q)strings:string til "i"$1e7
q)\t "I"$strings
496
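The KDB+ line "I"$strings is a single vectorized cast over the whole list. The closest Python analogue I can think of is NumPy's bulk string-to-integer conversion (a sketch, assuming NumPy is installed; whether it actually beats map(int, ...) on this workload should be measured, not assumed):

```python
import numpy as np

strings = list(map(str, range(10)))

# An integer dtype makes NumPy parse each decimal string in C,
# loosely mirroring KDB's vectorized "I"$strings cast.
values = np.array(strings, dtype=np.uint32)
print(values)  # prints [0 1 2 3 4 5 6 7 8 9]
```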