Finding an item in a list with 2 MILLION items - Python

I have a list of strings with 1.9-2 MILLION elements.

The following code:

items = [...]
item_in_list = items[-1] in items  # linear scan: worst case checks every element

takes 0.1 seconds

With sqlite3 it takes 0.7 seconds


Now the problem is that I need to perform this check 1 million times, and I would like to complete it in minutes, not days. (At 0.1 seconds per check, 1 million checks is roughly 28 hours.)

More precisely, I am trying to synchronize the contents of a CSV file with the calculated values in the database.


Any ideas? It would be great :)

+3
4 answers

Don't test membership against a plain list; put the values in a set. A set is backed by a hash table, so each lookup is O(1) on average instead of a linear scan.

Here is a quick benchmark comparing the two approaches:

import random
from timeit import Timer

def random_strings(size):
    # Generate `size` random strings of 3-8 distinct lowercase letters.
    alpha = 'abcdefghijklmnopqrstuvwxyz'
    min_len = 3
    max_len = 8
    strings = []
    for count in range(size):
        current = ''.join(random.sample(alpha, random.randint(min_len, max_len)))
        strings.append(current)
    return strings

string_list_1 = random_strings(10000)
string_list_2 = random_strings(10000)

def string_test():
    # O(n*m): each `in` test scans string_list_2 from the start.
    common = [x for x in string_list_1 if x in string_list_2]
    return common

def set_test():
    # O(n+m): build two hash-based sets, then intersect them.
    string_set_1 = frozenset(string_list_1)
    string_set_2 = frozenset(string_list_2)
    common = string_set_1 & string_set_2
    return common

string_timer = Timer("__main__.string_test()", "import __main__")
set_timer = Timer("__main__.set_test()", "import __main__")
print(string_timer.timeit(10))
# 22.6108954005
print(set_timer.timeit(10))
#  0.0226439453

The set version is roughly a thousand times faster on these 10,000-element lists, and the gap only grows with the size of the data.

Building the sets has a one-time cost, but after that every membership test is effectively constant time. In your case, build the set once from the 2 million values and run all 1 million checks against it.
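
To connect this to the original CSV-versus-database sync, here is a minimal sketch; the file name, table, and column names (values.db, calculated, value, data.csv) are hypothetical placeholders for whatever the real schema is:

import csv
import sqlite3

# Hypothetical names; adjust to your own schema.
conn = sqlite3.connect('values.db')
db_values = {row[0] for row in conn.execute('SELECT value FROM calculated')}

# Stream the CSV once and collect its first column into a set.
with open('data.csv', newline='') as f:
    csv_values = {row[0] for row in csv.reader(f)}

missing_from_db = csv_values - db_values  # present in the CSV, absent from the DB
stale_in_db = db_values - csv_values      # present in the DB, absent from the CSV

Both set differences run in roughly linear time, so the whole comparison should take seconds rather than days.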

+4

If you insist on keeping a list, it has to be SORTED, and then you can binary-search it in O(log n) per lookup. A set is still the simpler option.
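
For reference, a minimal sketch of binary search on a sorted list with the standard-library bisect module (the sample items are made up):

import bisect

items = sorted(['pear', 'apple', 'orange', 'banana'])

def in_sorted_list(sorted_items, x):
    # bisect_left finds the insertion point in O(log n);
    # x is present iff the element already at that point equals x.
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

print(in_sorted_list(items, 'apple'))   # True
print(in_sorted_list(items, 'grape'))   # False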

+1

A plain list is the wrong data structure for repeated membership tests.

What I would do instead:

  • Load the 2 million values into a set (or dict) once.
  • Run each of the million checks as a membership test against that set.
  • Each test is then a single cheap lookup instead of a scan over the whole list.

Update:

As mentioned in the comments, sets and dicts do not use binary trees; they use hash tables. This should be faster than a list, and in fact probably even faster than a binary search.
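
If each database row maps a key to a calculated value (an assumption about the asker's schema, not something stated in the question), a dict gives the same O(1) average lookup and also hands back the value:

# Hypothetical: db_rows is an iterable of (key, calculated_value) pairs.
db_rows = [('a', 1), ('b', 2), ('c', 3)]
calculated = dict(db_rows)    # hash table: O(1) average lookup

key = 'b'
if key in calculated:         # membership test, no linear scan
    value = calculated[key]   # and the calculated value comes for free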

0

Off the top of my head, with so little information about why you are doing this several million times:

1.) Can you import the CSV into a table and perform the validations in SQL? (See the sketch below.)

2.) What about sorting and indexing the list for quick access?

Cheers, P
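
Expanding on point 1, a minimal sketch under assumed names (values.db, a calculated table with a value column, data.csv): import the CSV into a temporary indexed table and let SQL do the whole comparison in one pass.

import csv
import sqlite3

conn = sqlite3.connect('values.db')  # hypothetical database file
conn.execute('CREATE TEMP TABLE csv_rows (value TEXT PRIMARY KEY)')

with open('data.csv', newline='') as f:  # hypothetical CSV file
    conn.executemany('INSERT OR IGNORE INTO csv_rows VALUES (?)',
                     ([row[0]] for row in csv.reader(f)))

# Rows present in the CSV but missing from the database:
missing = conn.execute(
    'SELECT value FROM csv_rows EXCEPT SELECT value FROM calculated'
).fetchall()

The PRIMARY KEY gives the temporary table an index, and EXCEPT lets SQLite compute the difference without a Python-level loop.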

0
source

Source: https://habr.com/ru/post/1780496/

