I have a tab-delimited data file with just over 2 million rows and 19 columns. You can find it as US.zip at http://download.geonames.org/export/dump/ .
I started out running this with for l in f.readlines(), but I understand that simply iterating over the file object is supposed to be more efficient, so that's the version I'm posting below. Even with that small optimization, though, the process is using 30% of my memory and has only gotten through about 6.5% of the records, so it looks like it will still run out of memory at this pace, just as it did before. The function is also very slow. Is there anything obvious I can do to speed it up? Would del-ing the objects on each pass of the loop help (sketched after the code below)?
def run():
    from geonames.models import POI
    f = file('data/US.txt')
    for l in f:
        li = l.split('\t')
        try:
            p = POI()
            p.geonameid = li[0]
            p.name = li[1]
            p.asciiname = li[2]
            p.alternatenames = li[3]
            p.point = "POINT(%s %s)" % (li[5], li[4])
            p.feature_class = li[6]
            p.feature_code = li[7]
            p.country_code = li[8]
            p.ccs2 = li[9]
            p.admin1_code = li[10]
            p.admin2_code = li[11]
            p.admin3_code = li[12]
            p.admin4_code = li[13]
            p.population = li[14]
            p.elevation = li[15]
            p.gtopo30 = li[16]
            p.timezone = li[17]
            p.modification_date = li[18]
            p.save()
        except IndexError:
            pass

if __name__ == "__main__":
    run()
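(For clarity, the del idea I'm asking about is just dropping the references at the bottom of the loop, something like this; I don't know whether it actually frees anything meaningful, since the names get rebound on the next iteration anyway:)

            p.save()
        except IndexError:
            pass
        # the "del with each pass" idea: explicitly drop the row's objects
        del p
        del li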
EDIT, more info (apparently important):
Memory usage goes up as the script runs and saves more rows. The .save() method is a lightly overridden Django model method with a unique_slug snippet, writing to a PostgreSQL/PostGIS database.
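(The unique_slug part follows the usual pattern of slugifying a field and appending a counter until there is no collision; roughly like the sketch below, with illustrative field names rather than my exact code:)

from django.contrib.gis.db import models
from django.template.defaultfilters import slugify

class POI(models.Model):
    name = models.CharField(max_length=200)
    slug = models.SlugField(unique=True)
    point = models.PointField()
    # ... the rest of the geonames columns ...

    def save(self, *args, **kwargs):
        # build a unique slug by appending a counter until no other row uses it
        if not self.slug:
            base = slugify(self.name)
            slug, n = base, 2
            while POI.objects.filter(slug=slug).exclude(pk=self.pk).exists():
                slug = "%s-%d" % (base, n)
                n += 1
            self.slug = slug
        super(POI, self).save(*args, **kwargs)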
SOLVED: Django's DEBUG database query logging eats the memory.
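(In other words: with DEBUG = True, Django appends every executed query to django.db.connection.queries, which grows without bound over a couple of million saves. Setting DEBUG = False in settings.py fixes it; alternatively the query log can be cleared periodically inside the loop, roughly like this:)

from django import db

def run():
    from geonames.models import POI
    f = open('data/US.txt')
    for i, l in enumerate(f):
        li = l.split('\t')
        try:
            p = POI()
            # ... field assignments as in the original loop ...
            p.save()
        except IndexError:
            pass
        if i % 10000 == 0:
            db.reset_queries()  # drop the query log that DEBUG mode accumulates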