Multithreading in ArcGIS with Python

I have a Python script that works fine when launched by itself. Based on a hard-coded input directory, it scans for all .mdb files, puts them in a list, and then iterates through the list in a for loop. Each iteration involves several table restrictions, joins, queries, and so on.

The only problem: it takes about 36 hours for this dataset. Although the script will only ever be run against this dataset, I would still like to improve its performance, since I often edit the selection of fields, the results to include, the join methods, etc. I would like to say it takes so long because my script is inefficient, but any inefficiency would be small, since nearly all of the processing time is spent in the geoprocessor object.

All that I have in my main script is:

indir = "D:\\basil\\input" mdblist = createDeepMdbList(indir) for infile in mdblist: processMdb(infile) 

It also runs flawlessly when executed sequentially.
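(createDeepMdbList is not shown here; roughly, it is just a recursive walk that collects .mdb paths. A minimal sketch, assuming nothing fancier than os.walk:)

import os

def createDeepMdbList(indir):
    # Walk the directory tree and collect the full path of every .mdb file.
    mdbs = []
    for root, dirs, files in os.walk(indir):
        for name in files:
            if name.lower().endswith(".mdb"):
                mdbs.append(os.path.join(root, name))
    return mdbs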

I tried using Parallel Python:

import pp

ppservers = ()
job_server = pp.Server(ppservers=ppservers)

inputs = tuple(mdblist)
functions = (preparePointLayer, prepareInterTable, jointInterToPoint,
             prepareDataTable, exportElemTables, joinDatatoPoint, exportToShapefile)
modules = ("sys", "os", "arcgisscripting", "string", "time")

fn = pp.Template(job_server, processMdb, functions, modules)
jobs = [(input, fn.submit(input)) for input in inputs]
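(For completeness, the submitted pp jobs would then be collected by calling each job object, something like this sketch:)

# Retrieve results; calling a job object blocks until that job has finished.
results = [(infile, job()) for infile, job in jobs]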

It successfully creates 8 processes and 8 geoprocessor objects ... and then it fails.

I have not experimented much with Python's built-in multithreading tools, but was hoping for some guidance on simply spawning up to 8 processes that work through the queue provided by mdblist. At no point would any of them try to write to or read from the same files. To keep things simpler, I have also stripped out all of my logging while working on this problem; I have run this script enough times to know that it works, except for 4 of the 4104 input files, which have slightly different data formats.

Any tips? Wisdom from anyone who has tried multithreaded Arc-Python scripts?

2 answers

I thought I would share what ultimately worked for me, along with my experiences.

Using the multiprocessing backport module (code.google.com/p/python-multiprocessing), as per Joe's comment, worked well. I had to change a couple of things in my script to deal with local/global variables and logging.
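(I won't reproduce the logging changes here, but a rough sketch of the general idea, giving each worker process its own log file so parallel workers never write to the same file, using the standard logging module:)

import os
import logging
import multiprocessing

def setup_worker_logging(logdir):
    # One log file per worker process, keyed on the process name.
    proc_name = multiprocessing.current_process().name
    logging.basicConfig(
        filename=os.path.join(logdir, "worker_%s.log" % proc_name),
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s")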

The main script is now:

import multiprocessing

if __name__ == '__main__':
    indir = r'C:\basil\rs_Rock_and_Sediment\DVD_Data\testdir'
    mdblist = createDeepMdbList(indir)

    processes = 6  # set num procs to use here
    pool = multiprocessing.Pool(processes)

    pool.map(processMdb, mdblist)

Total time went from ~36 hours to ~8 using 6 processes.

Some of the problems I ran into come from the fact that separate processes use separate memory spaces, which takes global variables out of the picture entirely. Queues can be used for this, but I have not implemented them, so everything is just declared locally.
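(For anyone who does want to pass work and results between processes, a minimal sketch of the Queue approach; this is hypothetical, not what I actually used, and it assumes processMdb and mdblist from the script above:)

import multiprocessing

def worker(task_queue, result_queue):
    # Pull .mdb paths off the shared queue until a None sentinel arrives.
    while True:
        infile = task_queue.get()
        if infile is None:
            break
        result_queue.put((infile, processMdb(infile)))

if __name__ == '__main__':
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()

    workers = [multiprocessing.Process(target=worker, args=(task_queue, result_queue))
               for _ in range(6)]
    for w in workers:
        w.start()

    for infile in mdblist:          # mdblist built with createDeepMdbList as above
        task_queue.put(infile)
    for _ in workers:
        task_queue.put(None)        # one sentinel per worker so each one shuts down

    # Drain the results before joining, so a full result queue cannot block the workers.
    results = [result_queue.get() for _ in mdblist]
    for w in workers:
        w.join()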

Also, since pool.map can only take one argument, each iteration has to create and then delete the geoprocessor object, rather than creating 8 gp objects up front and passing an available one to each iteration. Each iteration takes about a minute, so the couple of seconds spent creating it don't matter much, but they add up. I have not done any definitive tests, but this may actually be good practice, since anyone who has worked with ArcGIS and Python knows that scripts slow down drastically the longer the geoprocessor stays active (for example, one of my scripts was used by a co-worker who overloaded the input, and the estimated time to completion went from 50 hours after 1 hour of running, to 350 hours after running overnight, to 800 hours after running for 2 days ... it was cancelled and the input was restricted).
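(A rough sketch of that per-iteration pattern, assuming the ArcGIS 9.3-era arcgisscripting API; the actual processing body is abbreviated to comments here:)

import arcgisscripting

def processMdb(infile):
    # Create a fresh geoprocessor for this iteration...
    gp = arcgisscripting.create(9.3)
    try:
        gp.workspace = infile
        # ... table restrictions, joins, queries, export to shapefile, etc. ...
    finally:
        # ...and release it so the next iteration starts with a clean geoprocessor.
        del gp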

Hope this helps anyone else looking to multiprocess a large set of inputs :). Next step: recursive, multiprocessed appends!


I compared the above methods using the same function. Results:

Starting pp with 1 workers
Time elapsed:  4.625 s
Starting pp with 2 workers
Time elapsed:  2.43700003624 s
Starting pp with 4 workers
Time elapsed:  2.42100000381 s
Starting pp with 8 workers
Time elapsed:  2.375 s
Starting pp with 16 workers
Time elapsed:  2.43799996376 s

Starting mul_pool with 1 p
Time elapsed:  5.31299996376 s
Starting mul_pool with 2
Time elapsed:  3.125 s
Starting mul_pool with 4
Time elapsed:  3.56200003624 s
Starting mul_pool with 8
Time elapsed:  4.5 s
Starting mul_pool with 16
Time elapsed:  5.92199993134 s
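(The test function itself isn't shown here; a minimal harness along these lines, with a small CPU-bound dummy task standing in for it, would look something like this:)

import time
import multiprocessing
import pp

def dummy_task(n):
    # Stand-in for the real test function: a bit of CPU-bound work.
    total = 0
    for i in range(n):
        total += i * i
    return total

def bench_pp(workers, tasks):
    print("Starting pp with %d workers" % workers)
    start = time.time()
    job_server = pp.Server(ncpus=workers)
    jobs = [job_server.submit(dummy_task, (t,)) for t in tasks]
    results = [job() for job in jobs]      # calling a job blocks until it finishes
    job_server.destroy()
    print("Time elapsed: %s s" % (time.time() - start))

def bench_pool(workers, tasks):
    print("Starting mul_pool with %d" % workers)
    start = time.time()
    pool = multiprocessing.Pool(workers)
    results = pool.map(dummy_task, tasks)
    pool.close()
    pool.join()
    print("Time elapsed: %s s" % (time.time() - start))

if __name__ == '__main__':
    tasks = [200000] * 64
    for workers in (1, 2, 4, 8, 16):
        bench_pp(workers, tasks)
    for workers in (1, 2, 4, 8, 16):
        bench_pool(workers, tasks)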
