I have a Python script that runs fine when launched on its own. Given a hard-coded input directory, it scans for all .mdb files, puts them into a list, and then iterates over them in a for loop. Each iteration involves several table restrictions, joins, queries, and so on.
The only problem: it takes roughly 36 hours for this dataset, and although the script will only ever be used on this dataset, I would still like to improve performance, because I frequently edit the selection of fields, the results to include, the join methods, etc. I would like to be able to say it takes so long because my script is inefficient, but any inefficiency would be small, since nearly all of the processing time is spent inside the geoprocessor object.
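For reference, createDeepMdbList is just a recursive walk of the input directory; it is roughly equivalent to this sketch (the real function is not shown here):

import os

def createDeepMdbList(indir):
    # Recursively collect the full paths of all .mdb files under indir.
    mdbs = []
    for root, dirs, files in os.walk(indir):
        for name in files:
            if name.lower().endswith(".mdb"):
                mdbs.append(os.path.join(root, name))
    return mdbs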
All that I have in my main script is:
indir = "D:\\basil\\input" mdblist = createDeepMdbList(indir) for infile in mdblist: processMdb(infile)
It also runs flawlessly when executed sequentially.
I tried using Parallel Python:
import pp

ppservers = ()
job_server = pp.Server(ppservers=ppservers)

inputs = tuple(mdblist)
functions = (preparePointLayer, prepareInterTable, jointInterToPoint,
             prepareDataTable, exportElemTables, joinDatatoPoint, exportToShapefile)
modules = ("sys", "os", "arcgisscripting", "string", "time")

fn = pp.Template(job_server, processMdb, functions, modules)
jobs = [(input, fn.submit(input)) for input in inputs]
It successfully creates 8 processes and 8 geoprocessor objects ... and then it fails.
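For completeness, the plan was to pull the results back afterwards by calling each job object (Parallel Python jobs are callables that block until the result is ready), something like:

for input, job in jobs:
    job()

but it never gets that far.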
I have not experimented much with Python's built-in multithreading tools, but was hoping for some guidance on simply spawning up to 8 processes that work through the queue of paths in mdblist. At no point would the script attempt to write to or read from the same file in more than one process at once. To keep things simpler, I have also stripped all of my logging out of this question; I have run this script enough times to know that it works, except for 4 of the 4,104 input files, which have slightly different data formats.
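Roughly, what I am hoping for is something like the following (only a sketch, assuming a Python version that ships the multiprocessing module and that processMdb is importable at module level; I have not verified that arcgisscripting behaves well under multiprocessing):

import multiprocessing

def main():
    indir = "D:\\basil\\input"
    mdblist = createDeepMdbList(indir)

    # Up to 8 worker processes, each pulling one .mdb path at a time
    # off the shared work queue and running processMdb on it.
    pool = multiprocessing.Pool(processes=8)
    pool.map(processMdb, mdblist)
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()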
Tips? Wisdom from attempting to multithread Arc Python scripts?