Thank you all for your answers.
I have created a solution (rather than "the" solution) to the proposed problem, and since others may find it useful, I am posting the code here. My solution is a hybrid of options 1 and 3 proposed by Adam Matan. The code contains line numbers from my vi session, which will help in the discussion below.
Lines 36-47 are simply preliminary material related to the problem definition, which was part of the original question. The multiprocessing setup that gets around the CPython GIL is on lines 49-56, and lines 57-70 divide the work evenly into tasks. The code in lines 57-70 is used instead of itertools.product because once the list of row/column IDs reaches 40,000 or so, the product ends up consuming a huge amount of memory.
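To illustrate the memory point, here is a minimal sketch of lazy task generation. It is not the code from lines 57-70; it is one possible approach written for Python 3, and the names `upper_triangle_pairs` and `chunked` are hypothetical. It assumes the similarity matrix is symmetric, so only pairs on or above the diagonal are needed, and it yields them in small batches instead of building the full product in memory.

```python
from itertools import islice


def upper_triangle_pairs(ids):
    """Lazily yield (row_id, col_id) for every pair on or above the diagonal."""
    for i in range(len(ids)):
        for j in range(i, len(ids)):
            yield ids[i], ids[j]


def chunked(iterable, size):
    """Yield lists of at most `size` items, consuming `iterable` lazily."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical IDs for illustration only.
ids = ["A", "B", "C", "D"]
tasks = list(chunked(upper_triangle_pairs(ids), 3))
```

For n IDs this produces n(n+1)/2 pairs, so 200 IDs give 20100 pairs, which matches the calculation count in the 200x200 timing. Only one batch has to exist in memory at a time if the batches are handed out as they are generated.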
The actual calculation to be done is on lines 74-78, and here the shared dictionary of ID -> vector lookups and the shared result queue are used.
Lines 81-85 set up the actual process objects, although they have not been started yet.
In my first attempt (not shown here), the "try ... resultQueue.get() ... except ..." code was located outside the outer control loop (the one that runs while not all calculations have finished). When I ran that version of the code on the 9x9 unit test matrix, there were no problems. However, on moving up to 200x200 or larger, I found that the code would hang, even though nothing in the code changed between executions.
According to this discussion (http://bugs.python.org/issue8426) and the official documentation for multiprocessing, using multiprocessing.Queue can hang if the underlying implementation does not have a very large pipe/socket size. The code given here as my solution therefore periodically empties the queue while checking whether the processes have finished (see lines 91-106), so that the child processes can keep putting new results into it and the pipe does not fill up.
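The pattern described above can be sketched as follows. This is a simplified Python 3 illustration, not the code from lines 91-106: the worker just squares integers, and `worker`, `drain` and `run` are hypothetical names. It also assumes the "fork" start method (POSIX only); on Windows you would use the default context and guard the entry point with `if __name__ == "__main__":`.

```python
import multiprocessing as mp
import queue
import time


def worker(task_ids, result_queue):
    """Stand-in calculation: put (task_id, task_id squared) on the queue."""
    for task_id in task_ids:
        result_queue.put((task_id, task_id * task_id))


def drain(result_queue, results):
    """Move everything currently readable from the queue into `results`."""
    while True:
        try:
            key, value = result_queue.get_nowait()
        except queue.Empty:
            return
        results[key] = value


def run(n_workers=4, n_tasks=2000, check_interval=0.01):
    ctx = mp.get_context("fork")  # assumption: POSIX system
    result_queue = ctx.Queue()
    # Round-robin split of the task IDs across the workers.
    chunks = [list(range(w, n_tasks, n_workers)) for w in range(n_workers)]
    processes = [ctx.Process(target=worker, args=(chunk, result_queue))
                 for chunk in chunks]
    for process in processes:
        process.start()

    results = {}
    # Empty the queue on every check so a child never blocks on a full pipe.
    while any(process.is_alive() for process in processes):
        drain(result_queue, results)
        time.sleep(check_interval)

    for process in processes:
        process.join()
    drain(result_queue, results)  # pick up whatever arrived after the last check
    return results


if __name__ == "__main__":
    print(len(run()))
```

The important detail is that `join()` is only called after the children have already exited. Joining first and reading the queue afterwards is the ordering that can hang, because a child cannot exit until its buffered results have been written to the pipe.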
When I tested the code on larger 1000x1000 matrices, I noticed that the calculation code finished well ahead of the queue and matrix assignment code. Using cProfile, I found that one bottleneck was the default polling interval, processCheckTime = 1.0 (line 23), and lowering this value improved the speed of the results (see the bottom of the post for timing examples). This may be useful information for other people new to multiprocessing in Python.
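For readers who have not used cProfile, here is a small sketch of how time lost to a polling sleep shows up in a profile. It is written for Python 3 and is not taken from the solution: `poll_loop` and `profile_polling` are hypothetical stand-ins for the process-checking loop.

```python
import cProfile
import io
import pstats
import time


def poll_loop(checks, interval):
    """Stand-in for a loop that sleeps between process checks."""
    for _ in range(checks):
        time.sleep(interval)


def profile_polling(checks=3, interval=0.01):
    """Profile poll_loop and return the text report, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    poll_loop(checks, interval)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_polling()
```

Printing `report` lists the `time.sleep` built-in near the top. A sleep that dominates cumulative time in the parent process is a hint that the polling interval, rather than the calculation, is setting the overall runtime.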
Overall, this may not be the best possible implementation, but it does provide a starting point for further optimization. As is often said, optimization via parallelization requires proper analysis and thought.
Timing examples, all with 8 processors (t is processCheckTime).
200x200 (20100 calculations/assignments)
t = 1.0: runtime 18 s
t = 0.01: runtime 3 s
500x500 (125250 calculations/assignments)
t = 1.0: runtime 86 s
t = 0.01: runtime 23 s
If anyone wants to copy and paste the code, here is the unit test I used for part of the development. Obviously, the labelled matrix class code is missing here, and the fingerprint reader/scorer code is not included (though it is quite simple to roll your own). Of course, I am happy to share that code as well if it would help someone.
    def unitTest():
        import cStringIO, os
        from fingerprintReader import MismatchKernelReader
        from fingerprintScorers import FeatureVectorLinearKernel
        exampleData = cStringIO.StringIO() # 9 examples from GPCR (3,1)-mismatch descriptors, first 10 columns.
        exampleData.write( ",AAA,AAC,AAD,AAE,AAF,AAG,AAH,AAI,AAK" + os.linesep )
        exampleData.write( "TS1R2_HUMAN,5,2,3,6,8,6,6,7,4" + os.linesep )
        exampleData.write( "SSR1_HUMAN,11,6,5,7,4,7,4,7,9" + os.linesep )
        exampleData.write( "OXYR_HUMAN,27,13,14,14,15,14,11,16,14" + os.linesep )
        exampleData.write( "ADA1A_HUMAN,7,3,5,4,5,7,3,8,4" + os.linesep )
        exampleData.write( "TA2R_HUMAN,16,6,7,8,9,10,6,6,6" + os.linesep )
        exampleData.write( "OXER1_HUMAN,10,6,5,7,11,9,5,10,6" + os.linesep )
        exampleData.write( "NPY1R_HUMAN,3,3,0,2,3,1,0,6,2" + os.linesep )
        exampleData.write( "NPSR1_HUMAN,0,1,1,0,3,0,0,6,2" + os.linesep )
        exampleData.write( "HRH3_HUMAN,16,9,9,13,14,14,9,11,9" + os.linesep )
        exampleData.write( "HCAR2_HUMAN,3,1,3,2,5,1,1,6,2" )
        columnIDs = ( "TS1R2_HUMAN", "SSR1_HUMAN", "OXYR_HUMAN", "ADA1A_HUMAN", "TA2R_HUMAN", "OXER1_HUMAN",
                      "NPY1R_HUMAN", "NPSR1_HUMAN", "HRH3_HUMAN", "HCAR2_HUMAN", )
        m = createSimilarityMatrix( exampleData, MismatchKernelReader, FeatureVectorLinearKernel, columnIDs,
                                    verbose=True, )
        m.SetOutputPrecision( 6 )
        print m

    ## end of unitTest()