Suppose I have an array of 1,000,000 elements and several worker threads, each of which manipulates data in this array. The workers may update already-populated elements with new data, but each operation is limited to a single array element and does not depend on the value of any other element.
Using one mutex to protect the entire array would clearly lead to heavy contention. At the other extreme, I could create an array of mutexes the same length as the original array, and for each element array[i] I would lock mutex[i] while working on it. Assuming a reasonably even distribution of the data, this would mostly eliminate lock contention, at the cost of a lot of memory.
I think a more reasonable solution would be to have an array of n mutexes (where 1 < n < 1,000,000). Then for each element array[i] I would lock mutex[i % n] while working on it. If n is sufficiently large, contention should still be minimal.
So my question is: is there a performance penalty to using a very large number of mutexes (say >= 1,000,000) in this way, beyond the increased memory usage? If so, roughly how many mutexes can you reasonably use before degradation sets in?
I am sure the answer is somewhat platform-specific; I am using pthreads on Linux. I am also working on putting together my own benchmarks, but the scale of the data I am working with makes that time-consuming, so some initial guidance would be appreciated.
That was the initial question. For those who asked for more detail about the problem: I have four multi-GB binary files describing somewhere around half a billion events to be analyzed. The array in question is actually the array of pointers backing a very large chained hash table. We read the four files into the hash table, possibly merging entries together if they share certain characteristics.

In the existing implementation there are 4 threads, each of which reads one file and inserts that file's records into the hash table. The table has 997 locks and 997 * 9973 = 9,943,081 pointers. When inserting an element with hash h, I first lock mutex[h % 997] before inserting or modifying the entry in bucket[h % 9943081]. This works reasonably well, and as far as I can tell we have not had much trouble with contention, but there is a performance bottleneck in that we are only using 4 cores of a 16-core machine. (And even fewer as we go on, because the files generally are not all the same size.) Once all the data has been read into memory, we analyze it, using new threads and a new locking strategy tuned to the different workload.
I am trying to improve the performance of the data-loading phase by switching to a thread pool. In the new model, I still have one thread per file, but it simply reads the file in ~1 MB chunks and hands each chunk to a worker thread in the pool for parsing and insertion. So far the performance gain has been minimal, and the profiling I did suggested that the time spent locking and unlocking the array is the likely culprit. Locking is built into the hash table implementation we use, but it does allow the number of locks to be specified independently of the table size. I am hoping to speed things up without changing the hash table implementation itself.