I am tackling the problem of using both the capabilities of an 8-core machine and a high-end GPU (Tesla 10).
I have one large input file, one thread for each core, and one thread that drives the GPU. To be efficient, the GPU thread needs a large number of lines from the input at once, while a CPU thread needs only one line to proceed (storing several lines in a temporary buffer was slower). The file does not need to be read sequentially. I am using Boost.
My strategy is to put a mutex on the input stream, which each thread locks and unlocks. This is not optimal, because the GPU thread should have higher priority when locking the mutex, being the fastest and most demanding one.
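To make the priority requirement concrete, here is a minimal sketch of a lock in which a waiting high-priority thread always goes ahead of low-priority ones. It is only an illustration, not a tested solution for this workload: the names `PriorityLock`, `lock_high`, `lock_low`, `high_pending` and `demo_order` are made up for the example, and it uses `std::mutex` and `std::condition_variable` so that it compiles on its own (Boost.Thread offers `boost::mutex` and `boost::condition_variable` with a similar interface).

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical helper: a lock that a high-priority thread (the GPU thread)
// acquires ahead of any low-priority threads (the CPU workers).
class PriorityLock {
public:
    // High priority: announce the request, then wait only for the holder.
    void lock_high() {
        std::unique_lock<std::mutex> guard(m_);
        ++high_waiting_;
        cv_.wait(guard, [this] { return !held_; });
        --high_waiting_;
        held_ = true;
    }

    // Low priority: also wait while any high-priority request is pending.
    void lock_low() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return !held_ && high_waiting_ == 0; });
        held_ = true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> guard(m_);
            held_ = false;
        }
        cv_.notify_all();  // every waiter re-checks its own condition
    }

    // True while a high-priority thread is queued for the lock.
    bool high_pending() {
        std::lock_guard<std::mutex> guard(m_);
        return high_waiting_ > 0;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool held_ = false;
    int high_waiting_ = 0;
};

// Demo: while the lock is held, a "GPU" thread queues with high priority
// and a "CPU" thread with low priority. Returns the acquisition order.
std::string demo_order() {
    PriorityLock lock;
    std::string order;  // only touched while holding `lock`

    lock.lock_low();    // the calling thread holds the lock first
    std::thread gpu([&] { lock.lock_high(); order += 'G'; lock.unlock(); });
    while (!lock.high_pending()) std::this_thread::yield();
    std::thread cpu([&] { lock.lock_low(); order += 'C'; lock.unlock(); });

    lock.unlock();      // the GPU thread is served before the CPU thread
    gpu.join();
    cpu.join();
    return order;
}
```

In the setup described above, the GPU thread would call `lock_high()` before reading its batch of lines and each CPU thread would call `lock_low()` before reading its single line. Low-priority threads can starve if high-priority requests arrive back to back, which is probably acceptable with a single GPU thread but is worth measuring.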
I can think of several solutions, but before rushing into an implementation I would like some recommendations.
What approach do you use / recommend?