What strategies are effective for handling parallel reads on heterogeneous multicore architectures?

Question

I am tackling the problem of using both the capabilities of an 8-core machine and those of a high-end GPU (Tesla 10).

I have one large input file, one thread for each core, and one thread for handling the GPU. To be efficient, the GPU thread needs a large number of lines from the input, while a CPU thread needs only one line to proceed (storing several lines in a temporary buffer was slower). The file does not need to be read sequentially. I am using Boost.

My strategy is to have a mutex on the input stream, which each thread locks and unlocks. This is not optimal, because the GPU thread should have higher priority when locking the mutex, being the fastest and most demanding one.

I can come up with different solutions, but before rushing into the implementation, I would like to have some recommendations.

What approach do you use / recommend?

+3

c++ multithreading design boost concurrency

fabrizioM Apr 15 '10 at 8:46

3 answers

:

1) IO, .

2) . , .

, , . , , fpos . , . , , .

:

1) . ( + )

2) . Thread , .

, , .

+1

user283145 17 . '10 20:25

I would use a buffer. Have a single thread fill this buffer from disk. Each worker thread locks the buffer, copies the data it needs into its own thread-local buffer, and then releases the lock on the mutex before processing the data.

0

Menox Apr 17 '10 at 2:48

Sedat Kapanoglu · Accepted Answer · 2010-04-18T10:36:33+0000

, "1 " , 2 . . , 1024 ( ): . :

#define NUM_CORES 8 // the 8-core machine from the question
#define BLOCK_SIZE (1024 * 1024)
#define REGULAR_THREAD_BLOCK_SIZE (BLOCK_SIZE/(2 * NUM_CORES)) // 64 KB
#define GPU_THREAD_BLOCK_SIZE (BLOCK_SIZE/2) // 512 KB

Each core gets 64 KB of the block and the GPU gets the remaining 512 KB:

- Core 1: offset 0, size = REGULAR_THREAD_BLOCK_SIZE
- Core 2: offset 65536, size = REGULAR_THREAD_BLOCK_SIZE
- Core 3: offset 131072, size = REGULAR_THREAD_BLOCK_SIZE
- Core n: offset ((n - 1) * REGULAR_THREAD_BLOCK_SIZE), size = REGULAR_THREAD_BLOCK_SIZE
- GPU: 512 KB, offset = (NUM_CORES * REGULAR_THREAD_BLOCK_SIZE), size = GPU_THREAD_BLOCK_SIZE
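The offsets in the list can also be computed rather than written out by hand. The following is a small sketch of that arithmetic; `BlockAssignment` and `partitionBlock` are hypothetical names, and the code assumes the block size divides evenly by twice the core count, as it does for 1 MB and 8 cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one worker's slice of a block.
struct BlockAssignment {
    std::size_t offset;
    std::size_t size;
};

// Splits one block as in the list above: the first half is divided evenly
// among the CPU cores, the second half goes to the GPU thread.
// The GPU assignment is the last element of the result.
std::vector<BlockAssignment> partitionBlock(std::size_t blockSize, std::size_t numCores) {
    assert(numCores > 0);
    const std::size_t cpuSize = blockSize / (2 * numCores);
    std::vector<BlockAssignment> parts;
    for (std::size_t core = 0; core < numCores; ++core) {
        parts.push_back({core * cpuSize, cpuSize});
    }
    parts.push_back({numCores * cpuSize, blockSize / 2});
    return parts;
}
```

Calling `partitionBlock(1024 * 1024, 8)` yields nine entries: eight 64 KB slices for the cores, then one 512 KB slice for the GPU starting at offset 524288.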

. , . , . , , , , :

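#include <stddef.h>

// Assumed to be provided elsewhere (not shown in the answer):
//   char *allocLineBuffer(void);        returns a buffer big enough for one line
//   void processLine(const char *line); handles one NUL-terminated line
void threadProcess(const char *buf, size_t bufSize, size_t startOffset, size_t blockSize, int coreNum)
{
    size_t offset = startOffset;
    size_t endOffset = startOffset + blockSize;
    if(endOffset > bufSize) endOffset = bufSize; // never run past the data
    if(coreNum > 0) {
        // skip the partial first line; the previous thread handles it as its carryover
        while(offset < endOffset && buf[offset] != '\n') offset++;
        if(offset >= endOffset) return; // no line starts in this block
        offset++; // step past the newline itself
    }
    // read the lines that end inside this block
    char *currentLine = allocLineBuffer(); // opening door to security exploits :)
    size_t strPos = 0;
    while(offset < endOffset) {
        if(buf[offset] == '\n') {
            currentLine[strPos] = 0;
            processLine(currentLine); // do line processing here
            strPos = 0; // fresh start
            offset++;
            continue;
        }
        currentLine[strPos] = buf[offset];
        offset++;
        strPos++;
    }
    // read the remainder past the block; strPos is kept so the part of the
    // line already copied from inside the block is not lost
    while(offset < bufSize && buf[offset] != '\n') {
        currentLine[strPos++] = buf[offset++];
    }
    if(strPos > 0 || offset < bufSize) {
        currentLine[strPos] = 0;
        processLine(currentLine); // process the carryover line
    }
    // release currentLine here with whatever matches allocLineBuffer()
}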

, , . ? . -, .
