I am trying to parallelize a function containing several procedures. The function looks like this:
void _myfunction(M1, M2) {
    for (a = 0; a < A; a++) {
        Amatrix = procedure1(M1);  /* contains for loops */
        Bmatrix = procedure2(M1);  /* contains for loops */
        ...
        for (z = 1; z < Z; z++) {
            /* calculations with Amatrix(z), obtaining AAmatrix */
            /* calculations with Bmatrix(z), obtaining BBmatrix */
            for (e = 1; e < E; e++) {
                /* calculations with AAmatrix(e), obtaining CCmatrix */
                /* calculations with BBmatrix(e), obtaining DDmatrix */
            }
        }
        for (q = 0; q < Q; q++) { /* calculations with CCmatrix(q) */ }
        for (m = 0; m < M; m++) { /* calculations with DDmatrix(m) */ }
    }
}
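To show what I mean by the z/e nest being parallelizable, here is a simplified, hypothetical sketch (plain C++, 0-based indexing, with made-up sizes Z and E; not my real code). It flattens the two nested loops into a single index space, which is the mapping a CUDA kernel would use with blockIdx/threadIdx, and checks on the host that the mapping visits every (z, e) pair exactly once:

#include <cassert>
#include <vector>

// Hypothetical sizes standing in for Z and E above.
constexpr int Z = 4;
constexpr int E = 3;

// The nested z/e loops visit Z*E iterations. A CUDA kernel would cover
// the same space with one thread per flattened index:
//   int idx = blockIdx.x * blockDim.x + threadIdx.x;  // 0 .. Z*E-1
//   int z = idx / E, e = idx % E;
// Here the same mapping is replayed on the host to verify it is a bijection.
int main() {
    std::vector<int> visits(Z * E, 0);
    for (int idx = 0; idx < Z * E; ++idx) {
        int z = idx / E;   // recovers the outer loop counter
        int e = idx % E;   // recovers the inner loop counter
        ++visits[z * E + e];
    }
    for (int v : visits)
        assert(v == 1);    // every (z, e) pair visited exactly once
    return 0;
}

Of course this only works if the iterations are independent, which is part of what I am unsure about for my calculations.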
As for the functions procedure1() and procedure2(), I have already ported them to CUDA and everything works well (each of these procedures has its own for loops). The reason these procedures are separate is that they are conceptually independent algorithms, distinct from the rest of the code, which deals with the more general logic.
Now I would like to move the rest of the code to CUDA as well. Ideally, the entire body of _myfunction(arg1,arg2,..) would run on the GPU, so that the data is copied to the device once, all the loops execute there, and only the final results come back, instead of transferring data back and forth for every call to procedure1 and procedure2.
My question is: what is a good strategy for porting the rest of this function to CUDA?
P.S.: I have a GeForce 9600GT (Compute Capability 1.1) and CUDA Toolkit 5.0.