Adding CUDA floating point gives the wrong answer (compared to CPU float ops)

I am new to CUDA. I was using CUDA to compute the dot product of float vectors and ran into a problem with floating-point addition. In essence, this is a simple kernel; I compile with -arch=sm_50. The main idea is for thread 0 to add up the values of the vector a.

__global__ void temp(float *a, float *b, float *c) {
    // Only thread 0 of block (0,0) performs the summation
    if (0 == threadIdx.x && blockIdx.x == 0 && blockIdx.y == 0) {
        float xx = 0.0f;
        for (int i = 0; i < LENGTH; i++) {
            xx += a[i];   // sequential single-precision accumulation
        }
        *c = xx;
    }
}

When I initialize 'a' with 1000 elements of 1.0, I get the expected result 1000.00,

but when I initialize 'a' with 1.1, I expect 1100.00xx; instead I get 1099.989014. The CPU implementation gives 1100.000024.
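For context, a minimal host-side driver along these lines reproduces the comparison. LENGTH = 1000, passing NULL for the unused b argument, and the <<<1,1>>> launch are my assumptions, not necessarily the original setup:

#include <cstdio>
#include <cuda_runtime.h>

#define LENGTH 1000

__global__ void temp(float *a, float *b, float *c);   // the kernel shown above

int main() {
    float h_a[LENGTH], h_c = 0.0f, cpu = 0.0f;
    float *d_a, *d_c;
    for (int i = 0; i < LENGTH; i++) h_a[i] = 1.1f;    // fill the input with 1.1

    cudaMalloc(&d_a, LENGTH * sizeof(float));
    cudaMalloc(&d_c, sizeof(float));
    cudaMemcpy(d_a, h_a, LENGTH * sizeof(float), cudaMemcpyHostToDevice);

    temp<<<1, 1>>>(d_a, NULL, d_c);                    // thread 0 does the whole sum
    cudaMemcpy(&h_c, d_c, sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < LENGTH; i++) cpu += h_a[i];    // host reference sum

    printf("GPU: %f  CPU: %f\n", h_c, cpu);
    cudaFree(d_a);
    cudaFree(d_c);
    return 0;
}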

I'm trying to figure out what the problem is! :-(

Initializing 'a' with 1.1 and summing 1000 elements yields the wrong total; I get the same result with atomicAdd.

Please help!

EDIT: The CPU and GPU results differ! I don't understand why. Can I not trust the GPU?! :-(


1.1 is not exactly representable in IEEE-754 binary floating point. As @RobertCrovella pointed out, it is the CPU result that does not conform to strict IEEE-754 single-precision arithmetic here, not the GPU result.

Converted to float, 1.1 is 0x3F8CCCCD = 1.10000002384185... When you add 1000 of these sequentially, each partial sum is rounded back to a 24-bit significand, and once the running sum is around 1000, roughly the low 10 mantissa bits of each added 1.1 are lost. With its low 10 bits cleared, 1.1 becomes 0x3F8CCC00 = 1.09997558...
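To make that concrete, here is a small host-side sketch (my own, not part of the original answer) that prints the bit pattern of 1.1f, the value with its low 10 mantissa bits cleared, and a strict single-precision running sum:

#include <cstdio>
#include <cstring>
#include <cstdint>

// Print a float together with its IEEE-754 bit pattern.
static void show(const char *label, float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%-10s = %.9f (0x%08X)\n", label, f, bits);
}

int main() {
    show("1.1f", 1.1f);                  // 0x3F8CCCCD = 1.100000024

    uint32_t bits;
    float f = 1.1f;
    memcpy(&bits, &f, sizeof bits);
    bits &= ~0x3FFu;                     // clear the low 10 mantissa bits
    memcpy(&f, &bits, sizeof f);
    show("truncated", f);                // 0x3F8CCC00 = 1.099975586

    volatile float sum = 0.0f;           // volatile forces rounding to float at each step
    for (int i = 0; i < 1000; i++)
        sum = sum + 1.1f;
    show("float sum", sum);              // should match the GPU result quoted above
    return 0;
}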

The CUDA result of 1099.989014, divided by 1000, is 0x3F8CCC71: the effective value added each time lies between the full 32-bit value and the truncated one, because fewer bits are lost while the running sum is still small.

Your CPU implementation, however, does not actually do the arithmetic in 32-bit float. Compilers commonly generate code for the x87 FPU, whose registers carry 80 bits of precision. 1.1 converted to float is 1.10000002384185...; adding it 1000 times in extended precision gives 1100.00002384185..., which prints as 1100.000024, exactly your CPU result.
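A quick way to see this effect (again my own sketch, not from the answer) is to accumulate the same float value in a wider type, which mimics an 80-bit x87 register:

#include <cstdio>

int main() {
    // Accumulate the float value of 1.1 in a wider type, mimicking an
    // extended-precision x87 register that is never rounded back to float.
    long double wide = 0.0L;
    for (int i = 0; i < 1000; i++)
        wide += 1.1f;                         // each addend is the float 1.10000002384...

    printf("wide sum = %.10Lf\n", wide);      // about 1100.0000238419
    printf("printed  = %f\n", (double)wide);  // 1100.000024, the CPU result above
    // Note: rounding the wide sum back to float would give exactly 1100.0,
    // so seeing 1100.000024 means extra precision leaked into the printout.
    return 0;
}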

The CPU would give the same result as the GPU if it did the arithmetic in true 32-bit single precision, for example with the ADDSS instruction from SSE2.

Which code the compiler generates depends on options such as /fp: (MSVC) or -mfpmath (gcc); with the x87 FADD instruction the intermediate sum is kept in an 80-bit register.
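For illustration, assuming the CPU reference loop lives in a file called sum.c, something like the following contrasts the two code paths with gcc (exact defaults vary by compiler version and target):

# 32-bit x86 builds have historically defaulted to x87 (80-bit) arithmetic:
gcc -m32 -O2 sum.c -o sum_x87

# Forcing single-precision SSE arithmetic instead should give the GPU-style result:
gcc -m32 -O2 -mfpmath=sse -msse2 sum.c -o sum_sse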

Trust the GPU. Its FPU performs strict IEEE-754 single-precision arithmetic and has no 80-bit extended format like the x87.


Source: https://habr.com/ru/post/1674354/

