I am working on code that sends large amounts of data from the host to the device, and it behaves erratically.
In the following code fragment, I send an array from the host to the device. The size of the array grows with each iteration, so the amount of memory sent to the device gradually increases. The first element of the array is set to a nonzero value, which is then read inside the kernel and printed to the console. The value should be the same whether it is read on the host or on the device, but in some iterations it is not.
Here is the code:
int SizeArray = 0; for(int j=1; j<100 ;j++){
The device on which this code was tested has the following properties:
- Name: Intel(R) HD Graphics 4000
- DeviceVersion: OpenCL 1.1
- DriverVersion: 8.15.10.2696
- MaxMemoryAllocationSize: 425721856
- GlobalMemoryCacheSize: 2097152
- GlobalMemorySize: 1702887424
- MaxConstantBufferSize: 65536
- LocalMemorySize: 655
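Since the behaviour depends on the buffer size, one quick sanity check is whether each buffer stays within the per-allocation limit in the device listing above. The sketch below is only an illustration: the helper names are hypothetical, the element type is assumed to be float (the original code is not shown in full), and the limit is hard-coded from the listing, whereas real code would query CL_DEVICE_MAX_MEM_ALLOC_SIZE with clGetDeviceInfo.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit copied from the device listing (MaxMemoryAllocationSize, in bytes).
   Assumption: hard-coded here for illustration only. */
#define MAX_MEM_ALLOC_SIZE 425721856ULL

/* Hypothetical helper: bytes needed for `count` floats, or 0 if the
   multiplication would overflow a 64-bit value. */
static uint64_t float_buffer_bytes(uint64_t count)
{
    if (count > UINT64_MAX / sizeof(float)) {
        return 0;
    }
    return count * sizeof(float);
}

/* Hypothetical helper: returns 1 if a buffer of `count` floats is non-empty
   and fits within a single device allocation, 0 otherwise. */
static int fits_single_allocation(uint64_t count)
{
    uint64_t bytes = float_buffer_bytes(count);
    return bytes != 0 && bytes <= MAX_MEM_ALLOC_SIZE;
}
```

A size that passes this check can still fail for other reasons, so it only rules out one possible cause. Note that several buffers together must also fit in GlobalMemorySize.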
The kernel prints correct or incorrect values depending on the size of the buffer sent to the device.
Here is the output:
Array GPU: 1.000000
Array GPU: 2.000000
Array GPU: 3.000000
Array GPU: 4.000000
Array GPU: 5.000000
Array GPU: 6.000000
Array GPU: 7.000000
Array GPU: 8.000000
Array GPU: 9.000000
Array GPU: 10.000000
Array GPU: 11.000000
Array GPU: 12.000000
Array GPU: 13.000000
Array GPU: 14.000000
Array GPU: 15.000000
Array GPU: 16.000000
Array GPU: 17.000000
Array GPU: 18.000000
Array GPU: 19.000000
Array GPU: 20.000000
Array GPU: 21.000000
Array GPU: 22.000000
Array GPU: 23.000000
Array GPU: 24.000000
Array GPU: 25.000000
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 34.000000
Array GPU: 35.000000
Array GPU: 36.000000
Array GPU: 37.000000
Array GPU: 38.000000
Array GPU: 39.000000
Array GPU: 40.000000
Array GPU: 41.000000
Array GPU: 42.000000
Array GPU: 43.000000
Array GPU: 44.000000
Array GPU: 45.000000
Array GPU: 46.000000
Array GPU: 47.000000
Array GPU: 48.000000
Array GPU: 49.000000
Array GPU: 50.000000
Array GPU: 51.000000
Array GPU: 52.000000
Array GPU: 53.000000
Array GPU: 54.000000
Array GPU: 55.000000
Array GPU: 56.000000
Array GPU: 57.000000
Array GPU: 58.000000
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 0.000000 <-------- INCORRECT VALUE, kernel is receiving corrupted memory
Array GPU: 68.000000
Array GPU: 69.000000
...
As you can see, the device receives incorrect values with no apparent pattern, and clEnqueueWriteBuffer does not return an error code.
To summarize: a block of memory is sent to the kernel, but depending on the total size of the block, the kernel receives zeroed memory.
The same code behaves differently on different computers (the incorrect values appear in different iterations).
How can I avoid this memory corruption? Am I missing something?
Thanks in advance.
Here's the full working code:
Edit: After some tests, I need to clarify that the problem is not printf. The problem seems to be in transferring the data to the device before the kernel executes.
Here is the code without the kernel execution. The results are still inconsistent.
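To isolate the transfer from the kernel, one option is a round trip: write the host array with a blocking clEnqueueWriteBuffer, read it back into a second host array with a blocking clEnqueueReadBuffer, and compare the two. The comparison step could look like the sketch below. The helper name is hypothetical and the element type is assumed to be float. The OpenCL calls themselves are not shown, so this is only the host-side check.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element where the data
   read back from the device differs from what the host wrote, or `n` if the
   two arrays match everywhere. Exact comparison is intended here, because a
   transfer should be a bit-for-bit copy (NaN values would need memcmp). */
static size_t first_mismatch(const float *written, const float *read_back, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (written[i] != read_back[i]) {
            return i;
        }
    }
    return n;
}
```

If the function returns n for every iteration, the transfer itself is intact and the problem lies elsewhere. A smaller return value gives the offset where the corruption starts.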