I think I've hit a really strange edge case, possibly a double-precision problem, and I want to know what is going on.
Inside the OpenCL kernel, I use:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__private int k = 2;
This gives me s = 2.999[...]6. However, if I uncomment the line k=3, I get what is (in my eyes) the correct result, s = 3. Why is this?
As additional information: the same behavior does not occur when I do
s = sqrt(s/(double)(k-1))
Below is the complete, minimal kernel and host code for pyopencl.
Kernel (Minimal.cl):
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void init_z(__global double * buffer)
{
    __private int x = get_global_id(0);
    __private int y = get_global_id(1);
Host:
import pyopencl as cl
import numpy as np

platform = cl.get_platforms()[0]
devs = platform.get_devices()
device1 = devs[1]
h_buffer = np.empty((10,10)).astype(np.float64)
mf = cl.mem_flags
ctx = cl.Context([device1])
Queue1 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
Queue2 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
mf = cl.mem_flags
m_dic = {0:mf.READ_ONLY,1:mf.WRITE_ONLY,2:mf.READ_WRITE}

fi = open('Minimal.cl', 'r')
fstr = "".join(fi.readlines())
prg = cl.Program(ctx, fstr).build()
knl = prg.init_z
knl.set_scalar_arg_dtypes([None,])
knl_root = prg.root
knl_root.set_scalar_arg_dtypes([None,])

def f():
    d_buffer = cl.Buffer(ctx,m_dic[2], int(10 * 10 * 8))
    knl.set_args(d_buffer)
    knl_root.set_args(d_buffer)
    a = cl.enqueue_nd_range_kernel(Queue2,knl,(10,10),None)
    b = cl.enqueue_nd_range_kernel(Queue2,knl_root,(10,10),None, wait_for = [a,])
    cl.enqueue_copy(Queue1,h_buffer,d_buffer,wait_for=[b,])
    return h_buffer

a = f()
a[0,0]
Edit: Because some points are still unclear, I am updating this question again. I understand that the values of pow and sqrt need not be the same for the same input. My question is why pow shows different output for the SAME input, depending on where I get it from.
The binaries are on Pastebin: k_explicit and k_read
printf("%a\n", out) results in 0x1.8p+1 with the line k=3 and 0x1.7ffffffffffffp+1 when it is commented out.
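To make the size of the discrepancy concrete, the two hexadecimal outputs can be decoded on the host. This is only a small sketch in plain Python, independent of the kernel: the helper `bits` and the variable names `with_k3` and `without_k3` are my own, introduced for illustration. It parses both values and counts how many representable doubles lie between them.

```python
import struct

def bits(x: float) -> int:
    """Reinterpret a double as its 64-bit integer bit pattern."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

# The two values printed by the kernel with "%a" (names are hypothetical).
with_k3 = float.fromhex("0x1.8p+1")
without_k3 = float.fromhex("0x1.7ffffffffffffp+1")

print(with_k3)      # decimal form of the first value
print(without_k3)   # decimal form of the second value

# For positive finite doubles the bit patterns are ordered like the values,
# so the difference of the patterns is the distance in ULPs.
print(bits(with_k3) - bits(without_k3))
```

The two results are adjacent doubles: the second one is the largest double below 3.0, so the two code paths differ by exactly one unit in the last place.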