It depends on the hardware and software architecture of the OpenCL platform you are using. For example, you could imagine an architecture with read-only caches that do not have to take part in cache coherency. Such caches could serve read-only memory but not writable global memory, so access to read-only memory could be faster there.
That said, none of the architectures I am familiar with work this way, so this is purely hypothetical.
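For reference, here is a minimal sketch of how the two cases are expressed in OpenCL C. It only shows the address-space qualifiers. Whether the `__constant` version is actually any faster depends entirely on the platform. The kernel names `scale_constant` and `scale_global` and their parameters are made up for illustration, and the `#ifndef __OPENCL_VERSION__` branch is just a convenience I am assuming so the same file also builds as plain C on the host.

```c
/* Sketch only: the same computation reading its coefficients through two
   different address spaces. All names here are hypothetical. */
#ifndef __OPENCL_VERSION__
/* Host-side fallback (an assumption for testing, not part of OpenCL):
   the qualifiers are defined away and get_global_id is a stub. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
static size_t stub_gid = 0; /* stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return stub_gid; }
#endif

/* Coefficients in __constant memory: read-only for the whole kernel, so an
   implementation is free to serve them from a cache that needs no coherency. */
__kernel void scale_constant(__constant float *coeffs,
                             __global const float *in,
                             __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = coeffs[0] * in[i] + coeffs[1];
}

/* Same computation with the coefficients in ordinary __global memory. */
__kernel void scale_global(__global const float *coeffs,
                           __global const float *in,
                           __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = coeffs[0] * in[i] + coeffs[1];
}
```

Both kernels compute the same result. The only way to know whether one is faster on a given device is to time them there.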