I just tested this myself using gcc 4.6.1-1 on Debian (after adding typedef void *LPVOID;). There is no difference: both the IA32 and AMD64 builds run instantly, even without any optimization.
I increased the length of the array to 1048576 to get a measurable runtime (0.161 s), which was the same for both IA32 and AMD64. Turning on optimization (-O3) barely changed the time, which dropped only to 0.157 s. -Os (optimization for size) gave the same result.
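For anyone who wants to repeat the measurement, here is a minimal sketch of the kind of timing harness I mean. It is not the code from the question: the loop body and the name time_fill are my own stand-ins, and I am assuming the original simply stores one pointer per array slot.

```c
#include <stdlib.h>
#include <time.h>

typedef void *LPVOID;   /* normally comes from <windows.h>; declared by hand here */

/*
 * Hypothetical stand-in for the loop in the question: allocate `count`
 * pointer slots, store one pointer in each, and return the CPU time the
 * loop took in seconds. Returns -1.0 if the allocation fails.
 */
double time_fill(size_t count)
{
    LPVOID *slots = malloc(count * sizeof *slots);
    clock_t start, stop;
    size_t i;

    if (slots == NULL)
        return -1.0;

    start = clock();
    for (i = 0; i < count; i++)
        slots[i] = &slots[i];   /* one pointer-sized store per slot */
    stop = clock();

    free(slots);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call time_fill(1048576) from main and print the result, then build the same file once with gcc -m32 and once with gcc -m64 (adding -O3 or -Os as needed) and compare the numbers. clock() measures CPU time at fairly coarse resolution, so very short runs will just report 0.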
Is it possible that you used different build options for the two targets? For example, some memory-access debugging feature might be enabled only in the AMD64 build.
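One quick way to check is to have each binary report how it was built. The following is only a sketch: the function names are made up for illustration, and it relies on predefined macros (__x86_64__, __i386__ and __OPTIMIZE__ on GCC; _M_X64, _M_IX86 and _DEBUG on MSVC), so it will not detect every debugging option a toolchain can enable.

```c
#include <stdio.h>

/* Label the target architecture using compiler-predefined macros. */
const char *build_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "AMD64";
#elif defined(__i386__) || defined(_M_IX86)
    return "IA32";
#else
    return "other";
#endif
}

/* Return 1 if the compiler indicates a debug or unoptimized build, else 0. */
int build_is_unoptimized(void)
{
#if defined(_DEBUG)                                  /* MSVC debug runtime selected */
    return 1;
#elif defined(__GNUC__) && !defined(__OPTIMIZE__)    /* GCC without any -O level */
    return 1;
#else
    return 0;
#endif
}

/* Print one line describing the build, for side-by-side comparison. */
void print_build_info(void)
{
    printf("arch=%s unoptimized=%d pointer=%u bytes\n",
           build_arch(), build_is_unoptimized(), (unsigned)sizeof(void *));
}
```

If the two builds print different values for anything other than the architecture and pointer size, the build settings differ and the timings are not directly comparable.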