Running the following C code (a series of mmaps and munmaps on a 2GB file) on a Mac OS X machine appears to be much slower than on a Linux server.
#define BUFSZ 2000000000
static u_char buf[BUFSZ];
....
/* Block sizes tested: 4kB, 64kB, 1MB */
for (msize = 4096; msize <= 1048576; msize *= 16) {
    fd = open("io_benchmark.dat", O_RDONLY);
    if (fd < 0) die("can't open io_benchmark.dat for reading");
    for (i = 0; i < 10000; i++) {
        /* Random offset that leaves room for the largest block */
        offset = (size_t) random() % (BUFSZ - 1048576);
        mblock = femmap(fd, (off_t) offset, (size_t) msize, PROT_READ,
                        "test block");
        /* Touch every byte of the mapped block */
        total = 0;
        for (j = 0; j < msize; j++) {
            total += mblock[j];
        }
        femunmap(mblock, (size_t) msize, "test block");
    }
    printf("Elapsed time to mmap and munmap 10000 blocks of %d kB: %.4f sec\n",
           msize/1024, (time = time_since_last_call()));
    rslt = close(fd);
    /* Check the result of close(), not the descriptor */
    if (rslt < 0) die("can't close io_benchmark.dat after reading");
}
In particular, comparing these two machines:

          Linux                                       Mac OS X
CPU       Xeon E3113 dual core @ 3.00GHz              Core 2 Duo dual core @ 2.4GHz
RAM       8GB                                         4GB
Kernel    2.6.18-92.el5PAE SMP i686                   Mac OS X 10.6.4 Snow Leopard
Disk      WD 250GB SATA, 16MB cache, 7200 RPM, EXT3   Hitachi 250GB SATA, 5400 RPM, journaled HFS+
It gives the following results (elapsed time in seconds):

                             Linux     Mac OS X
Time for 10000 4kB mmaps     0.0165    682.87
Time for 10000 64kB mmaps    0.0170    657.79
Time for 10000 1MB mmaps     0.0217    633.38
Even allowing for the smaller amount of memory, this seems unusual, since the file is only half the size of the Mac's physical memory. Can anyone point to a code change or a configuration change that might improve performance?
We have tried using reads instead of mmaps, and the difference is significant, but switching would require a substantial change to the existing code base (and mmap is much faster than read on Linux).
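For reference, here is a minimal sketch of the two access patterns being compared, assuming plain POSIX mmap/munmap and pread. The helper names sum_block_mmap and sum_block_pread are hypothetical; they are not the femmap/femunmap wrappers used in the benchmark, whose internals are not shown here. The sketch also assumes the caller keeps offset + len within the file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum `len` bytes starting at `offset` through a
 * temporary read-only mapping. Returns 0 on success, -1 on failure. */
int sum_block_mmap(int fd, off_t offset, size_t len, uint64_t *sum)
{
    /* mmap needs a page-aligned file offset, so map from the start of the
     * containing page and skip the leading slack bytes. */
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    off_t base = offset - (offset % (off_t)page);
    size_t slack = (size_t)(offset - base);

    unsigned char *p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, base);
    if (p == MAP_FAILED) return -1;

    uint64_t total = 0;
    for (size_t j = 0; j < len; j++)
        total += p[slack + j];      /* touching each byte faults the pages in */
    *sum = total;

    return munmap(p, len + slack);
}

/* Hypothetical helper: the same sum, but copying the block into a heap
 * buffer with pread. Returns 0 on success, -1 on failure. */
int sum_block_pread(int fd, off_t offset, size_t len, uint64_t *sum)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL) return -1;

    size_t done = 0;
    while (done < len) {            /* pread may return fewer bytes than asked */
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);
        if (n <= 0) {               /* error or unexpected end of file */
            free(buf);
            return -1;
        }
        done += (size_t)n;
    }

    uint64_t total = 0;
    for (size_t j = 0; j < len; j++)
        total += buf[j];
    free(buf);
    *sum = total;
    return 0;
}
```

Calling both helpers with the same fd, offset, and len inside the timing loop, and printing or otherwise using the resulting sums, would let the two approaches be timed side by side on each machine.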