Mmap big endian vs little endian

If I use mmap to write uint32_t values, will I run into problems with big-endian / little-endian conventions? In particular, if I write some mmap'ed data on a big-endian machine, will I run into problems when I try to read that data on a little-endian machine?


If you use mmap, you are probably worried about speed and efficiency. You basically have several options.

  • Wrap all your reads and writes with the htonl, htons, ntohl, and ntohs functions. On a little-endian host such as x86, calling htonl (host to network) converts the data from little-endian to big-endian; on big-endian architectures it is a no-op. These conversions have some overhead, which may or may not be significant depending on your operations. AFAIK, this is the approach SQLite uses.
  • Another option is to always write data in the host's native format and provide utilities for users who need to move data between platforms. Databases typically read and write data in host format but provide tools like bcp that can export it as either ASCII or network byte order.
  • You can put a byte-order mark in your file's header. When your program opens the file, it compares the mark with the host's byte order and, if they differ, translates the data. This is often useful for simple data formats such as UTF-16, but not for formats that contain several different variable-length types.
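A minimal sketch of the first option, assuming `base` points into a region returned by `mmap()` with `PROT_READ | PROT_WRITE`; the helper names `put_u32_be` and `get_u32_be` are hypothetical. Values are always stored in network (big-endian) order, so the bytes in the file are the same no matter which machine wrote them:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>     /* memcpy, size_t */

/* Store a host-order uint32_t at byte offset `off` of a mapped region,
 * in network (big-endian) byte order. */
static void put_u32_be(unsigned char *base, size_t off, uint32_t host_value)
{
    assert(base != NULL);
    uint32_t net = htonl(host_value);      /* no-op on big-endian hosts */
    memcpy(base + off, &net, sizeof net);  /* safe even if base + off is unaligned */
}

/* Read a big-endian uint32_t at byte offset `off` and return it in host order. */
static uint32_t get_u32_be(const unsigned char *base, size_t off)
{
    assert(base != NULL);
    uint32_t net;
    memcpy(&net, base + off, sizeof net);
    return ntohl(net);                     /* no-op on big-endian hosts */
}
```

memcpy is used instead of casting `base + off` to `uint32_t *` because an arbitrary offset may not be suitably aligned; compilers typically turn a 4-byte memcpy into a single load or store.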
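The byte-order-mark option could look roughly like this. The marker value 0x01020304 and the names `write_mark`, `check_mark`, and `load_u32` are assumptions for illustration: the writer stores the marker in its own native order, and the reader decides from it whether every field it loads needs swapping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker; any value that differs from its byte-swapped form works. */
#define BYTE_ORDER_MARK 0x01020304u

enum file_order { ORDER_NATIVE, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Reverse the four bytes of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* On file creation: store the mark in the writer's native byte order. */
static void write_mark(unsigned char *header)
{
    assert(header != NULL);
    uint32_t mark = BYTE_ORDER_MARK;
    memcpy(header, &mark, sizeof mark);
}

/* On file open: compare the stored mark with the host's byte order. */
static enum file_order check_mark(const unsigned char *header)
{
    assert(header != NULL);
    uint32_t mark;
    memcpy(&mark, header, sizeof mark);
    if (mark == BYTE_ORDER_MARK)
        return ORDER_NATIVE;
    if (mark == bswap32(BYTE_ORDER_MARK))
        return ORDER_SWAPPED;
    return ORDER_UNKNOWN;  /* not a file in this format, or corrupted */
}

/* Every 32-bit field read from the file goes through this helper. */
static uint32_t load_u32(const unsigned char *p, enum file_order order)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return order == ORDER_SWAPPED ? bswap32(v) : v;
}
```

When the mark says the file is native, reads cost nothing extra; the swap is only paid when a file actually crosses to a machine with the other byte order.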

Also, if you do things like storing length or offset prefixes in the file, you may end up with a mix of 32-bit and 64-bit pointers. A 32-bit platform cannot create an mmap view larger than 4 GB, so you are unlikely to support files larger than 4 GB there. Programs such as rrdtool take this approach and support much larger files on 64-bit platforms. This means your binary files will not be compatible across all platforms if you use the platform's pointer size inside the file.
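One common way to avoid the pointer-size problem is to give every on-disk field a fixed width. The header below is a hypothetical example (it is not rrdtool's actual format): offsets and lengths are `uint64_t` rather than `size_t`, `long`, or pointers, so the layout is the same in 32-bit and 64-bit builds. Byte order still has to be handled separately, with one of the options above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical on-disk header. Fields are ordered largest-first so that
 * common ABIs insert no padding between them. */
struct file_header {
    uint64_t data_offset;  /* byte offset of the payload from the start of the file */
    uint64_t data_length;  /* payload length in bytes; may exceed 4 GB */
    uint32_t magic;        /* hypothetical format identifier */
    uint32_t version;      /* hypothetical format version */
};

/* Fail the build if a compiler lays the header out differently than expected. */
static_assert(sizeof(struct file_header) == 24, "unexpected header size");
static_assert(offsetof(struct file_header, magic) == 16, "unexpected field offset");
```

A 32-bit build can still read such a header; it just has to map a large file in windows smaller than 4 GB instead of all at once.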

My recommendation is to ignore byte-ordering problems for now and build a system that runs fast on your platform. If and when you need to move your data to another platform, choose the easiest, fastest, or most suitable method. If you start by designing a platform-independent data format, you will usually get some of it wrong and have to go back and fix those mistakes later. This is especially problematic when 99% of the data is in the correct byte order and 1% is wrong, because correcting the errors in the data-transfer code will then break existing clients on all platforms.

Before writing code that supports more than one platform, you will need a multi-platform test setup.


Yes.

mmap maps raw file data into the process's address space. It knows nothing about what that data means, let alone tries to convert it for you. If you map the same file on an architecture with a different byte order, you will have to do any necessary conversion yourself.

For a data format that is portable across different computers, I would consider something at a higher level of abstraction, such as JSON or even XML, which does not tie the data format to a specific implementation. But it really depends on your specific requirements.


Source: https://habr.com/ru/post/1204264/

