Questions about EXIF in hexadecimal form

I am trying to understand the Exif header portion of a JPEG file (viewed as hex) so that I can extract data from it, specifically GPS information. For better or worse, I am using VB.Net 2008 (sorry, it is what I can grasp right now). I have extracted the first 64K of the JPEG into a byte array and have a vague idea of how the data is laid out. Using the Exif specification, versions 2.2 and 2.3, I see that there are tags that are supposed to correspond to actual byte sequences in the file. I see that there is a "GPS IFD" tag with the value 8825 (hex). I searched for the hex string 8825 in the file (which I understand to be the two bytes 88 and 25), and I believe a sequence of bytes following 8825 belongs to it. I suspect those following bytes indicate, as an offset, where in the file the GPS data is located. For example, I have the following hex bytes starting at 88 25: 88 25 00 04 00 00 00 01 00 00 05 9A 00 00 07 14. Is the sequence longer than these 16 bytes? I get the impression that this run of data should tell me where to find the actual GPS data in the file.

Looking at http://search.cpan.org/~bettelli/Image-MetaData-JPEG-0.153/lib/Image/MetaData/JPEG/Structures.pod#Exif_and_DCT , about halfway down, it says that "Each IFD block is a structured sequence of records, called, in the Exif jargon, Interoperability arrays. The beginning of the 0th IFD is given by the 'IFD0_Pointer' value. The structure of an IFD is the following:"

So what is an IFD0_Pointer? Does it have to do with an offset? I assume an offset is so many bytes from a starting point. If so, where is that starting point?

Thanks for any answers.

Dale

2 answers

I suggest you read the Exif Specification (PDF); it is understandable and fairly easy to follow. For a quick guide, here is a summary of the article I wrote:


A JPEG / Exif file starts with the Start of Image marker (SOI). The SOI consists of two magic bytes, 0xFF 0xD8, identifying the file as a JPEG file. Following the SOI there are a number of Application Marker sections (APP0, APP1, APP2, APP3, ...), typically containing metadata.

Application Marker Sections

Each APPn section begins with a marker. For the APP0 section the marker is 0xFF 0xE0, for the APP1 section it is 0xFF 0xE1, and so on. The marker bytes are followed by two bytes giving the size of the section (excluding the marker, including the size bytes). The size field is followed by the variable-size application data. APPn sections are sequential, so you can skip entire sections (using the section size) until you reach the one you are interested in. The contents of APPn sections vary; the following covers the Exif APP1 section only.
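As a rough illustration of skipping from section to section, here is a minimal Python sketch. The function name `find_segments` is made up for this example, and it assumes every marker before the compressed image data carries a size field, which holds for the APPn sections described above but not for every marker JPEG defines.

```python
import struct

def find_segments(data: bytes):
    """Yield (marker, payload) for each marker section that follows the SOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not at a marker; stop rather than guess
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # start of scan or end of image: no more metadata sections
        # Section sizes are always big-endian and include the 2 size bytes.
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + size]
        pos += 2 + size

# Tiny synthetic file: SOI, an APP0 section, then an APP1 section.
sample = b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xe1\x00\x05xyz"
segments = list(find_segments(sample))
```

For a real file you would read the bytes from disk and pick out the sections whose marker is 0xE1.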

Exif APP1 Section

Exif metadata is stored in an APP1 section (there can be more than one APP1 section). The application data in the Exif APP1 section consists of the Exif identifier 0x45 0x78 0x69 0x66 0x00 0x00 ("Exif\0\0"), the TIFF header, and a number of Image File Directory (IFD) sections.
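Since a file may hold several APP1 sections, a reader has to check the identifier before treating the payload as Exif. A small sketch of that check follows; `tiff_block` is a hypothetical helper name, and the payload is assumed to be the section data that follows the two size bytes.

```python
EXIF_ID = b"Exif\x00\x00"

def tiff_block(app1_payload: bytes):
    """Return the TIFF data of an Exif APP1 payload, or None if it is not Exif."""
    if not app1_payload.startswith(EXIF_ID):
        return None
    # Everything after the 6-byte identifier starts with the TIFF header.
    return app1_payload[len(EXIF_ID):]

exif_payload = b"Exif\x00\x00MM\x00\x2a\x00\x00\x00\x08"
other_payload = b"http://ns.adobe.com/xap/1.0/\x00<x/>"
```

All offsets mentioned later are relative to the first byte of the block this function returns, not to the start of the file.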

TIFF Header

The TIFF header contains information about the byte order of the IFD sections and a pointer to the 0th IFD. The first two bytes are 0x49 0x49 (II, for Intel) if the byte order is little-endian, or 0x4D 0x4D (MM, for Motorola) if it is big-endian. The following two bytes are the magic number 42, stored in that byte order (0x00 0x2A for big-endian, 0x2A 0x00 for little-endian). The final four bytes give the offset to the 0th IFD from the start of the TIFF header.

Important: The JPEG file itself (everything you have read up to this point) is always in big-endian format. However, the byte order in the IFD sections may be different and may need to be converted (you know the byte order from the TIFF header above).
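The header layout above can be sketched in a few lines of Python. `parse_tiff_header` is an illustrative name, and returning the byte order as a `struct` prefix character is just one convenient choice, not something the format prescribes.

```python
import struct

def parse_tiff_header(tiff: bytes):
    """Return (endian, ifd0_offset); endian is '<' or '>' for use with struct."""
    order = tiff[:2]
    if order == b"II":
        endian = "<"  # little-endian (Intel)
    elif order == b"MM":
        endian = ">"  # big-endian (Motorola)
    else:
        raise ValueError("bad byte-order mark")
    # The magic number and the offset are stored in the declared byte order.
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return endian, ifd0_offset

big = parse_tiff_header(b"MM\x00\x2a\x00\x00\x00\x08")
little = parse_tiff_header(b"II\x2a\x00\x08\x00\x00\x00")
```

In both sample headers the 0th IFD starts 8 bytes in, that is, immediately after the header.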

Image File Directories

Once you get this far, you have a pointer to the 0th IFD section and you are ready to read the actual metadata. The remaining IFDs are referenced in different places. The offsets to the Exif IFD and the GPS IFD are given in fields of the 0th IFD. The offset to the 1st IFD is given after the fields of the 0th IFD. The offset to the Interoperability IFD is given in the Exif IFD.

IFDs are simply sequential records of metadata fields. The field count is given in the first two bytes of the IFD. Following the field count are the 12-byte fields. After the fields, there is a 4-byte offset from the start of the TIFF header to the start of the 1st IFD. This value is meaningful only for the 0th IFD. Following this is the IFD data section.
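Reading one IFD according to this layout might look like the following sketch. The name `read_ifd` and the tuple shape are assumptions made for the example, and the 4-byte value slot of each field is left undecoded here.

```python
import struct

def read_ifd(tiff: bytes, offset: int, endian: str):
    """Return (fields, next_ifd_offset) for the IFD at `offset`.

    Each field is a (tag, type, count, raw_value) tuple, where raw_value is
    the still-undecoded 4-byte value/offset slot.
    """
    (n,) = struct.unpack_from(endian + "H", tiff, offset)
    fields = []
    pos = offset + 2
    for _ in range(n):
        tag, typ, count = struct.unpack_from(endian + "HHI", tiff, pos)
        fields.append((tag, typ, count, tiff[pos + 8:pos + 12]))
        pos += 12
    # The 4 bytes after the last field point to the next IFD (0 if none).
    (next_ifd,) = struct.unpack_from(endian + "I", tiff, pos)
    return fields, next_ifd

# Synthetic big-endian TIFF block: header, one field, no further IFD.
tiff = (b"MM\x00\x2a\x00\x00\x00\x08"
        + b"\x00\x01"
        + bytes.fromhex("8825 0004 00000001 0000059A")
        + b"\x00\x00\x00\x00")
fields, next_ifd = read_ifd(tiff, 8, ">")
```

To reach the GPS IFD you would look for tag 0x8825 among the fields of the 0th IFD and call the same function again at the offset stored in that field.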

IFD Fields

Fields are 12-byte subsections of IFD sections. The first two bytes of each field give the tag ID as defined in the Exif standard. The next two bytes give the data type of the field: 1 for byte, 2 for ascii, 3 for short (uint16), 4 for long (uint32), etc. Check the Exif Specification for the complete list.

The following four bytes can be a bit confusing: they give the count. For byte arrays (the ascii and undefined types), this is the length of the array in bytes. For example, for the ascii string "Exif", the count is 5, including the null terminator. For other types, it is the number of components in the field (e.g. 4 shorts, 3 rationals).

Following the count is the 4-byte field value. However, if the field data is longer than 4 bytes, it is stored in the IFD data section instead, and this value is then the offset from the start of the TIFF header to the start of the field data. For example, for a single long (uint32, 4 bytes) this slot holds the field value itself. For a rational (2 x uint32, 8 bytes) it holds an offset to the 8-byte field data.
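To tie this back to the bytes in the question, here is a hedged sketch that decodes a single 12-byte field. It assumes the file is big-endian (MM), since the tag appears as 88 25 in that order; `decode_field` and `TYPE_SIZE` are names invented for this example.

```python
import struct

# Size in bytes of one component for the types Exif uses.
TYPE_SIZE = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8}

def decode_field(entry: bytes, endian: str):
    """Decode one 12-byte field into (tag, type, count, inline, value).

    If inline is True, value holds the raw data bytes; otherwise it is the
    offset (from the start of the TIFF header) where the data is stored.
    """
    tag, typ, count = struct.unpack(endian + "HHI", entry[:8])
    size = TYPE_SIZE[typ] * count
    if size <= 4:
        return tag, typ, count, True, entry[8:8 + size]
    (offset,) = struct.unpack(endian + "I", entry[8:12])
    return tag, typ, count, False, offset

# First 12 of the 16 bytes quoted in the question.
entry = bytes.fromhex("88 25 00 04 00 00 00 01 00 00 05 9A")
tag, typ, count, inline, value = decode_field(entry, ">")
gps_ifd_offset = struct.unpack(">I", value)[0]
print(hex(tag), typ, count, hex(gps_ifd_offset))  # 0x8825 4 1 0x59a
```

So the field is exactly 12 bytes long: tag 0x8825, type 4 (long), count 1, value 0x059A. Under the big-endian assumption, the GPS IFD starts 0x59A bytes after the start of the TIFF header. The remaining 00 00 07 14 is not part of this field; it is most likely the offset to the 1st IFD that follows the last field of the 0th IFD, though that depends on 8825 being the last field in this particular file.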


This is basically how metadata is laid out in a JPEG / Exif file. There are a few caveats to keep in mind (convert the byte order where necessary, measure offsets from the start of the TIFF header, jump to the data sections to read long fields, ...), but the format is pretty easy to read. Below is a color-coded hex view of a JPEG / Exif file. The blue block is the SOI, the orange is the TIFF header, the green is the IFD size and offset bytes, the light purple blocks are IFD fields, and the dark purple ones are field data.

HEX View of a JPEG / Exif File


Here is a PHP script I wrote to change Exif headers. It overwrites the embedded thumbnail with a filler of the same length.

<?php
// Default image; may be overridden by the "filename" or "file" request parameter.
$filename = "torby.jpg";
if (isset($_REQUEST['filename'])) {
    $filename = $_REQUEST['filename'];
}
if (array_key_exists('file', $_REQUEST)) {
    $filename = $_REQUEST['file'];
}

// Read the whole file and extract the embedded Exif thumbnail from the same file.
$full_image_string = file_get_contents($filename);
$thumb_image = exif_thumbnail($filename, $width, $height, $type);

if ($thumb_image !== false) {
    echo $thumb_image;
    $thumblen = strlen($thumb_image);
    // Number of times the thumbnail bytes occur in the file.
    echo substr_count($full_image_string, $thumb_image);
    // Overwrite the thumbnail with a same-length filler so no offsets shift.
    $filler = str_pad("%%%THUMB%%%", $thumblen);
    $full_image_string = str_replace($thumb_image, $filler, $full_image_string);
    file_put_contents($filename, $full_image_string);
    exit;
} else {
    // No thumbnail available; handle the error here.
    echo 'No thumbnail available';
}
?>

Source: https://habr.com/ru/post/898071/

