Understanding binary xls file

Question

I am trying to read the contents of an xls file without using any xls libraries, but I am running into problems.

I am trying to use the information I found here. It has short step-by-step instructions on how to read a file. I am also using the xls file specification.

I'm not sure whether I carried out this step correctly:

3. Open the Workbook stream and scan for the first instance of the BOF record. This is the start of the Globals substream.

According to the file specification and this page with a list of record numbers, I have to look for 2057 (0809h), but the file does not contain this value anywhere (I also searched for it with a hex editor). But then I read this part on page 20 of the spec:

Byte swapping. Microsoft Excel BIFF files are transferable across MS-DOS/Windows and Apple Macintosh operating systems, among others. To support portability, Excel writes BIFF files in a format in which the low-order byte of a word appears first in the file, followed by the high-order byte.

If I understand that correctly (I'm not sure I do), words are stored low byte first, so the byte sequence I should be looking for is actually 09 08, which reads as 2312 (0908h) if taken high byte first. This seems right, as it occurs very early in every file I have tried.
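The byte-order point can be checked with a minimal Python sketch. It assumes nothing beyond the two bytes discussed above and uses the standard `struct` module; the variable names are illustrative only.

```python
import struct

# The two bytes as they appear in the file (and in a hex editor).
raw = bytes([0x09, 0x08])

# Low byte first (little-endian), which is how the spec says words are written.
(little,) = struct.unpack("<H", raw)

# High byte first (big-endian), which is how the bytes "read" left to right.
(big,) = struct.unpack(">H", raw)

print(little, hex(little))  # 2057 0x809
print(big, hex(big))        # 2312 0x908
```

So the record identifier is still 2057 (0809h); it only looks like 0908h because a hex editor shows the low byte first.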

So, on to the next step:

4. Read the Globals substream, loading the BoundSheet8 and SST records into memory. See the Globals section for more information.

I look for 133 (0085h, which appears as 85 00 in the file) and find it shortly after the BOF, all right. But the problem is the following two steps:

5. From the BoundSheet8 record that corresponds to the substream you want to open, read the first 4 bytes, which contain the lbPlyPos FilePointer.

6. Go to the offset in the stream specified by the lbPlyPos FilePointer. This is the BOF record for the worksheet.

So, the next 4 bytes are a pointer to the position in the file that I should go to. But reading these bytes in any order gives me a number that is larger than the whole file. And this part also confuses me: "This is the BOF record for the worksheet." Isn't that what I found at an earlier stage? Hm...

Sorry for my incoherent rambling. I hope I am making sense and that someone is willing to help me a little.

Update: OK, I got a little further. This is quite confusing to me, but it seems that each record is also read "big endian", i.e. the last variable in the record is the one positioned first in the file. I don't know whether this applies to variable-length values, though. Looking at this, variable-length values are listed last in the record. But obviously they cannot come first in the file, because there would be no way to know how many bytes to read if that information came after them. In any case, if I ignore this value, skip 2 bytes for dt and A / unused, and read the next 4 bytes as a uint, I get 1130 in my case. Adding this to the position of the first BOF gives me the exact position of the sheet's BOF. And that can't be a coincidence, right?

Now the next problem arises. After this BOF record, the INDEX record should follow immediately. But no matter how I read the bytes, it still doesn't make sense... Here's what it looks like:

09 08 10 00 00 06 10 00 BB 0D CC 07 00 00 00 00 06 00 00 00 00 02 0E 00 00 00 00 00 1E 00 00 00 00 00 12 00 00 00 3E 02 12 00 B6 06 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 7D 00 0C 00 00 00 00 00 00 DD 06 0F 00 00 00 00 00 00 7D 00 0C 00 02 00 02 00 DD 06 0F 00 00 00 00 00 7D 00 0C 00 04 00 04 etc.

The first 2 bytes are the record identifier, 09 08, i.e. 0809h, which has a value of 2057 (BOF), so the rest should be INDEX, but it does not make sense... I would really appreciate it if someone could help me with this.

excel binaryfiles

Clox Mar 17 '12 at 8:32

2 answers

len · Answer 1 · 2012-03-19T00:48:24+0000

As for the BOF record, I can tell you that BOF stands for Beginning of File, and that a BOF record sits at the start of every substream the Excel file contains. Given that you usually have 3 sheets, each sheet has a VBA code sheet, and the workbook has a code sheet of its own, you are looking at 8 BOF records.

Gavin smith · Answer 2 · 2013-09-18T20:48:43+0000

A BOF record is more than its first two bytes. The next two bytes, "10 00", give the length of the rest of the record (0x0010, or 16 bytes). However, after counting forward 16 bytes, there is no INDEX record. (According to the list of record identifiers, the INDEX record identifier should be 523, that is 0x020B, which would appear as "0B 02".)
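To make the "identifier, then length, then data" layout concrete, here is a minimal Python sketch that walks the first 60 bytes of the dump quoted in the question. The helper name `iter_records` is made up for this illustration; it assumes every record starts with a 2-byte identifier and a 2-byte length, both stored low byte first.

```python
import struct

# First 60 bytes of the dump from the question, split per record for readability.
DUMP = bytes.fromhex(
    "09 08 10 00  00 06 10 00 BB 0D CC 07 00 00 00 00 06 00 00 00"
    "00 02 0E 00  00 00 00 00 1E 00 00 00 00 00 12 00 00 00"
    "3E 02 12 00  B6 06 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00"
)

def iter_records(data):
    """Yield (offset, record_id, payload) for each complete record in data."""
    pos = 0
    while pos + 4 <= len(data):
        # 4-byte header: identifier and payload length, both little-endian.
        rec_id, length = struct.unpack_from("<HH", data, pos)
        payload = data[pos + 4 : pos + 4 + length]
        if len(payload) < length:
            break  # truncated record, stop here
        yield pos, rec_id, payload
        pos += 4 + length

records = list(iter_records(DUMP))
for offset, rec_id, payload in records:
    print(f"offset {offset:3d}  id 0x{rec_id:04X}  length {len(payload)}")
```

Read this way, the dump starts with 0x0809 (BOF, 16 bytes), followed by 0x0200 and 0x023E, which the BIFF8 record lists give as DIMENSIONS and WINDOW2. No 0x020B (INDEX) appears in this part of the dump.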

You must be looking at the wrong BOF. Either you have not found the lbPlyPos pointer, or you are following it incorrectly.

So, the next 4 bytes are a pointer to the position in the file that I should go to. But reading these bytes in any order gives me a number that is larger than the whole file.

Make sure you skip the two bytes that give the size of the record.
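A minimal Python sketch of that advice follows. The record bytes are hypothetical, built by hand for illustration (a sheet named "Sheet1" with lbPlyPos set to 1130, the value reported in the question's update), and are not taken from the asker's file. It assumes the BoundSheet8 data begins with the 4-byte lbPlyPos followed by one byte each for the visibility state and the sheet type.

```python
import struct

# Hypothetical BoundSheet8 record (assumption, built by hand):
#   85 00        record identifier 0x0085
#   0E 00        length of the data that follows (14 bytes)
#   6A 04 00 00  lbPlyPos = 1130
#   00 00        visibility state, sheet type
#   06 00        name length, name flags
#   "Sheet1"     sheet name
record = bytes.fromhex("8500 0E00 6A040000 00 00 06 00") + b"Sheet1"

# Header first: identifier and length, both little-endian.
rec_id, length = struct.unpack_from("<HH", record, 0)

# Correct: skip the 4-byte header, then read lbPlyPos and the two flag bytes.
lb_ply_pos, hs_state, dt = struct.unpack_from("<IBB", record, 4)

# Mistake: reading 4 bytes right after the identifier swallows the length field.
(wrong,) = struct.unpack_from("<I", record, 2)

print(hex(rec_id), length)  # 0x85 14
print(lb_ply_pos)           # 1130
print(wrong)                # 74055694
```

The misread value is far larger than any plausible file size, which matches the symptom described in the question.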
