I am trying to read the contents of an xls file without using any xls libraries, but I'm running into problems.
I am following the information I found here, which has short step-by-step instructions on how to read a file. I am also using the xls-file-specification.
I'm not sure whether I carried out this step correctly:
3. Open the Workbook stream and scan for the first instance of the BOF record. This is the beginning of the Globals substream.
According to the file specification, or this page with a list of record numbers, I have to look for 2057 (0809h), but the file does not contain this value anywhere (I also searched for it with a hex editor). But then I read this part on page 20 of the spec:
Byte swapping. Excel BIFF files are exchanged between MS-DOS/Windows and the Apple Macintosh, among others. To support portability, Excel writes BIFF files so that the low-order byte of each word appears first in the file, followed by the high-order byte.
If I understand this correctly (I'm not sure I do), the bytes of each word are swapped in the file, so what I should actually search for is the byte sequence 09 08 (0908h, or 2312, if read in file order). That seems right, because it occurs very early in every file I try.
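To make the byte order concrete, here is a minimal Python sketch that decodes the two bytes 09 08 both ways. It uses only the standard `struct` module; the variable names are illustrative and not taken from the specification.

```python
import struct

# The two bytes exactly as they appear in the file.
raw = bytes([0x09, 0x08])

# Low-order byte first (the order described in the quoted spec passage).
(little,) = struct.unpack("<H", raw)
# High-order byte first, shown only for comparison.
(big,) = struct.unpack(">H", raw)

print(hex(little), little)  # 0x809 2057
print(hex(big), big)        # 0x908 2312
```

Read low byte first, the sequence 09 08 is 0809h (2057), the BOF record number; 2312 is only what the same bytes look like when read in file order.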
So, on to the next step:
4. Read the Globals substream, loading the BoundSheet8 and SST records into memory. See the Globals section for more information.
So I look for 133 (8500h in file order) and find it shortly after the BOF; fine. But the problem is the next two steps:
5. From the BoundSheet8 record that corresponds to the substream you want to open, read the first 4 bytes, which contain the lbPlyPos FilePointer.
6. Go to the offset in the stream specified by the lbPlyPos FilePointer. This is the BOF record for the worksheet.
So the next 4 bytes are a pointer to the position in the file that I should go to. But reading these bytes in any order gives me a number that is larger than the whole file. This part also confuses me: "This is the BOF record for the worksheet." Isn't that what I found in the earlier step? Hm...
Sorry for rambling. I hope this makes sense and that someone is willing to help me a little.
Update: OK, I got a little further. This is quite confusing to me, but it seems that each record is also read "big-endian" style, i.e. the last field in the record is the one positioned first in the file. I don't know whether this applies to variable-length values, though. Looking at this, variable-length values are listed last in the record. But obviously they can't come first in the file, because there would be no way to know how many bytes to read if that information came after them. In any case, if I ignore that value, skip 2 bytes for dt and A/unused, and read the next 4 bytes as a uint, I get 1130 in my case. Adding this to the position of the first BOF gives me the exact position of the sheet's BOF. That can't be a coincidence, right?
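For comparison, here is a minimal Python sketch of reading a BoundSheet8 record in file order. It assumes each record starts with a 2-byte record number and a 2-byte size, both low byte first, followed by the fields in the order the spec lists them: lbPlyPos (4 bytes), one byte holding hsState, one byte for dt, then the sheet name as a 1-byte character count, a 1-byte flags field, and the characters. The function name `parse_boundsheet8` and the sample bytes are made up for illustration; only the value 1130 comes from the post.

```python
import struct

def parse_boundsheet8(record: bytes):
    """Decode one BoundSheet8 record given as raw bytes (header included)."""
    rec_type, size = struct.unpack_from("<HH", record, 0)
    if rec_type != 0x0085:
        raise ValueError(f"not a BoundSheet8 record: {rec_type:#06x}")
    body = record[4:4 + size]
    # lbPlyPos, hsState byte, dt byte, character count, string flags
    lb_ply_pos, hs_state, dt, cch, flags = struct.unpack_from("<IBBBB", body, 0)
    # Bit 0 of the flags selects 2-byte characters; otherwise 1 byte each.
    wide = bool(flags & 0x01)
    width = 2 if wide else 1
    name = body[8:8 + cch * width].decode("utf-16-le" if wide else "latin-1")
    return lb_ply_pos, hs_state & 0x03, dt, name

# Hypothetical record: type 0085h, size 14, lbPlyPos = 1130 (6A 04 00 00),
# visible worksheet, name "Sheet1" stored as 1-byte characters.
sample = bytes.fromhex("85 00 0E 00 6A 04 00 00 00 00 06 00") + b"Sheet1"

print(parse_boundsheet8(sample))  # (1130, 0, 0, 'Sheet1')
```

Under this layout, the 2 bytes skipped after the record number would be the record's size field rather than dt and A/unused, and lbPlyPos is an offset from the start of the Workbook stream. That would be consistent with adding 1130 to the position of the first BOF, provided the stream is stored contiguously in the file.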
Now the next problem arises. The INDEX record should immediately follow this BOF record. But no matter how I read the bytes, they don't make sense... Here is what it looks like:
09 08 10 00 00 06 10 00 BB 0D CC 07 00 00 00 00 06 00 00 00 00 02 0E 00 00 00 00 00 1E 00 00 00 00 00 12 00 00 00 3E 02 12 00 B6 06 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 7D 00 0C 00 00 00 00 00 00 DD 06 0F 00 00 00 00 00 00 7D 00 0C 00 02 00 02 00 DD 06 0F 00 00 00 00 00 7D 00 0C 00 04 00 04 etc.
The first 2 bytes are the record number, 09 08 or 0809h, which is 2057 (BOF), so the rest should be INDEX, but it doesn't make sense... I would really appreciate it if someone could help me with this.
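One way to check a dump like this is to walk it record by record. The sketch below assumes every record has a 4-byte header (2-byte record number, 2-byte size, both low byte first) followed by `size` bytes of data. It is run on the first 60 bytes of the dump above only. The helper `walk_records` is a hypothetical name, and the record names in the table are filled in from the record number list, not from the post.

```python
import struct

# Record numbers relevant here (names assumed from the record list).
RECORD_NAMES = {0x0809: "BOF", 0x020B: "INDEX", 0x0200: "DIMENSIONS", 0x023E: "WINDOW2"}

def walk_records(data: bytes):
    """Yield (offset, record number, body) for each complete record in data."""
    pos = 0
    while pos + 4 <= len(data):
        rec_type, size = struct.unpack_from("<HH", data, pos)
        body = data[pos + 4:pos + 4 + size]
        if len(body) < size:  # truncated record: stop instead of guessing
            break
        yield pos, rec_type, body
        pos += 4 + size

# First 60 bytes of the dump, copied as-is.
dump = bytes.fromhex(
    "09 08 10 00 00 06 10 00 BB 0D CC 07 00 00 00 00 06 00 00 00 "
    "00 02 0E 00 00 00 00 00 1E 00 00 00 00 00 12 00 00 00 "
    "3E 02 12 00 B6 06 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00"
)

for pos, rec_type, body in walk_records(dump):
    name = RECORD_NAMES.get(rec_type, "?")
    print(f"offset {pos:3d}: {rec_type:#06x} {name:<10} size {len(body)}")
```

Read this way, the bytes after the 16-byte BOF body start with 00 02 0E 00, which decodes as record 0200h with size 14 rather than INDEX (020Bh), assuming the dump was transcribed accurately. The sketch stops after the third record on purpose, since it makes no claim about the rest of the dump.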