Quick fix for 32-bit (2 GB-limited) fseek/ftell on FreeBSD 7

I have an old 32-bit C/C++ program on FreeBSD that is used remotely by hundreds of users, and its author will not fix it. It was written in an unsafe way: all file offsets are stored internally as unsigned 32-bit values and passed to ftell/fseek wherever they are used. On FreeBSD 7 (the software's host platform), this means ftell and fseek use a 32-bit signed long:

  int fseek(FILE *stream, long offset, int whence);
  long ftell(FILE *stream);

I need to fix the program quickly, because after 13 years of collecting data, some internal data files have suddenly reached a size of 2^31 bytes (2 147 483 7yy bytes), and the program's internal fseek/ftell assertions now fail for every request.

In the FreeBSD 7 world, there are the fseeko/ftello functions for 2 GB+ files:

  int fseeko(FILE *stream, off_t offset, int whence);
  off_t ftello(FILE *stream);

The type off_t is not described there; all I know is that it is 8 bytes in size and looks like either long long or unsigned long long (I don't know which).

Is it sufficient (for working with files up to 4 GB) and safe to simply search and replace all ftell with ftello and all fseek with fseeko (sed -i 's/ftell/ftello/g', and likewise for fseek), given usage like:

  unsigned long offset1, offset2; // 32-bit
  offset1 = (compute + it) * in + some - arithmetic;
  fseek(file, 0, SEEK_END);
  fseek(file, 4, SEEK_END); // or other small int constant
  offset2 = ftell(file);
  fseek(file, offset1, SEEK_SET); // No usage of SEEK_CUR

and combinations of such calls.

What is the signedness of off_t? Is it safe to assign a 64-bit off_t to a 32-bit unsigned offset? Will it work for offsets between 2 GB and 4 GB?

Which functions besides ftell/fseek can be used to work with offsets?

1 answer

FreeBSD's fseeko() and ftello() are documented as POSIX.1-2001 compatible, which means off_t is a signed integer type.
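If you would rather confirm this on your own machine than rely on the documentation, a tiny check like the one below can help. It is only a sketch; off_t_is_signed() and off_t_bits() are made-up names for illustration. On FreeBSD 7 (and on typical 64-bit Linux) you would expect a signed, 64-bit off_t.

```c
#include <limits.h>     /* CHAR_BIT */
#include <sys/types.h>  /* off_t */

/* Nonzero if off_t is signed: converting -1 to a signed type keeps it
   negative, while converting it to an unsigned type wraps to the maximum. */
static int off_t_is_signed(void)
{
    return (off_t)-1 < (off_t)0;
}

/* Width of off_t in bits. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```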

In FreeBSD 7, you can safely do:

  off_t actual_offset;
  unsigned long stored_offset;

  if (actual_offset >= (off_t)0 && actual_offset < (off_t)4294967296.0)
      stored_offset = (unsigned long)actual_offset;
  else
      some_fatal_error("Unsupportable file offset!");

(On an LP64 architecture the above would be silly, since off_t and long would both be 64-bit integers. It would still be safe, just pointless, since all possible file offsets could be supported.)
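The same range check can also be written as a helper that reports failure instead of aborting. A minimal sketch, assuming a hypothetical offset_to_u32() name and using uint32_t so the 32-bit storage is explicit regardless of how wide long is on the platform:

```c
#include <stdint.h>     /* uint32_t, UINT32_MAX */
#include <sys/types.h>  /* off_t */

/* Hypothetical helper: narrow an off_t to a 32-bit unsigned value.
   Returns 0 on success, -1 if the offset is negative or >= 4 GiB. */
static int offset_to_u32(off_t actual, uint32_t *stored)
{
    if (actual < (off_t)0 || actual > (off_t)UINT32_MAX)
        return -1;
    *stored = (uint32_t)actual;
    return 0;
}
```

Offsets between 2 GiB and 4 GiB pass through unchanged, because the conversion goes from a wider signed type to an unsigned type that can hold every value in that range.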

What often bites people is that offset calculations must be done using off_t. That is, it is not enough to cast the result to off_t; you must cast the values used in the arithmetic to off_t. (Technically, you only need to make sure that every arithmetic operation is done with off_t precision, but I find it easier to remember the rule if I just punt and cast all the operands.) For example:

  off_t offset;
  unsigned long some, value, used;

  offset = (off_t)some * (off_t)value + (off_t)used;
  fseeko(file, offset, SEEK_SET);
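To see why casting only the result is not enough, compare the two functions below. This is an illustrative sketch: uint32_t stands in for the 32-bit unsigned long of FreeBSD 7 on i386, and record_offset_wrong()/record_offset_right() are hypothetical names.

```c
#include <stdint.h>     /* uint32_t */
#include <sys/types.h>  /* off_t */

/* Wrong: the multiply and add happen in 32 bits and wrap around
   before the result is converted to off_t. */
static off_t record_offset_wrong(uint32_t record, uint32_t size, uint32_t field)
{
    return (off_t)(record * size + field);
}

/* Right: every operand is converted to off_t first, so the whole
   calculation is done in 64 bits. */
static off_t record_offset_right(uint32_t record, uint32_t size, uint32_t field)
{
    return (off_t)record * (off_t)size + (off_t)field;
}
```

With record 5 000 000, size 1 000 and field 8, the correct offset is 5 000 000 008, but the 32-bit version wraps to 705 032 712 and would silently seek to the wrong place.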

Typically, offset calculations are used to seek to a field in a particular record, and the arithmetic tends to stay the same. I really recommend moving the seek operations into a helper function, if possible:

  int fseek_to(FILE *const file, const unsigned long some,
               const unsigned long value, const unsigned long used)
  {
      const off_t offset = (off_t)some * (off_t)value + (off_t)used;
      if (offset < (off_t)0 || offset >= (off_t)4294967296.0)
          fatal_error("Offset exceeds 4GB; I must abort!");
      return fseeko(file, offset, SEEK_SET);
  }

Now, if you are lucky enough to know that all your offsets are aligned (to some integer, say 4), you can buy yourself a couple more years to rewrite the application by using an extension of the above:

  #define BIG_N 4

  /* Largest real offset that still fits the 32-bit encoding:
     2^31 + BIG_N * (2^31 - 1). */
  #define BIG_MAX ((off_t)2147483648UL + (off_t)BIG_N * (off_t)2147483647UL)

  int fseek_to(FILE *const file, const unsigned long some,
               const unsigned long value, const unsigned long used)
  {
      const off_t offset = (off_t)some * (off_t)value + (off_t)used;
      if (offset < (off_t)0)
          fatal_error("Offset is negative; I must abort!");
      if (offset > BIG_MAX)
          fatal_error("Offset is too large; I must abort!");
      if ((offset % BIG_N) && (offset >= (off_t)2147483648.0))
          fatal_error("Offset is not a multiple of BIG_N; I must abort!");
      return fseeko(file, offset, SEEK_SET);
  }

  int fseek_big(FILE *const file, const unsigned long position)
  {
      off_t offset;
      if (position >= 2147483648UL)
          offset = (off_t)2147483648UL + (off_t)BIG_N * (off_t)(position - 2147483648UL);
      else
          offset = (off_t)position;
      return fseeko(file, offset, SEEK_SET);
  }

  unsigned long ftell_big(FILE *const file)
  {
      off_t offset;
      offset = ftello(file);
      if (offset < (off_t)0)
          fatal_error("Offset is negative; I must abort!");
      if (offset < (off_t)2147483648UL)
          return (unsigned long)offset;
      if (offset % BIG_N)
          fatal_error("Offset is not a multiple of BIG_N; I must abort!");
      if (offset > BIG_MAX)
          fatal_error("Offset is too large; I must abort!");
      return (unsigned long)2147483648UL
           + (unsigned long)((offset - (off_t)2147483648UL) / (off_t)BIG_N);
  }

The logic is simple: an offset below 2^31 is stored as-is. An offset of 2^31 or more is stored as the 32-bit value 2^31 + (offset - 2^31) / BIG_N, so a stored value position >= 2^31 maps back to the real offset 2^31 + BIG_N × (position - 2^31). The only requirement is that every offset of 2^31 or higher is a multiple of BIG_N.

Obviously, you must use only the three functions above (plus any fseek_to() variants you need, provided they perform the same checks and differ only in their parameters and offset formula). That way you can support file sizes up to 2147483648 + BIG_N × 2147483647 bytes. For BIG_N == 4, that is 10 GiB less 4 bytes (10 737 418 236 bytes, to be precise).
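The encoding itself can be checked without touching a file. The sketch below pulls the arithmetic of fseek_big() and ftell_big() out into two pure functions, big_decode() and big_encode() (hypothetical names), with BIG_N fixed at 4; big_encode() reports errors through its return value instead of calling fatal_error().

```c
#include <stdint.h>     /* uint32_t */
#include <sys/types.h>  /* off_t */

#define BIG_N 4
#define TWO_31 ((off_t)2147483648LL)

/* Stored 32-bit value -> real file offset (same formula as fseek_big). */
static off_t big_decode(uint32_t position)
{
    if (position >= 2147483648u)
        return TWO_31 + (off_t)BIG_N * (off_t)(position - 2147483648u);
    return (off_t)position;
}

/* Real file offset -> stored 32-bit value (same formula as ftell_big).
   Returns 0 on success, -1 if the offset cannot be represented. */
static int big_encode(off_t offset, uint32_t *position)
{
    const off_t max = TWO_31 + (off_t)BIG_N * (off_t)2147483647;

    if (offset < (off_t)0 || offset > max)
        return -1;                       /* outside the encodable range */
    if (offset < TWO_31) {
        *position = (uint32_t)offset;    /* low range: stored as-is */
        return 0;
    }
    if ((offset - TWO_31) % BIG_N)
        return -1;                       /* high range must be aligned */
    *position = 2147483648u + (uint32_t)((offset - TWO_31) / BIG_N);
    return 0;
}
```

For example, the stored value 4294967295 decodes to 10 737 418 236, the largest reachable offset, and 2147483649 decodes to 2 147 483 652.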

Questions?


Edited to clarify:

Start by replacing fseek(file, position, SEEK_SET) with calls to fseek_pos(file, position),

  static inline void fseek_pos(FILE *const file, const unsigned long position)
  {
      if (fseeko(file, (off_t)position, SEEK_SET))
          fatal_error("Cannot set file position!");
  }

and fseek(file, position, SEEK_END) with calls to fseek_end(file, position) (for symmetry; I assume the position there is usually a small integer constant),

  static inline void fseek_end(FILE *const file, const off_t relative)
  {
      if (fseeko(file, relative, SEEK_END))
          fatal_error("Cannot set file position!");
  }

and finally ftell(file) with calls to ftell_pos(file):

  static inline unsigned long ftell_pos(FILE *const file)
  {
      off_t position;
      position = ftello(file);
      if (position == (off_t)-1)
          fatal_error("Lost file position!");
      if (position < (off_t)0 || position >= (off_t)4294967296.0)
          fatal_error("File position outside the 4GB range!");
      return (unsigned long)position;
  }

Since on your architecture and OS unsigned long is a 32-bit unsigned integer type and off_t is a 64-bit signed integer type, this gives you the full 4 GB range.
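To check that the conversion really covers positions between 2 GB and 4 GB, you can seek to such a position and read it back. A sketch, assuming a hypothetical roundtrip_u32() helper: POSIX lets fseeko() position beyond the end of the file, so no multi-gigabyte file has to be written for the test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose fseeko/ftello on strict compilers */
#endif
#include <stdint.h>     /* uint32_t, UINT32_MAX */
#include <stdio.h>      /* FILE, fseeko, ftello */
#include <sys/types.h>  /* off_t */

/* Seek to a 32-bit unsigned position with fseeko, read it back with
   ftello and narrow it again. Returns 0 if the round trip succeeded,
   -1 on any error or out-of-range result. */
static int roundtrip_u32(FILE *file, uint32_t position, uint32_t *out)
{
    off_t back;

    if (fseeko(file, (off_t)position, SEEK_SET) != 0)
        return -1;
    back = ftello(file);
    if (back < (off_t)0 || back > (off_t)UINT32_MAX)
        return -1;
    *out = (uint32_t)back;
    return 0;
}
```

Used on a scratch file from tmpfile(), a position of 3 000 000 000 should come back unchanged.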

For offset calculations, define one or more functions similar to

  static inline void fseek_to(FILE *const file, const off_t term1,
                              const off_t term2, const off_t term3)
  {
      const off_t position = term1 * term2 + term3;
      if (position < (off_t)0 || position >= (off_t)4294967296.0)
          fatal_error("File position outside the 4GB range!");
      if (fseeko(file, position, SEEK_SET))
          fatal_error("Cannot set file position!");
  }

For each offset calculation algorithm, define one fseek_to variant (each needs its own name, since C has no overloading). Name the parameters so that the arithmetic makes sense. Make the parameters const off_t as above, so you don't need additional casts in the arithmetic. Only the parameters and the const off_t position = ... line, which define the calculation, differ between the variants.
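As an illustration, here is what one such variant might look like for fixed-size records. The name fseek_record() and its parameters are hypothetical, and it returns -1 instead of calling fatal_error() so the sketch is self-contained.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose fseeko on strict compilers */
#endif
#include <stdio.h>      /* FILE, fseeko, SEEK_SET */
#include <sys/types.h>  /* off_t */

/* Hypothetical fseek_to variant: position = record * record_size + field.
   The parameters are off_t, so the arithmetic is 64-bit without casts.
   Returns 0 on success, -1 if the position is outside [0, 4 GiB)
   or fseeko fails. */
static int fseek_record(FILE *const file, const off_t record,
                        const off_t record_size, const off_t field)
{
    const off_t position = record * record_size + field;

    if (position < (off_t)0 || position >= (off_t)4294967296LL)
        return -1;
    return fseeko(file, position, SEEK_SET) ? -1 : 0;
}
```

Calling fseek_record(file, 3, 16, 4) seeks to byte 52, while a request for record 5 000 000 of 1 000 bytes is rejected because it lies beyond 4 GiB.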

Questions?


Source: https://habr.com/ru/post/971477/

