I have two files - call them file0 and file1.
What I would like to get is a fast algorithm for the next task (itβs clear to me how to write a rather slow algorithm that solves it):
Determine the largest file suffix file0, which is the prefix file1, which means memory block B (or, more precisely, the number of bytes of such a memory block) of maximum length so that
- file0 consists of some block of memory A followed by B
- file1 constist of memory block B followed by some memory block C
Note that blocks A, B, and C can also have lengths of zero bytes.
Edit (to answer a dry remark): the obvious rather slow algorithm that I think of (Pseudocode): let the file length be limited to m, n with wlog m <= n.
for each length from m to 0 compare the m last bytes of file0 with the first m bytes of file1 if they are equal return length
This is obviously an O (m * min (m, n)) algorithm. If the files are the same size, this is O (n ^ 2).
The files that I have to process at the moment are between 10 and several megabytes of hundread. But as a last resort, they can also be several gigabytes in size - simple enough to no longer fit into the x86 32-bit address space.
Nubok source share