Here is my solution.
The principle is simple (ŁukaszW.pl gave it), but it is not so easy to code if you want to handle the special cases, which ŁukaszW.pl did not.
The special cases are a ROW_DEL separator split across two of the fragments read (as I4V pointed out) and, more subtly, two adjacent ROW_DEL of which the second is split across two fragments.
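To see why the first special case matters, here is a small Python 3 sketch (the function name `naive_chunked_replace` is hypothetical, not part of the solution) that replaces ROW_DEL chunk by chunk without looking across chunk borders. With 14-byte chunks, a separator that straddles a border survives:

```python
def naive_chunked_replace(data, sep, new, chunk_length):
    """Deliberately naive: replace sep inside each chunk, ignoring chunk borders."""
    pieces = []
    for start in range(0, len(data), chunk_length):
        pieces.append(data[start:start + chunk_length].replace(sep, new))
    return b"".join(pieces)


data = b"identified ROW_DELROW_DELlast year"
print(naive_chunked_replace(data, b"ROW_DEL", b"\n", 14))  # b'identified ROW_DEL\nlast year'
print(data.replace(b"ROW_DEL", b"\n"))                     # b'identified \n\nlast year'
```

The first chunk ends with `ROW` and the second starts with `_DEL`, so neither chunk contains the whole separator.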
Since ROW_DEL is longer than any possible newline ('\r', '\n', '\r\n'), it can be replaced in place by the newline used by the OS. That is why I decided to rewrite the file over itself.
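The in-place argument can be checked on a byte buffer. In this Python 3 sketch (`replace_in_place` is a hypothetical name, and a `bytearray` stands in for the file), a read cursor and a write cursor move over the same buffer; because the replacement is never longer than the separator, the write cursor never passes the read cursor, so no unread byte is overwritten:

```python
def replace_in_place(buf, sep, new):
    """Rewrite the bytearray buf over itself, replacing sep by new.

    Assumes len(new) <= len(sep), as for ROW_DEL versus any newline.
    """
    assert len(new) <= len(sep)
    read = write = 0
    while read < len(buf):
        if buf[read:read + len(sep)] == sep:
            buf[write:write + len(new)] = new   # same-length slice assignment
            read += len(sep)
            write += len(new)
        else:
            buf[write] = buf[read]              # copy one byte down
            read += 1
            write += 1
        assert write <= read                    # the writer never overtakes the reader
    del buf[write:]                             # the equivalent of truncating the file
    return buf


print(replace_in_place(bytearray(b"aROW_DELROW_DELb"), b"ROW_DEL", b"\r\n"))
# bytearray(b'a\r\n\r\nb')
```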
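For this I use 'r+' mode, which does not create a new file.
It is also essential to use binary mode, 'b': in text mode, newline translation (on Windows, for example) makes the characters read differ from the bytes on disk, so the offsets used with tell() and seek() would no longer match.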
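A minimal Python 3 illustration of these two modes, without chunking (the whole file is read at once, so it only suits small files; the temporary paths are assumptions for the demo): 'r+b' opens an existing file for reading and writing at byte offsets, and `truncate()` cuts off the leftover bytes once the shorter content has been written.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"abcROW_DELdefROW_DELghi")

with open(path, "r+b") as f:        # existing file, read + write, bytes and byte offsets
    data = f.read()
    f.seek(0)                       # go back to the start to overwrite in place
    f.write(data.replace(b"ROW_DEL", b"\n"))
    f.truncate()                    # drop the old bytes left after the shorter content

with open(path, "rb") as f:
    result = f.read()
print(result)  # b'abc\ndef\nghi'

try:
    open(os.path.join(tempfile.mkdtemp(), "missing.txt"), "r+b")
except FileNotFoundError:
    print("'r+' does not create a file")
```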
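The principle is to read a piece (in real use its size would be something like 262144 bytes) plus x additional characters, where x = separator length - 1.
Then check whether a separator is present in the last x characters of the piece followed by the x additional characters; if it is, it necessarily straddles the end of the piece, since it is longer than x.
Depending on whether it is present, the piece is shortened to end just before that separator or kept whole; then the ROW_DEL replacement is performed on it and the result is written back in place.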
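One step of this logic can be isolated as a pure function. This is a Python 3 sketch with a hypothetical name, `process_piece`; it mirrors the `if sep in twelve` branch of the code below but is not taken from it:

```python
def process_piece(chunk, lookahead, sep, new):
    """One step of the chunk logic (hypothetical helper).

    chunk is a piece just read, lookahead the next x = len(sep) - 1 bytes
    (fewer at end of file). Returns the bytes to write and how many bytes
    of chunk they account for; reading should resume at that offset.
    """
    x = len(sep) - 1
    tail = chunk[max(len(chunk) - x, 0):]   # last x bytes of the chunk
    pt = (tail + lookahead).find(sep)        # the "twelve" window for ROW_DEL
    if pt != -1:
        # a separator straddles the border: keep the chunk only up to its start
        kept = chunk[:len(chunk) - len(tail) + pt]
    else:
        kept = chunk
    return kept.replace(sep, new), len(kept)


print(process_piece(b"identified ROW", b"_DELRO", b"ROW_DEL", b"\n"))
# (b'identified ', 11)
print(process_piece(b"ROW_DEL and so", b" on...", b"ROW_DEL", b"\n"))
# (b'\n and so', 14)
```

In the first call the separator starts at index 3 of the 12-byte window, so only the first 11 bytes of the chunk are consumed and the next read starts on the R of ROW_DEL.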
Bare code:
```python
# -*- coding: utf-8 -*-
# Python 2; the coding line is required because the text contains '—'
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus ability to spread ROW_DEL"
        'from person to person.')

with open('eessaa.txt','w') as f:
    f.write(text)

with open('eessaa.txt','rb') as f:
    ch = f.read()
print ch.replace('ROW_DEL','ROW_DEL\n')
print '\nlength of the text : %d chars\n' % len(text)

#==========================================

from os.path import getsize
from os import fsync,linesep

def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return

    x = len(sep)-1      # a sep straddling the chunk end starts in its last x chars
    x2 = 2*x
    file_length = getsize(whichfile)
    # two handles on the same file: fR reads ahead, fW writes behind it
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            twelve = chunk[-x:] + fR.read(x)   # end of chunk + look-ahead
            if sep in twelve:
                # sep straddles the border: write the chunk only up to its start
                pt = twelve.find(sep)
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                y = chunk.replace(sep,OSeol)
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            if fR.tell()<file_length:
                # back to the first character not yet written
                fR.seek(-x2+pt,1)
            else:
                # the look-ahead reached the end of the file:
                # write what remains of twelve, then cut the file
                fW.write(twelve[pt:].replace(sep,OSeol))
                fW.truncate()
                break

rewrite('eessaa.txt','ROW_DEL',14)

with open('eessaa.txt','rb') as f:
    ch = f.read()
print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
print '\nlength of the text : %d chars\n' % len(ch)
```
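For Python 3, where files opened in 'rb+' mode return `bytes`, the same idea can be sketched with a single file object and explicit read and write offsets instead of two handles. This is an adaptation under that assumption, not the code above; the name `rewrite_py3` is hypothetical, and the sample text avoids '—' because `bytes` literals must be ASCII:

```python
import os
import tempfile


def rewrite_py3(path, sep, chunk_length, new=os.linesep.encode()):
    """Replace sep by new in the file at path, in place, chunk by chunk.

    Single file object, explicit read/write offsets; requires len(new) <= len(sep).
    """
    if chunk_length < len(sep):
        raise ValueError("chunk_length must be at least len(sep)")
    if len(new) > len(sep):
        raise ValueError("new must not be longer than sep")
    x = len(sep) - 1
    read_pos = write_pos = 0
    with open(path, "r+b") as f:
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_length)
            if not chunk:
                break
            lookahead = f.read(x)
            tail = chunk[max(len(chunk) - x, 0):]
            pt = (tail + lookahead).find(sep)
            # stop before a separator that straddles the chunk border
            kept = chunk[:len(chunk) - len(tail) + pt] if pt != -1 else chunk
            out = kept.replace(sep, new)
            f.seek(write_pos)          # the writer always stays behind the reader
            f.write(out)
            write_pos += len(out)
            read_pos += len(kept)
        f.truncate(write_pos)


text = (b"infected ROW_DELwith novel coronavirus ROW_DEL"
        b"identified ROW_DELROW_DELlast year")
path = os.path.join(tempfile.mkdtemp(), "eessaa.txt")
for n in (7, 8, 14, 50):
    with open(path, "wb") as f:
        f.write(text)
    rewrite_py3(path, b"ROW_DEL", n, b"\n")
    with open(path, "rb") as f:
        assert f.read() == text.replace(b"ROW_DEL", b"\n")
```

Because the offsets are absolute, the end of the file needs no special case: the loop simply stops when a read returns nothing.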
To follow the execution step by step, run this other version, which prints messages along the way:
```python
# -*- coding: utf-8 -*-
# Python 2; the coding line is required because the text contains '—'
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus ability to spread ROW_DEL"
        'from person to person.')

with open('eessaa.txt','w') as f:
    f.write(text)

with open('eessaa.txt','rb') as f:
    ch = f.read()
print ch.replace('ROW_DEL','ROW_DEL\n')
print '\nlength of the text : %d chars\n' % len(text)

#==========================================

from os.path import getsize
from os import fsync,linesep

def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return

    x = len(sep)-1
    x2 = 2*x
    file_length = getsize(whichfile)
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            pch = fR.tell()
            twelve = chunk[-x:] + fR.read(x)
            ptw = fR.tell()
            if sep in twelve:
                pt = twelve.find(sep)
                m = ("\n !! %r is "
                     "at position %d in twelve !!" % (sep,pt))
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                m = ''
                y = chunk.replace(sep,OSeol)
            print ('chunk == %r %d chars\n'
                   ' -> fR now at position %d\n'
                   'twelve == %r %d chars %s\n'
                   ' -> fR now at position %d'
                   % (chunk ,len(chunk), pch,
                      twelve,len(twelve),m, ptw) )
            pos = fW.tell()
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            print (' %r %d long\n'
                   ' has been written from position %d\n'
                   ' => fW now at position %d'
                   % (y,len(y),pos,fW.tell()))
            if fR.tell()<file_length:
                fR.seek(-x2+pt,1)
                print ' -> fR moved %d characters back to position %d'\
                      % (x2-pt,fR.tell())
            else:
                print (" => fR is at position %d == file size\n"
                       ' File has thoroughly been read'
                       % fR.tell())
                # write what remains of twelve before cutting the file
                fW.write(twelve[pt:].replace(sep,OSeol))
                fW.truncate()
                break
            raw_input('\npress Enter to continue')

rewrite('eessaa.txt','ROW_DEL',14)

with open('eessaa.txt','rb') as f:
    ch = f.read()
print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
print '\nlength of the text : %d chars\n' % len(ch)
```
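There is some subtlety in processing the ends of chunks to detect whether a ROW_DEL is split across two pieces, and whether two ROW_DEL are adjacent. That is why it took me a long time to post my solution: I finally had to write fR.seek(-x2+pt,1), and not simply fR.seek(-2*x,1) or fR.seek(-x,1) depending on whether sep straddles the border or not (2*x is x2 in the code; with ROW_DEL, x and x2 are 6 and 12, hence the name twelve). After twelve has been read, fR is x characters past the end of the chunk, so moving back by x2-pt brings it to the first character that has not been written: the start of sep when it straddles the border, or the end of the chunk when it does not (pt is then set to x). Anyone interested in this issue can explore it by modifying the code in the two branches, where 'ROW_DEL' is or is not in twelve.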
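The seek arithmetic can be checked with plain numbers. In this Python 3 sketch, `resume_position` is a hypothetical helper, and the value 14 for the chunk end is just an example:

```python
def resume_position(chunk_end, x, pt):
    """fR position after fR.seek(-x2 + pt, 1), with x2 = 2*x.

    After reading the chunk and the look-ahead, fR sits at chunk_end + x.
    """
    return chunk_end + x + (-2 * x + pt)


def first_unwritten(chunk_end, x, pt):
    """First byte not covered by y: y is chunk[0:-x+pt], or the whole chunk when pt == x."""
    return chunk_end - x + pt


x = len("ROW_DEL") - 1   # 6, so x2 = 12
chunk_end = 14
for pt in (x, 3):        # no straddling sep (pt = x), then a sep starting at twelve[3]
    print(pt, resume_position(chunk_end, x, pt), first_unwritten(chunk_end, x, pt))
    assert resume_position(chunk_end, x, pt) == first_unwritten(chunk_end, x, pt)
```

With these numbers, a fixed fR.seek(-2*x,1) would always land on byte 8 and re-process bytes already written, while fR.seek(-x,1) would land on byte 14 and skip bytes 11 to 13 when the separator straddles the border.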