The following code creates the function Which_Line_for_Position(pos), which returns the number of the line that contains the character located at position pos in the file.
This function can be called with any position as argument, independently of the current position of the file pointer and of the history of that pointer's movements before the call.
So, with this function, one is not limited to determining the number of the current line during an uninterrupted iteration over the lines, as is the case with Greg Huguill's solution.

    with open(filepath,'rb') as f:
        GIVE_NO_FOR_END = {}   # end offset of a line -> number of that line
        end = 0
        for i,line in enumerate(f):
            end += len(line)
            GIVE_NO_FOR_END[end] = i
        # after the loop: a final newline opens one more (empty) line
        if line[-1]=='\n':
            GIVE_NO_FOR_END[end+1] = i+1
        end_positions = GIVE_NO_FOR_END.keys()
        end_positions.sort()

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        # the first end offset greater than pos belongs to the line holding pos
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None

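The same lookup can also be written for Python 3. The sketch below is only an illustration under assumptions of my own: it keeps the end offsets in a sorted list and uses the standard bisect module in place of the linear scan over the keys. The names build_line_ends and line_for_position are hypothetical, and the lines are assumed to be bytes, as read from a file opened in 'rb' mode.

```python
import bisect

def build_line_ends(lines):
    """Return the sorted list of cumulative end offsets, one per line.

    The index of an offset in the list is the 0-based number of its line.
    """
    ends = []
    end = 0
    last = b""
    for line in lines:
        end += len(line)
        ends.append(end)
        last = line
    # A final newline opens one more (empty) line, as in the dictionary version.
    if last.endswith(b"\n"):
        ends.append(end + 1)
    return ends

def line_for_position(pos, ends):
    """Return the 0-based number of the line containing offset pos, or None."""
    if pos < 0 or not ends or pos >= ends[-1]:
        return None
    # bisect_right gives the index of the first end offset strictly greater than pos.
    return bisect.bisect_right(ends, pos)

data = b"first\nsecond line\nthird"
ends = build_line_ends(data.splitlines(True))
print(ends)                                                       # [6, 18, 23]
print([line_for_position(p, ends) for p in (0, 5, 6, 17, 18, 23)])  # [0, 0, 1, 1, 2, None]
```

With a sorted list, each lookup is a binary search, which matters only when many positions are queried on a file with many lines.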
The same solution can be written using the fileinput module:

    import fileinput

    GIVE_NO_FOR_END = {}
    end = 0
    # mode must be passed by keyword: the second positional parameter is inplace
    for line in fileinput.input(filepath, mode='rb'):
        end += len(line)
        GIVE_NO_FOR_END[end] = fileinput.filelineno()
    if line[-1]=='\n':
        GIVE_NO_FOR_END[end+1] = fileinput.filelineno()+1
    fileinput.close()

    end_positions = GIVE_NO_FOR_END.keys()
    end_positions.sort()

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None

But this solution has some disadvantages:
- it needs the fileinput module to be imported
- the mode has to be passed by keyword. Written as fileinput.input(filepath,'rb'), the call emptied the file: the second positional parameter of fileinput.input() is inplace, not mode, so 'rb' was taken as a true value for inplace. In that mode standard output is redirected into the input file, which ends up holding only what the loop prints (here, nothing)
- fileinput.filelineno() counts lines from 1, whereas enumerate() counts from 0, so the numbers returned are shifted by one compared with the first version
- it seems that the file is first read entirely before any iteration begins. If so, a very large file could exceed the available RAM. I am not sure about this: I tried to test with a 1.5 GB file, but it took very long and I have set this point aside for now. If it is correct, it is an argument for using the other solution, with enumerate()
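For reference, here is a minimal Python 3 sketch of reading a file through fileinput in binary mode while leaving it untouched. It assumes, from the documented signature, that the second positional parameter of fileinput.input() is inplace, so mode is passed by keyword. The function name line_ends_with_fileinput and the temporary file are hypothetical, introduced only for the demonstration.

```python
import fileinput
import os
import tempfile

def line_ends_with_fileinput(path):
    """Map each cumulative end offset to the 1-based line number given by fileinput."""
    ends = {}
    end = 0
    # mode is given by keyword; a second positional argument would be read as inplace.
    with fileinput.input(path, mode="rb") as stream:
        for line in stream:
            end += len(line)
            ends[end] = stream.filelineno()
    return ends

# Demonstration on a throwaway temporary file.
content = b"alpha\nbeta\ngamma"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(content)
try:
    ends = line_ends_with_fileinput(path)
    with open(path, "rb") as check:
        unchanged = check.read() == content
finally:
    os.remove(path)

print(ends)       # {6: 1, 11: 2, 16: 3}
print(unchanged)  # True
```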
Example:

    text = '''Harold Acton (1904-1994)
    Gilbert Adair (born 1944)
    Helen Adam (1909-1993)
    Arthur Henry Adams (1872-1936)
    Robert Adamson (1852-1902)
    Fleur Adcock (born 1934)
    Joseph Addison (1672-1719)
    Mark Akenside (1721-1770)
    James Alexander Allan (1889-1956)
    Leslie Holdsworthy Allen (1879-1964)
    William Allingham (1824/28-1889)
    Kingsley Amis (1922-1995)
    Ethel Anderson (1883-1958)
    Bruce Andrews (born 1948)
    Maya Angelou (born 1928)
    Rae Armantrout (born 1947)
    Simon Armitage (born 1963)
    Matthew Arnold (1822-1888)
    John Ashbery (born 1927)
    Thomas Ashe (1836-1889)
    Thea Astley (1925-2004)
    Edwin Atherstone (1788-1872)'''

    #with open('alao.txt','rb') as f:

    f = text.splitlines(True)
    # argument True in splitlines() makes the newlines kept

    GIVE_NO_FOR_END = {}
    end = 0
    for i,line in enumerate(f):
        end += len(line)
        GIVE_NO_FOR_END[end] = i
    if line[-1]=='\n':
        GIVE_NO_FOR_END[end+1] = i+1

    end_positions = GIVE_NO_FOR_END.keys()
    end_positions.sort()

    print '\n'.join('line %-3s ending at position %s' % (str(GIVE_NO_FOR_END[end]),str(end))
                    for end in end_positions)

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None

    print
    for x in (2,450,320,104,105,599,600):
        print 'pos=%-6s line %s' % (x,Which_Line_for_Position(x))

result

    line 0   ending at position 25
    line 1   ending at position 51
    line 2   ending at position 74
    line 3   ending at position 105
    line 4   ending at position 132
    line 5   ending at position 157
    line 6   ending at position 184
    line 7   ending at position 210
    line 8   ending at position 244
    line 9   ending at position 281
    line 10  ending at position 314
    line 11  ending at position 340
    line 12  ending at position 367
    line 13  ending at position 393
    line 14  ending at position 418
    line 15  ending at position 445
    line 16  ending at position 472
    line 17  ending at position 499
    line 18  ending at position 524
    line 19  ending at position 548
    line 20  ending at position 572
    line 21  ending at position 600

    pos=2      line 0
    pos=450    line 16
    pos=320    line 11
    pos=104    line 3
    pos=105    line 4
    pos=599    line 21
    pos=600    line None

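Then, with the function Which_Line_for_Position() available, it is easy to get the number of the current line: just pass f.tell() as argument to the function.
But WARNING: when using f.tell() and moving the file pointer around in a file, it is absolutely necessary that the file be opened in binary mode: 'rb', 'rb+', 'ab' and so on. The end offsets are sums of len(line) over lines read in binary mode; in text mode, newline translation (for example '\r\n' on Windows) can make them disagree with the real positions in the file.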
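As an illustration of the f.tell() usage, here is a small Python 3 sketch on a temporary file opened in binary mode. It is written under my own assumptions and is not the code above: it stores the end offsets in a sorted list and looks them up with bisect, it does not add the extra entry for a final newline, and the names index_line_ends and current_line are hypothetical.

```python
import bisect
import os
import tempfile

def index_line_ends(path):
    """Return the sorted cumulative end offsets of the lines of a file read as bytes."""
    ends, end = [], 0
    with open(path, "rb") as f:
        for line in f:
            end += len(line)
            ends.append(end)
    return ends

def current_line(f, ends):
    """Return the 0-based number of the line the file pointer is in, or None at the end."""
    pos = f.tell()
    if not ends or pos >= ends[-1]:
        return None
    return bisect.bisect_right(ends, pos)

# Demonstration on a throwaway temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"one\ntwo\nthree\n")
try:
    ends = index_line_ends(path)            # [4, 8, 14]
    seen = []
    with open(path, "rb") as f:
        f.seek(5)                           # somewhere inside the second line
        seen.append(current_line(f, ends))
        f.readline()                        # pointer now at the start of the third line
        seen.append(current_line(f, ends))
        f.seek(0)                           # back to the beginning
        seen.append(current_line(f, ends))
finally:
    os.remove(path)

print(seen)  # [1, 2, 0]
```

The pointer is moved with seek() and readline() in arbitrary order, and the line number still follows from the position alone.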