Is there an easy way to find out which line number the file pointer is on?

In Python 2.5, I read a structured text data file (~ 30 MB in size) using a file pointer:

    fp = open('myfile.txt', 'r')
    line = fp.readline()
    # ... many other fp.readline() processing steps, which
    # are used in different contexts to read the structures

But then, while parsing the file, I hit something interesting and want to report its line number so I can examine the file in a text editor. I can use fp.tell() to get the current byte offset (for example, 16548974L), but there is no "fp.tell_line_number()" to translate this into a line number.

Is there a built-in function or a Python extension for easily tracking and "telling" which line number a text file pointer is on?

Note: I am not asking for a line_number += 1 style counter. I call fp.readline() in many different contexts, and inserting the counter in all the right places would take more debugging than it is worth.

8 answers

A typical solution to this problem is to define a new class that wraps an existing file instance and automatically counts line numbers. Something like this (off the top of my head, untested):

    class FileLineWrapper(object):
        def __init__(self, f):
            self.f = f
            self.line = 0

        def close(self):
            return self.f.close()

        def readline(self):
            line = self.f.readline()
            if line:  # '' means EOF; don't count it as a line
                self.line += 1
            return line

        # to allow using in 'with' statements
        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            self.close()

Use it as follows:

    f = FileLineWrapper(open("myfile.txt", "r"))
    f.readline()
    print(f.line)

It looks like the standard fileinput module does much the same thing (and a few other things too); you can use it instead if you like.
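A variant of the wrapper idea, sketched for Python 3 (the question targets 2.5, so treat this as an untested assumption there): it also counts lines consumed by a for loop and forwards every other attribute (tell, seek, name, ...) to the wrapped file through __getattr__. The class name CountingFile is hypothetical.

```python
import io


class CountingFile(object):
    """Wrap a file object and count the lines read through it.

    Hypothetical helper: lines returned by readline() and by iteration are
    counted; any other attribute is delegated to the wrapped file.
    """

    def __init__(self, f):
        self.f = f
        self.line = 0  # number of lines read so far

    def readline(self):
        line = self.f.readline()
        if line:  # the empty string signals EOF, which is not a line
            self.line += 1
        return line

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    next = __next__  # Python 2 name of the iterator method

    def __getattr__(self, name):
        # Only called for attributes the wrapper itself lacks,
        # e.g. tell(), seek(), closed, name.
        return getattr(self.f, name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.f.close()


# Usage with an in-memory file; a file from open() behaves the same way.
with CountingFile(io.StringIO(u"header\nrow 1\nrow 2\nrow 3\n")) as f:
    f.readline()        # read the header in one context...
    for row in f:       # ...and the remaining rows in another
        if row.startswith("row 2"):
            print("found at line", f.line)  # prints: found at line 3
```

Because every read goes through the wrapper's own readline(), readline() calls and for loops can be mixed freely without the count drifting.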


You may find the fileinput module useful. It provides a generic interface for iterating over an arbitrary number of files. Some relevant points from the docs:

fileinput.lineno()

Return the cumulative line number of the line that has just been read. Before the first line has been read, returns 0. After the last line of the last file has been read, returns the line number of that line.

fileinput.filelineno()

Return the line number in the current file. Before the first line has been read, returns 0. After the last line of the last file has been read, returns the line number of that line within the file.
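A small self-contained sketch (Python 3; the two temporary files and their contents are made up for illustration) showing how lineno() keeps counting across files while filelineno() restarts at 1 in each file:

```python
import fileinput
import os
import shutil
import tempfile

# Create two small throwaway files so the example runs on its own.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in (("a.txt", "a1\na2\n"), ("b.txt", "b1\nb2\nb3\n")):
    path = os.path.join(tmpdir, name)
    with open(path, "w") as out:
        out.write(text)
    paths.append(path)

records = []
# Pass mode by keyword: the second positional parameter is inplace.
for line in fileinput.input(paths, mode="r"):
    records.append((os.path.basename(fileinput.filename()),
                    fileinput.lineno(),      # cumulative over all files
                    fileinput.filelineno(),  # restarts at 1 in each file
                    line.rstrip("\n")))
fileinput.close()
shutil.rmtree(tmpdir)

for rec in records:
    print("%s  lineno=%d  filelineno=%d  %s" % rec)
# a.txt  lineno=1  filelineno=1  a1
# a.txt  lineno=2  filelineno=2  a2
# b.txt  lineno=3  filelineno=1  b1
# b.txt  lineno=4  filelineno=2  b2
# b.txt  lineno=5  filelineno=3  b3
```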


The following code prints the line number (i.e. where the pointer currently is) while iterating through a file ('testfile'):

    with open("testfile", "r") as f:
        # start=1 makes the count match a text editor; enumerate() starts at 0 by default
        for line_no, line in enumerate(f, start=1):
            print(line_no)  # the content of the line is in the variable 'line'

Output:

    1
    2
    3
    ...
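Since the question reads lines from several different places, one possible adaptation (an assumption, not part of the answer above) is to create a single enumerate() iterator up front and call next() on it wherever fp.readline() used to be called; every read then returns the line number together with the line. The functions read_header and find_in_body are hypothetical stand-ins for those different reading contexts.

```python
import io


def read_header(lines):
    # One "context": consume a single header line.
    line_no, line = next(lines)
    return line.strip()


def find_in_body(lines, needle):
    # Another "context": keep reading until something interesting shows up.
    for line_no, line in lines:
        if needle in line:
            return line_no
    return None


f = io.StringIO(u"HEADER\nalpha\nbeta\ngamma\n")  # stands in for open('myfile.txt')
lines = enumerate(f, start=1)  # one shared iterator of (line_no, line) pairs

print(read_header(lines))           # HEADER
print(find_in_body(lines, "beta"))  # 3
```

All reads must go through the shared iterator; mixing it with direct f.readline() calls would make the count drift.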

I don't think so, not in the way you want (i.e. as a standard built-in feature of the Python file objects returned by open()).

If you can't track the line number manually as you read the lines, or use a wrapper class (excellent suggestions by GregH and senderle, by the way), then I think you will just have to use fp.tell() and rescan from the beginning of the file until you reach that position.

That is not so bad, because I assume error conditions will be rarer than the cases where everything runs smoothly. If all goes well, there is no cost at all.

If there is an error, you pay the extra cost of rescanning the file. If the file is large, this may hurt perceived performance, so take that into account if it is a concern.
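A minimal sketch of that rescan, assuming the position came from tell() on a file opened in binary mode (so it is a true byte offset). line_number_at_offset is a hypothetical name; it reads in chunks so a large file is never loaded into memory all at once, and it numbers lines from 1 as a text editor does.

```python
import os
import tempfile


def line_number_at_offset(path, offset, chunk_size=1 << 20):
    """Return the 1-based number of the line containing byte `offset`.

    Hypothetical helper: rescans `path` from the start in binary mode and
    counts the newline bytes that occur before `offset`.
    """
    newlines = 0
    remaining = offset
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # offset lies beyond the end of the file
                break
            newlines += chunk.count(b"\n")
            remaining -= len(chunk)
    return newlines + 1


# Usage: read a couple of lines, then ask which line the pointer is on.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"first\nsecond\nthird\n")

with open(path, "rb") as fp:
    fp.readline()
    fp.readline()
    pos = fp.tell()  # byte offset of the start of "third"
    print(line_number_at_offset(path, pos))  # 3
os.remove(path)
```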


One way is to iterate over the lines and keep an explicit count of the lines seen so far:

    >>> from itertools import izip, count
    >>> f = open('test.java', 'r')
    >>> for line_no, line in izip(count(), f):
    ...     print line_no, line
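itertools.izip no longer exists in Python 3, where the built-in zip() is already lazy. A sketch of the same idea there, with count(1) so numbering starts at 1 and an in-memory file standing in for 'test.java':

```python
import io
from itertools import count

f = io.StringIO(u"first\nsecond\n")  # stands in for open('test.java')
for line_no, line in zip(count(1), f):
    print(line_no, line, end="")  # prints "1 first" then "2 second"
```

This is equivalent to enumerate(f, start=1), which is usually the more idiomatic spelling.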

The following code creates a function Which_Line_for_Position(pos) that gives the line number for the position pos, that is, the number of the line containing the character located at position pos in the file.

This function accepts any position as its argument, regardless of the current position of the file pointer and of how that pointer has moved before the function is called.

So with this function you are not limited to getting the current line number only during a continuous iteration over the lines, as is the case with Greg Hewgill's solution.

    with open(filepath,'rb') as f:
        GIVE_NO_FOR_END = {}
        end = 0
        for i,line in enumerate(f):
            end += len(line)
            GIVE_NO_FOR_END[end] = i
            if line[-1]=='\n':
                GIVE_NO_FOR_END[end+1] = i+1
    end_positions = GIVE_NO_FOR_END.keys()
    end_positions.sort()

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None


The same solution can be written using the fileinput module:

    import fileinput

    GIVE_NO_FOR_END = {}
    end = 0
    # mode must be passed by keyword: the second positional
    # parameter of fileinput.input() is inplace, not mode
    for line in fileinput.input(filepath, mode='rb'):
        end += len(line)
        # note: filelineno() counts from 1, unlike enumerate() above
        GIVE_NO_FOR_END[end] = fileinput.filelineno()
        if line[-1]=='\n':
            GIVE_NO_FOR_END[end+1] = fileinput.filelineno()+1
    fileinput.close()
    end_positions = GIVE_NO_FOR_END.keys()
    end_positions.sort()

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None

But this solution has some disadvantages:

  • it requires importing the fileinput module;
  • when the call is written as fileinput.input(filepath,'rb'), it deletes the entire contents of the file! This is because the second positional parameter of fileinput.input() is inplace, not mode: any true value there turns on in-place editing, which redirects standard output into the file, and since the loop prints nothing, the file is left empty. Passing the mode by keyword, mode='rb', avoids this;
  • it seemed to me that the whole file is read before any iteration begins. If so, a very large file could exceed the available RAM. I am not sure about this: I tried to test it with a 1.5 GB file, but it took quite long and I set this point aside for now. If it is correct, it is an argument for using the other solution, with enumerate(). (In fact, fileinput reads its input incrementally, a line or a small buffer at a time, so this concern should not apply.)


Example:

    text = '''Harold Acton (1904–1994)
    Gilbert Adair (born 1944)
    Helen Adam (1909–1993)
    Arthur Henry Adams (1872–1936)
    Robert Adamson (1852–1902)
    Fleur Adcock (born 1934)
    Joseph Addison (1672–1719)
    Mark Akenside (1721–1770)
    James Alexander Allan (1889–1956)
    Leslie Holdsworthy Allen (1879–1964)
    William Allingham (1824/28-1889)
    Kingsley Amis (1922–1995)
    Ethel Anderson (1883–1958)
    Bruce Andrews (born 1948)
    Maya Angelou (born 1928)
    Rae Armantrout (born 1947)
    Simon Armitage (born 1963)
    Matthew Arnold (1822–1888)
    John Ashbery (born 1927)
    Thomas Ashe (1836–1889)
    Thea Astley (1925–2004)
    Edwin Atherstone (1788–1872)'''

    #with open('alao.txt','rb') as f:
    f = text.splitlines(True)
    # argument True in splitlines() makes the newlines kept
    GIVE_NO_FOR_END = {}
    end = 0
    for i,line in enumerate(f):
        end += len(line)
        GIVE_NO_FOR_END[end] = i
        if line[-1]=='\n':
            GIVE_NO_FOR_END[end+1] = i+1
    end_positions = GIVE_NO_FOR_END.keys()
    end_positions.sort()

    print '\n'.join('line %-3s ending at position %s' % (str(GIVE_NO_FOR_END[end]),str(end))
                    for end in end_positions)

    def Which_Line_for_Position(pos,
                                dic = GIVE_NO_FOR_END,
                                keys = end_positions,
                                kmax = end_positions[-1]):
        return dic[(k for k in keys if pos < k).next()] if pos<kmax else None

    print
    for x in (2,450,320,104,105,599,600):
        print 'pos=%-6s line %s' % (x,Which_Line_for_Position(x))

Result:

    line 0   ending at position 25
    line 1   ending at position 51
    line 2   ending at position 74
    line 3   ending at position 105
    line 4   ending at position 132
    line 5   ending at position 157
    line 6   ending at position 184
    line 7   ending at position 210
    line 8   ending at position 244
    line 9   ending at position 281
    line 10  ending at position 314
    line 11  ending at position 340
    line 12  ending at position 367
    line 13  ending at position 393
    line 14  ending at position 418
    line 15  ending at position 445
    line 16  ending at position 472
    line 17  ending at position 499
    line 18  ending at position 524
    line 19  ending at position 548
    line 20  ending at position 572
    line 21  ending at position 600

    pos=2      line 0
    pos=450    line 16
    pos=320    line 11
    pos=104    line 3
    pos=105    line 4
    pos=599    line 21
    pos=600    line None


Then, with Which_Line_for_Position() available, getting the number of the current line is easy: just pass f.tell() as the argument.

But WARNING: when using f.tell() and moving the file pointer around in a file, it is essential that the file be opened in binary mode: 'rb', 'rb+', 'ab', and so on.
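The same position-to-line lookup can also be sketched with a sorted list of line start offsets and the standard bisect module. This is a swapped-in technique, not the code above: each lookup is a binary search instead of a scan over the sorted keys, lines are numbered from 1 rather than 0, and build_line_starts and line_for_offset are hypothetical names. It assumes Python 3 and opens the file in binary mode so that the offsets are comparable with f.tell().

```python
import bisect
import os
import tempfile


def build_line_starts(path):
    """Return (starts, size): the byte offset where each line begins, and
    the file size. Hypothetical helper; reads the file in binary mode."""
    starts = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            starts.append(offset)
            offset += len(line)
    return starts, offset


def line_for_offset(starts, size, pos):
    """1-based line number containing byte `pos`, or None past the end."""
    if not 0 <= pos < size:
        return None
    # bisect_right counts the line starts <= pos, which is the line number.
    return bisect.bisect_right(starts, pos)


# Usage: build the index once, then translate any tell() value.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Harold Acton\nGilbert Adair\nHelen Adam\n")

starts, size = build_line_starts(path)
with open(path, "rb") as fp:  # binary mode, so tell() is a byte offset
    fp.readline()
    fp.read(3)                # move somewhere into the second line
    print(line_for_offset(starts, size, fp.tell()))  # 2
os.remove(path)
```

Building the index costs one pass over the file; after that each query is O(log n) in the number of lines, which pays off if many positions need translating.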


I recently ran into a similar problem and came up with a class-based solution.

    class TextFileProcessor(object):
        def __init__(self, path_to_file):
            self.print_line_mod_number = 0
            self.__path_to_file = path_to_file
            self.__line_number = 0

        def __printLineNumberMod(self):
            if self.print_line_mod_number != 0:
                if self.__line_number % self.print_line_mod_number == 0:
                    print(self.__line_number)

        def processFile(self):
            with open(self.__path_to_file, 'r', encoding='utf-8') as text_file:
                for self.__line_number, line in enumerate(text_file, start=1):
                    self.__printLineNumberMod()
                    # do some stuff with line here.

Set the print_line_mod_number property to the cadence at which you want line numbers logged, and then call processFile().

For example, if you want feedback every 100 lines, it would look like this:

    tfp = TextFileProcessor('C:\\myfile.txt')
    tfp.print_line_mod_number = 100
    tfp.processFile()

The console output will be:

    100
    200
    300
    400
    etc...
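The same progress-reporting idea can also be sketched without a class, as a Python 3 generator that wraps any iterable of lines, numbers them from 1 with enumerate(), and calls a callback every `every` lines. numbered_with_progress and its parameters are hypothetical names, not part of any library.

```python
import io


def numbered_with_progress(lines, every=0, report=print):
    """Yield (line_number, line) pairs, calling report(n) every `every` lines.

    Hypothetical helper; every=0 disables reporting, mirroring
    print_line_mod_number = 0 in the class above.
    """
    for line_number, line in enumerate(lines, start=1):
        if every and line_number % every == 0:
            report(line_number)
        yield line_number, line


# Usage: an in-memory "file" of 250 lines, with feedback every 100 lines.
text_file = io.StringIO(u"".join(u"row %d\n" % i for i in range(250)))
for line_number, line in numbered_with_progress(text_file, every=100):
    pass  # do some stuff with line here
# prints 100 and 200
```

Because the line number is yielded with each line, the loop body also has it available for error reports, not only for progress output.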

Regarding @eyquem's solution, I suggest passing mode='r' to fileinput.input() and using fileinput.lineno(); that worked for me.

This is how I use them in my code:

    table = fileinput.input('largefile.txt', mode="r")
    if fileinput.lineno() >= stop:
        # you can disregard the IF condition but I am posting
        # to illustrate the approach from my code.
        temp_out.close()
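A small self-contained sketch of that approach in Python 3, with the surrounding loop filled in as an assumption (the snippet above shows only the condition): read with fileinput.input(..., mode='r') and stop as soon as fileinput.lineno() reaches a limit. first_lines is a hypothetical name, and stop here is just its parameter.

```python
import fileinput
import os
import tempfile


def first_lines(path, stop):
    """Return the first `stop` lines of `path` (hypothetical helper).

    fileinput.lineno() is the number of the line just read, so the loop
    can stop as soon as it reaches `stop` without reading the rest.
    """
    if stop <= 0:
        return []
    kept = []
    try:
        for line in fileinput.input(path, mode="r"):  # mode by keyword
            kept.append(line)
            if fileinput.lineno() >= stop:
                break
    finally:
        fileinput.close()  # also releases the file after an early break
    return kept


# Usage with a throwaway file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\nthree\n")
print(first_lines(path, 2))  # ['one\n', 'two\n']
os.remove(path)
```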

Source: https://habr.com/ru/post/890651/

