How to read a file in reverse order in python3.2 without reading the entire file into memory?

I am parsing log files 1 to 10 GB in size with Python 3.2. I need to search for lines matching a certain regular expression (a timestamp), and I want to find the last such event.

I tried using:

for line in reversed(list(open("filename"))):
    ...

This resulted in very poor performance in the good cases and a MemoryError in the bad ones.

The thread "Read the file in reverse with python" did not give me a good answer.

The solution in "python head, tail and backward read by lines of a text file" looks very promising, but it does not work in Python 3.2; it fails with:

NameError: name 'file' is not defined

I then tried replacing File(file) with File(TextIOWrapper), since TextIOWrapper is what the built-in open() returns, but this led to several more errors (I can post them if someone tells me this is the right approach :))
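The NameError comes from the fact that Python 3 no longer has a `file` builtin; open() now returns classes from the io module, and which class depends on the mode. A quick check in CPython 3, using a throwaway temporary file created only for this demonstration:

```python
import builtins
import io
import os
import tempfile

# Python 3 removed the `file` builtin, hence "NameError: name 'file' is not defined"
has_file_builtin = hasattr(builtins, "file")

# Create an empty temporary file so we can see what open() returns
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path) as f:          # text mode
    text_type = type(f)
with open(path, "rb") as f:    # binary mode
    binary_type = type(f)

os.remove(path)

print(has_file_builtin)  # False
print(text_type)         # <class '_io.TextIOWrapper'>
print(binary_type)       # <class '_io.BufferedReader'>
```

So code written against Python 2's `file` class has to be adapted to the io classes (text vs. binary) rather than just renamed.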

2 answers

This is the function that does what you are looking for.

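def reverse_lines(filename, BUFSIZE=4096, encoding="utf-8"):
    # Binary mode: in Python 3, text-mode files only support seeking to
    # offsets returned by tell(), and bytes cannot be mixed with str
    with open(filename, "rb") as f:
        f.seek(0, 2)              # go to the end of the file
        p = f.tell()              # p = start of the current block (initially the file size)
        remainder = b""           # partial line carried over from the later block
        while True:
            sz = min(BUFSIZE, p)
            p -= sz
            f.seek(p)
            buf = f.read(sz) + remainder
            if b"\n" not in buf:
                # No line break yet: the whole block belongs to one long line
                remainder = buf
            else:
                i = buf.index(b"\n")
                # Everything after the first newline consists of complete lines
                for L in buf[i+1:].split(b"\n")[::-1]:
                    yield L.decode(encoding)
                # Text before the first newline may continue in the previous block
                remainder = buf[:i]
            if p == 0:
                break
        yield remainder.decode(encoding)  # the first line of the file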

It works by reading a block (4 KB by default) from the end of the file and yielding all the complete lines in it in reverse order. It then moves back another 4 KB and repeats until it reaches the beginning of the file. The code may need to hold more than 4 KB in memory if a block contains no newline (very long lines). Note that if the file ends with a newline, the first value yielded is an empty string.
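To make the block arithmetic concrete, the seek schedule of the loop above can be pulled out on its own. `block_offsets` is a hypothetical helper written only for illustration; it mirrors the `sz`/`p` updates in `reverse_lines` without touching a file:

```python
def block_offsets(file_size, bufsize=4096):
    """Yield (offset, length) of each block, in the order the reverse reader reads them."""
    p = file_size
    while True:
        sz = min(bufsize, p)   # the final block (at the start of the file) may be shorter
        p -= sz
        yield p, sz
        if p == 0:
            break

schedule = list(block_offsets(10000))
print(schedule)  # [(5904, 4096), (1808, 4096), (0, 1808)]
```

For a 10,000-byte file the reader reads two full 4 KB blocks from the end and then one final 1,808-byte block at the start of the file.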

You can use it like this:

for L in reverse_lines("my_big_file"):
    ...  # process L here
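For the use case in the question (finding the last event whose line matches a timestamp regex), the generator lets you stop at the first match counted from the end, so the earlier part of the file is never read. The sketch below is self-contained: it repeats a compact version of the reverse reader (same block logic, written with `split`/`pop`), and `last_match` is a hypothetical name chosen for illustration. It assumes UTF-8 logs with `\n` line endings.

```python
import re

def reverse_lines(filename, bufsize=4096, encoding="utf-8"):
    """Compact reverse reader: yields decoded lines from the end of the file."""
    with open(filename, "rb") as f:
        p = f.seek(0, 2)                   # seek() returns the new position: the file size
        remainder = b""
        while p > 0:
            sz = min(bufsize, p)
            p -= sz
            f.seek(p)
            buf = f.read(sz) + remainder
            lines = buf.split(b"\n")
            remainder = lines.pop(0)       # may continue in the previous block
            for line in reversed(lines):   # the rest are complete lines
                yield line.decode(encoding)
        yield remainder.decode(encoding)   # the first line of the file

def last_match(filename, pattern):
    """Return the last line of `filename` matching `pattern`, or None."""
    regex = re.compile(pattern)
    for line in reverse_lines(filename):
        if regex.search(line):
            return line   # stop early: earlier blocks are never read
    return None
```

For example, `last_match("my_big_file", r"^\d{4}-\d{2}-\d{2}")` would return the last line starting with an ISO-style date, assuming that is what the timestamps in the log look like.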

You can do this with seek. For example, take this file:

$ cat words.txt
foo
bar
baz
$ ls -l words.txt
-rw-r--r-- 1 oz123 oz123 12 Mar  9 19:38 words.txt

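The file is 12 bytes long. Each line is 4 bytes (three letters plus a newline), so the last line starts at offset 8: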

In [3]: w=open("words.txt")
In [4]: w.seek(8)
In [5]: w.readline()
Out[5]: 'baz\n'

To print the lines in reverse order, seek to each line's starting offset in turn:

w = open('words.txt')

In [6]: for s in [8, 4, 0]:
   ...:     _ = w.seek(s)
   ...:     print(w.readline().strip())
   ...:
baz
bar
foo
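The hard-coded offsets 8, 4, 0 work only because every line in words.txt is exactly 4 bytes. Also, the io documentation says that a text-mode file's seek() should only be given 0 or a value previously returned by tell(); other offsets are undefined behaviour, so the sketch below works in binary mode. `last_line_offset` is a hypothetical helper, a minimal sketch assuming `\n` line endings: it scans backwards in blocks with `bytes.rfind` to find where the last line starts, which is the offset you would otherwise have to know in advance.

```python
import os

def last_line_offset(path, blocksize=4096):
    """Return the byte offset at which the last line of `path` starts."""
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)   # file size
        if end == 0:
            return 0
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1                   # ignore the file's trailing newline
        hi = end                       # only search bytes before `hi`
        while hi > 0:
            lo = max(0, hi - blocksize)
            f.seek(lo)
            block = f.read(hi - lo)
            i = block.rfind(b"\n")
            if i != -1:
                return lo + i + 1      # the last line starts right after this newline
            hi = lo                    # no newline in this block: move further back
        return 0                       # the file is a single line
```

Opening words.txt with "rb", calling `f.seek(last_line_offset("words.txt"))` and then `f.readline()` returns `b'baz\n'`.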



Source: https://habr.com/ru/post/1530861/

