Python reading and writing to files: write() vs writelines()

I have a rather contrived (albeit realistic) example that gave me some unexpected results.

Basically, I want to read a large file and write to another file every line that contains a specified string.

For instance:

String to match:

12348:

File contents:

12348: 12345
zxcv
xcvb
dfgh
tyu
12348: 123456

Lines written to the output file:

12348: 12345
12348: 123456
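
To make the expected behaviour concrete, here is a minimal sketch of the filter applied to the sample above. It uses in-memory `io.StringIO` objects in place of real files, and the helper name `filter_lines` is hypothetical, introduced only for this illustration.

```python
import io


def filter_lines(src, dst, needle):
    """Copy every line of src that contains needle into dst; return the match count."""
    count = 0
    for line in src:
        if needle in line:
            dst.write(line)
            count += 1
    return count


# The sample file contents from the example, held in memory.
sample = "12348: 12345\nzxcv\nxcvb\ndfgh\ntyu\n12348: 123456\n"
src = io.StringIO(sample)
dst = io.StringIO()

matched = filter_lines(src, dst, "12348:")
print(dst.getvalue(), end="")
```

Only the two lines containing `12348:` end up in `dst`; the other four are skipped.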

I implemented this in several ways:

Method 1: write() line by line

from datetime import datetime

def compute_write():
    start = datetime.now()
    with open("read.txt") as fh:
        # Text mode "w": the lines read from fh are str, which "wb" rejects on Python 3.
        with open("write.txt", "w") as fh2:
            for line in fh:
                if "12348:" in line:
                    fh2.write(line)
    end = datetime.now()
    elapsed = end - start
    print("compute_write {0}".format(elapsed))

Method 2: writelines() with a list

def compute_array():
    ret = []  # every matching line is collected in memory first
    start = datetime.now()
    with open("read.txt") as fh:
        for line in fh:
            if "12348:" in line:
                ret.append(line)

    # Text mode "w": ret holds str lines, which "wb" rejects on Python 3.
    with open("write.txt", "w") as fh2:
        fh2.writelines(ret)

    elapsed = datetime.now() - start
    print("compute_array {0}".format(elapsed))

Method 3: writelines() with a generator function

def generator_fn(fh):
    # Lazily yield only the matching lines, one at a time.
    for line in fh:
        if "12348:" in line:
            yield line

def compute_gen():
    start = datetime.now()
    with open("read.txt", "r") as fh:
        # Text mode "w": the generator yields str lines, which "wb" rejects on Python 3.
        with open("write.txt", "w") as fh2:
            fh2.writelines(generator_fn(fh))
    elapsed = datetime.now() - start
    print("compute_gen {0}".format(elapsed))

Now, to make it super realistic, I repeat each calculation 20 times and measure the total time. Results (read.txt is ~700 MB, and about 130 MB gets written to write.txt):

`compute_write()` -> 16.159134s
`compute_array()` -> 12.453268s
`compute_gen()`   -> 15.374717s
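
For anyone who wants to reproduce the comparison, the repeated timing could be sketched roughly as follows. This is not the exact harness used for the numbers above: the function names, the `benchmark` helper and the small generated input file are all assumptions made for illustration, and `timeit.timeit` with `number=20` stands in for the manual loop. Absolute timings will differ by machine and file size.

```python
import os
import tempfile
import timeit
from functools import partial

NEEDLE = "12348:"  # string to match, as in the example


def write_per_line(src_path, dst_path):
    # Variant 1: one write() call per matching line.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if NEEDLE in line:
                dst.write(line)


def writelines_list(src_path, dst_path):
    # Variant 2: collect all matches in a list, then a single writelines() call.
    with open(src_path) as src:
        matches = [line for line in src if NEEDLE in line]
    with open(dst_path, "w") as dst:
        dst.writelines(matches)


def writelines_gen(src_path, dst_path):
    # Variant 3: feed writelines() a lazy generator of matching lines.
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(line for line in src if NEEDLE in line)


def benchmark(src_path, dst_path, repeats=20):
    """Return the total seconds spent over `repeats` runs of each variant."""
    variants = (write_per_line, writelines_list, writelines_gen)
    return {
        fn.__name__: timeit.timeit(partial(fn, src_path, dst_path), number=repeats)
        for fn in variants
    }


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "read.txt")
    dst_path = os.path.join(tmp, "write.txt")
    # Hypothetical small input: the sample contents repeated 1000 times.
    with open(src_path, "w") as fh:
        fh.write("12348: 12345\nzxcv\nxcvb\ndfgh\ntyu\n12348: 123456\n" * 1000)

    results = benchmark(src_path, dst_path)
    for name, seconds in results.items():
        print("{0} {1:.6f}s".format(name, seconds))
```

With an input this small the differences are mostly noise; a file closer to the ~700 MB used above is needed before the gap between the variants becomes visible.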

So compute_array() is about 25% faster than the other two. Why?

Source: https://habr.com/ru/post/1675009/

