I have a contrived (albeit realistic) example that gives me some unexpected results.
Basically, I want to read a large file and write to another file all the lines containing a specified string.
For instance:

String to match:

    12348:

File contents:

    12348: 12345
    zxcv
    xcvb
    dfgh
    tyu
    12348: 123456

Write to the file:

    12348: 12345
    12348: 123456

I implemented this in several ways:
Method 1: write() line by line

    from datetime import datetime

    def compute_write():
        start = datetime.now()
        with open("read.txt") as fh:
            # Text mode ("w"): the lines read above are str, not bytes
            with open("write.txt", "w") as fh2:
                for line in fh:
                    if "12348:" in line:
                        fh2.write(line)
        end = datetime.now()
        elapsed = end - start
        print("compute_write {0}".format(elapsed))
Method 2: writelines() with an array

    from datetime import datetime

    def compute_array():
        ret = []
        start = datetime.now()
        # Collect every matching line in memory first
        with open("read.txt") as fh:
            for line in fh:
                if "12348:" in line:
                    ret.append(line)
        # Text mode ("w"): the collected lines are str, not bytes
        with open("write.txt", "w") as fh2:
            fh2.writelines(ret)
        elapsed = datetime.now() - start
        print("compute_array {0}".format(elapsed))
Method 3: writelines() with a generator function

    from datetime import datetime

    def generator_fn(fh):
        # Lazily yield only the matching lines
        for line in fh:
            if "12348:" in line:
                yield line

    def compute_gen():
        start = datetime.now()
        with open("read.txt", "r") as fh:
            # Text mode ("w"): the yielded lines are str, not bytes
            with open("write.txt", "w") as fh2:
                fh2.writelines(generator_fn(fh))
        elapsed = datetime.now() - start
        print("compute_gen {0}".format(elapsed))
Now, to make the timings more realistic, I repeat each computation 20 times and take the total time. Results (read.txt is ~700 MB; the resulting write.txt is ~130 MB):
`compute_write()` -> 16.159134s
`compute_array()` -> 12.453268s
`compute_gen()` -> 15.374717s
So compute_array() comes out roughly 25% faster than the other two. Why is that?
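For reference, a 20-run total like the ones above could be measured with something along these lines. This is only a sketch: the helper name `time_total` and the stand-in `workload` function are made up for illustration and are not the exact harness behind the numbers above. It also assumes Python 3.3+ for `time.perf_counter()`.

```python
import time

def time_total(fn, repeats=20):
    """Call fn() `repeats` times and return the summed wall-clock seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total

# Hypothetical stand-in workload; pass compute_write, compute_array
# or compute_gen instead to time the three methods.
def workload():
    return sum(range(100000))

print("workload total: {0:.6f}s".format(time_total(workload)))
```

Timing only the call itself keeps the loop overhead out of the total, and running the three functions in a different order on each pass would help rule out OS file-cache effects.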