How to split a file larger than memory?

Let's say I only have 8G of heap space, and I would like to split a file that is larger than that into a few small files. If I try

with open(fname) as f:
    # readlines() materializes every line of the file into one list,
    # so the entire file ends up in memory at once.
    content = f.readlines()

I will run out of memory because it tries to load the whole file at once. Is there a way to open a file without loading the whole object into memory and just take lines X through Y?

+4
source share
2 answers

itertools.islice is the tool for this. For example, islice(f, 10, 20) skips the first 10 lines and then yields the next 10, reading lazily from the file object — the whole file is never held in memory.

Because a file object is its own iterator, each slice picks up where the previous read stopped. So fileobj.writelines(islice(f, 10)) writes the next 10 lines starting from the current position, and calling it repeatedly walks through the file chunk by chunk.

For example, this splits a 100-line file into chunks of 10 lines apiece, without ever buffering more than one chunk — nowhere near your 8 GiB limit.

import itertools
import os


def split_file(src, lines_per_file=10, out_pattern='out-{}.txt'):
    """Split *src* into numbered chunk files of *lines_per_file* lines each.

    Reads lazily via itertools.islice, so only one chunk's worth of lines
    is buffered at a time -- safe for files larger than available memory.

    Parameters:
        src: path of the (possibly huge) input file.
        lines_per_file: maximum number of lines per output chunk.
        out_pattern: format string for chunk filenames; '{}' receives the
            1-based chunk number.

    Returns:
        The list of chunk filenames written, in order.
    """
    written = []
    with open(src) as infp:
        # itertools.count gives an unbounded sequence of chunk numbers;
        # we stop explicitly when the input runs dry.
        for file_count in itertools.count(1):
            out_filename = out_pattern.format(file_count)
            with open(out_filename, 'w') as outfp:
                # islice pulls at most lines_per_file lines from the
                # current position of infp -- no full-file buffering.
                outfp.writelines(itertools.islice(infp, lines_per_file))
            # An empty output file means the input was already exhausted:
            # drop the stray file and stop.
            if os.stat(out_filename).st_size == 0:
                os.remove(out_filename)
                break
            written.append(out_filename)
    return written


if __name__ == '__main__':
    split_file('big.txt')
+1

You don't need to read the whole file at once. With itertools.islice() you can take just the range of lines you want:

from itertools import islice

# (start, stop) of the lines to extract: zero-indexed, stop-exclusive.
line_slice = (10, 20)
with open(fname) as f:
    # islice() is lazy and reads from f only when iterated, so the
    # result must be materialized with list() *before* the with block
    # closes the file -- otherwise consuming it later (e.g. via
    # writelines) raises ValueError: I/O operation on closed file.
    content = list(islice(f, *line_slice))

This gives the same lines as f.readlines()[10:20], without loading the whole file.

Keep in mind that islice() returns a lazy iterator tied to the open file, so writelines() must consume it while the file is still open. If you need the lines after the with block, wrap the slice in list() before the file closes, then call writelines().

# Dump the extracted lines to the output file in a single write.
with open(out_fname, 'w') as out_file:
    out_file.write(''.join(content))
+2

Source: https://habr.com/ru/post/1674485/


All Articles