Merging sorted files in Python

Basically, I have a bunch of files containing domains. I sorted each file by TLD using .sort(key=func_that_returns_tld).

Now that I have done this, I want to merge all the files and end up with one massive sorted file. I assume I need something like this:

open all files
read one line from each file into a list
sort list with .sort(key=func_that_returns_tld)
output that list to file
loop by reading next line

Am I thinking about this the right way? Any advice on how to do this would be appreciated.

+1
3 answers

If your files are not very large, just read them all into memory (as S. Lott suggests). That will definitely be the easiest approach.

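If the combined data is too large to fit in memory, consider heapq.merge. It merges inputs that are already sorted and reads them lazily, so the full contents never have to be held in memory at once.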

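# Note: this code targets Python 2; the `file` builtin and
# contextlib.nested were removed in Python 3.
import heapq
import contextlib

class Domain(object):
    def __init__(self, domain):
        self.domain = domain
    @property
    def tld(self):
        # Put your function for calculating TLD here.
        # This placeholder returns the first label (e.g. 'google'), not the TLD.
        return self.domain.split('.', 1)[0]
    def __lt__(self, other):
        # Strict comparison, so equal keys are not reported as "less than"
        return self.tld < other.tld
    def __str__(self):
        return self.domain

class DomFile(file):
    # Wrap each line in a Domain so heapq.merge compares by the sort key
    def next(self):
        return Domain(file.next(self).strip())

filenames = ('data1.txt', 'data2.txt')
with contextlib.nested(*(DomFile(filename, 'r') for filename in filenames)) as fhs:
    for elt in heapq.merge(*fhs):
        print(elt)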

data1.txt:

google.com
stackoverflow.com
yahoo.com

data2.txt:

standards.freedesktop.org
www.imagemagick.org

Output:

google.com
stackoverflow.com
standards.freedesktop.org
www.imagemagick.org
yahoo.com
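
On Python 3, where the `file` builtin and contextlib.nested no longer exist, the same lazy merge could be sketched roughly as follows. This is only a sketch under assumptions: the `tld` helper and the `merge_sorted_files` name are hypothetical and not part of the answer above, each input file is assumed to be already sorted by that same key, and the `key` argument of heapq.merge requires Python 3.5 or later.

```python
import contextlib
import heapq


def tld(domain):
    # Hypothetical key function: the last dot-separated label,
    # e.g. "com" for "google.com". Swap in your own TLD logic.
    return domain.rsplit(".", 1)[-1]


def merge_sorted_files(filenames, out_path, key=tld):
    """Merge files that are each already sorted by `key` into out_path."""
    with contextlib.ExitStack() as stack:
        # ExitStack closes every opened file when the block exits.
        sources = [stack.enter_context(open(name)) for name in filenames]
        out = stack.enter_context(open(out_path, "w"))
        # Lazily strip newlines and skip blank lines; nothing is read yet.
        streams = [
            (line.strip() for line in src if line.strip())
            for src in sources
        ]
        # heapq.merge pulls one item at a time from each stream.
        for domain in heapq.merge(*streams, key=key):
            out.write(domain + "\n")
```

Calling `merge_sorted_files(['data1.txt', 'data2.txt'], 'merged.txt')` would write the merged result to a hypothetical merged.txt. Because the sketch sorts by the last label rather than the first, its ordering will differ from the output shown above.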
+8

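If the data fits in memory, there is no need to merge line by line. Read everything into a single list and sort it once.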

all_data= []
for f in list_of_files:
    with open(f,'r') as source:
        all_data.extend( source.readlines() )
all_data.sort(key=func_that_returns_tld)  # the key function from the question

After this, all_data holds every line in sorted order.
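
For a self-contained version of the read-everything-and-sort approach, something like the following might work. It is a minimal sketch: the `tld` key and the `sort_all` name are assumptions made for illustration, standing in for whatever key function you already use.

```python
def tld(domain):
    # Hypothetical key function: the last dot-separated label.
    return domain.rsplit(".", 1)[-1]


def sort_all(filenames, key=tld):
    """Read every file into one list and sort it once by `key`."""
    all_data = []
    for name in filenames:
        with open(name) as source:
            # Strip newlines and skip blank lines while loading.
            all_data.extend(line.strip() for line in source if line.strip())
    # list.sort is stable, so lines with equal keys keep their load order.
    all_data.sort(key=key)
    return all_data
```

Since the whole list is sorted from scratch, the input files do not need to be pre-sorted for this to work.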

0

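Another option (for example, if the data does not fit in memory) is to load it into SQLite3 and let the database do the sorting.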
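
One way the SQLite3 idea could look with Python's standard sqlite3 module is sketched below. The table layout, the `tld` helper, and the `sort_with_sqlite` name are all assumptions for illustration; passing a file path instead of ":memory:" as `db_path` would keep the table on disk rather than in RAM.

```python
import sqlite3


def tld(domain):
    # Hypothetical key function: the last dot-separated label.
    return domain.rsplit(".", 1)[-1]


def sort_with_sqlite(filenames, db_path=":memory:"):
    """Load domains into SQLite and yield them ordered by TLD, then domain."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS domains (tld TEXT, domain TEXT)"
        )
        for name in filenames:
            with open(name) as source:
                stripped = (line.strip() for line in source)
                rows = ((tld(d), d) for d in stripped if d)
                # executemany consumes the generator before the file closes.
                conn.executemany("INSERT INTO domains VALUES (?, ?)", rows)
        conn.commit()
        query = "SELECT domain FROM domains ORDER BY tld, domain"
        for (domain,) in conn.execute(query):
            yield domain
    finally:
        conn.close()
```

The function is a generator, so `for domain in sort_with_sqlite(['data1.txt', 'data2.txt']): print(domain)` would stream the sorted rows instead of building a list.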

0

Source: https://habr.com/ru/post/1761408/

