I have 2 files:
hyp.txt
It is a guide to action which ensures that the military always obeys the commands of the party
he read the book because he was interested in world history
ref.txt
It is a guide to action that ensures that the military will forever heed Party commands
he was interested in world history because he read the book
I also have a function that does some calculations to compare the files line by line, for example line 1 of hyp.txt with line 1 of ref.txt:
    def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
        """
        :type list_of_tokenized_hyp: iter(iter(str))
        :type list_of_tokenized_ref: iter(iter(str))
        """
        for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
            ...  # calculations on hypline and refline go here
This function cannot be changed. However, I can control what I pass to it. Currently, I feed the files into the function as follows:
    with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
        # Both list comprehensions read and split every line up front.
        hyp = [line.split() for line in hypfin]
        ref = [line.split() for line in reffin]
        scorer(hyp, ref)
But in doing so, I load both files in full, with every line already split, into memory before passing them to scorer().
Knowing that scorer() processes the files line by line, is there a way to avoid materializing the split lines before passing them to the function, without changing scorer()?
Is there a way to feed in some kind of generator instead?
I tried this:
    with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
        # Generator expressions over the open file objects.
        hyp = (h.split() for h in hypfin)
        ref = (r.split() for r in reffin)
        scorer(hyp, ref)
but I'm not sure whether the results of h.split() get materialized all at once. If they do, why? If not, why not?
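One way I could check this is to wrap the input in a generator that counts how many lines have actually been read. The following is only a minimal experiment, not part of my real code: `counting_lines` and `lines_read` are hypothetical names I made up, and `io.StringIO` stands in for the real hyp.txt.

```python
import io

lines_read = 0  # hypothetical counter, used only for this experiment


def counting_lines(fileobj):
    """Yield lines from fileobj, counting each one as it is actually read."""
    global lines_read
    for line in fileobj:
        lines_read += 1
        yield line


# io.StringIO stands in for open('hyp.txt'); the text is the content of hyp.txt.
hypfin = io.StringIO(
    "It is a guide to action which ensures that the military "
    "always obeys the commands of the party\n"
    "he read the book because he was interested in world history\n"
)

# Creating the generator expression should not run any h.split() yet.
hyp = (h.split() for h in counting_lines(hypfin))
read_before = lines_read

# Asking for one item should read and split exactly one line.
first = next(hyp)
read_after_one = lines_read

# Consuming the rest reads the remaining line.
rest = list(hyp)
read_after_all = lines_read
```

If the counter only increases as items are requested, then each split line should exist only while the consumer is working on it, but I would like to understand why that is the case.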
If I could change the scorer() function, I could simply add these lines after the for statement:
    def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
        for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
            hypline = hypline.split()
            refline = refline.split()
But this is not possible in my case, since I cannot change this function.
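To make the goal concrete, here is a small self-contained sketch of the setup I am after, assuming Python 3 (where `zip` and `map` return lazy iterators). The `scorer` below is only a stand-in I wrote for illustration, since the real one does its own calculations; it just counts the tokens shared by each pair of lines. The `io.StringIO` objects and their shortened sample text replace the two files, and `map(str.split, ...)` is tried as an alternative to the generator expression.

```python
import io


def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
    """Stand-in for the real scorer: count tokens shared by each line pair."""
    overlaps = []
    for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
        overlaps.append(len(set(hypline) & set(refline)))
    return overlaps


# Shortened sample lines standing in for hyp.txt and ref.txt.
hypfin = io.StringIO("he read the book\nworld history\n")
reffin = io.StringIO("he read a book\nancient history\n")

# map() returns an iterator; each line is split only when scorer asks for it.
hyp = map(str.split, hypfin)
ref = map(str.split, reffin)

overlaps = scorer(hyp, ref)
```

Here the function body is untouched and only the arguments change, which is the constraint I have to work within.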