First, note that you can sort a file without holding all of it in memory, and that most operating systems ship a program that does this, usually called simply `sort`. You can usually tell it to sort on a particular field within each line; if not, you can rewrite each line so that it sorts the way you want.
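If you want to do the same thing in your own code, the standard trick is an external merge sort: sort fixed-size chunks in memory, spill each sorted chunk to a temporary file, then merge the runs. Here is a minimal sketch in Python; the function name, the tab separator, and the chunk size are all illustrative choices, not anything from the original answer.

```python
import heapq
import os
import tempfile

def external_sort(path, out_path, field=0, sep="\t", chunk_lines=100_000):
    """Sort a large file by one field without holding it all in memory:
    sort chunks in RAM, spill them to temp files, then merge the runs."""
    key = lambda line: line.split(sep)[field]
    runs = []
    with open(path) as f:
        while True:
            # Read at most chunk_lines lines into memory at a time.
            chunk = [line for _, line in zip(range(chunk_lines), f)]
            if not chunk:
                break
            chunk.sort(key=key)
            tmp = tempfile.NamedTemporaryFile("w+", delete=False)
            tmp.writelines(chunk)
            tmp.seek(0)
            runs.append(tmp)
    # heapq.merge streams the sorted runs; it never loads a whole run.
    with open(out_path, "w") as out:
        out.writelines(heapq.merge(*runs, key=key))
    for tmp in runs:
        tmp.close()
        os.unlink(tmp.name)
```

In practice the OS `sort` utility does exactly this (and handles locales, stable sort, and disk limits for you), so the sketch is mainly useful when you need a custom sort key that `sort -k` cannot express.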
Given that, you can join two files by sorting them so that the first is ordered on field #1 and the second on field #2. You can then read both files in step and produce one output record for each match, combining all the fields, while holding in memory only the piece of each file in which the sorted field has the same value. The result can then be joined to another file in the same way; four such joins should solve your problem.
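The read-in-step pass described above is the classic sort-merge join. A minimal sketch, assuming both inputs are already sorted on their join fields (the function and parameter names are mine, not from the original answer):

```python
import itertools

def sort_merge_join(lines_a, key_a, lines_b, key_b):
    """Join two inputs already sorted on their join fields. Only the two
    groups of lines sharing the current key are held in memory at once."""
    groups_a = itertools.groupby(lines_a, key=key_a)
    groups_b = itertools.groupby(lines_b, key=key_b)
    ka, ga = next(groups_a, (None, None))
    kb, gb = next(groups_b, (None, None))
    while ga is not None and gb is not None:
        if ka < kb:                      # a is behind: skip its group
            ka, ga = next(groups_a, (None, None))
        elif kb < ka:                    # b is behind: skip its group
            kb, gb = next(groups_b, (None, None))
        else:                            # keys match: emit the cross product
            ga, gb = list(ga), list(gb)
            for ra, rb in itertools.product(ga, gb):
                yield ra, rb
            ka, ga = next(groups_a, (None, None))
            kb, gb = next(groups_b, (None, None))
```

Because each side is consumed as a stream, the files can be arbitrarily large; memory use is bounded by the largest run of equal join keys.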
Depending on your data, the time needed to solve your problem may depend on the order in which you perform the joins. One naive way to exploit this is, at every step, to take a small random sample of each file and use it to estimate how many results each possible join would produce, then pick the join that yields the fewest. One way to take a random sample of N items from a large file is to take the first N lines, and then, once you have read m lines so far, read the next line and, with probability N/(m+1), swap it for one of the N lines you are holding; otherwise discard it. Continue until you have read the whole file.
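The sampling scheme described here is reservoir sampling (Knuth's Algorithm R). A minimal sketch, with names of my own choosing:

```python
import random

def reservoir_sample(iterable, n, rng=random):
    """Uniform random sample of n items from a stream of unknown length,
    using O(n) memory (Algorithm R)."""
    sample = []
    for m, item in enumerate(iterable):
        if m < n:
            sample.append(item)           # fill the reservoir with the first n items
        elif rng.random() < n / (m + 1):  # keep item number m with probability n/(m+1)
            sample[rng.randrange(n)] = item
    return sample
```

Each item ends up in the sample with probability exactly n divided by the total stream length, so the estimates of join sizes drawn from such samples are unbiased in the item-selection sense.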