I need to process 2 million text files and generate triples from them.
Suppose I have a txt file xyz.txt (one of the 2 million input files). It is processed as shown below:
start(xyz.txt)---->module1(xyz.tpd)------>module2(xyz.adv)-------->module3(xyz.tpl)
Can you suggest an approach or concept that would let me speed up and optimize this process on x64 Windows machines with 4 GB of RAM?
module1 (working): it parses the txt file using a .bat file that calls the parser. This runs in a separate system thread, and after 15 seconds it starts parsing the next txt file, and so on...
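A fixed 15-second delay either wastes time (when a parse finishes sooner) or lets runs overlap (when it takes longer). A minimal sketch of the alternative is to wait for the parser process itself to exit before starting the next file. The batch file name below is a hypothetical placeholder, not the one from the question.

```python
import subprocess
import sys


def run_stage(cmd, timeout=None):
    """Run one pipeline stage as an external command and block until it exits.

    Returns the process exit code, so the caller can decide whether to
    continue with the next stage or record the file as failed.
    """
    completed = subprocess.run(cmd, timeout=timeout, capture_output=True, text=True)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical Windows usage ("parse.bat" is a placeholder name):
    # run_stage(["cmd", "/c", "parse.bat", "xyz.txt"])
    # Portable demo: run a trivial Python command instead.
    print(run_stage([sys.executable, "-c", "print('parsed')"]))
```

Because `subprocess.run` returns only after the child exits, the next file starts as soon as the previous parse is done, and never before.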
module2 (working): it takes a .tpd file as input and creates a .adv file.
module3 (working): it takes the .adv file as input and generates a .tpl file (the triples).
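With 2 million inputs, a run will likely be interrupted at some point, so it helps if it can be restarted without redoing finished files. A minimal sketch follows. It assumes each module writes its output next to the input with the same stem, as the xyz.txt → xyz.tpd → xyz.adv → xyz.tpl naming suggests. `stage_paths` and `is_done` are hypothetical helper names.

```python
from pathlib import Path

# Extensions produced by module1, module2 and module3, in pipeline order.
STAGE_EXTS = [".tpd", ".adv", ".tpl"]


def stage_paths(txt_path):
    """Return the file each module writes for one input: .tpd, .adv, .tpl."""
    p = Path(txt_path)
    return [p.with_suffix(ext) for ext in STAGE_EXTS]


def is_done(txt_path):
    """True if module3's .tpl output already exists, so the file can be skipped."""
    return stage_paths(txt_path)[-1].exists()
```

Filtering the input list with `not is_done(path)` before dispatching work turns a crashed run into a resumable one.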
Should I start the parallel streams (threads) at the txt files, or at some other point in the pipeline? I am afraid the system will get bogged down by context switching.
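One common way to keep context switching under control is to cap the number of concurrently running files at the number of logical CPUs. Each worker then pushes one file through all three modules in order. The sketch below assumes each stage is a callable that takes the previous stage's output path and returns the next one, for example a wrapper that launches the module's external command. Threads are used because such stages would mostly wait on child processes. Inputs are submitted in batches so that 2 million pending tasks are not queued at once. `process_one` and `process_all` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def process_one(txt_path, stages):
    """Push one file through every stage in order (module1 -> module2 -> module3)."""
    current = txt_path
    for stage in stages:
        current = stage(current)
    return current


def process_all(paths, stages, workers=None, batch_size=1000):
    """Yield the final output of each input, in input order.

    At most `workers` files are in flight at once (default: logical CPU count),
    and at most `batch_size` tasks are submitted to the pool at a time.
    """
    workers = workers or os.cpu_count() or 1
    it = iter(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            # map() keeps results aligned with the order of the batch.
            yield from pool.map(lambda p: process_one(p, stages), batch)
```

Starting the parallelism at the txt level like this keeps each file's stages sequential. The trade-off is that a slow module stalls its worker, whereas separate per-module queues could balance stages with different costs.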
Does anyone have a better approach I could try?