My problem is that my current program uses a naive linear search to extract data from multiple data files by matching corresponding lines.
It works something like this (pseudocode):
    while count < total number of files
        open current file
        extract line from this file
        build arrayofStrings from this line
        foreach string in arrayofStrings
            foreach file in arrayofDataReferenceFiles
                search for string in file
        close file
        increment count
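In actual Perl it boils down to roughly this (a simplified sketch; the file names, the split on whitespace, and the substring match are placeholders for what my real code does):

    use strict;
    use warnings;

    # Placeholder file lists; the real code builds these elsewhere.
    my @data_files      = ('data1.txt', 'data2.txt');
    my @reference_files = ('ref1.txt', 'ref2.txt', 'ref3.txt');

    for my $data_file (@data_files) {
        open my $dfh, '<', $data_file or die "Can't open $data_file: $!";
        my $line = <$dfh>;                 # extract the relevant line
        close $dfh;

        my @strings = split /\s+/, $line;  # build arrayofStrings

        for my $string (@strings) {
            # The slow part: every string rescans every reference file.
            for my $ref_file (@reference_files) {
                open my $rfh, '<', $ref_file or die "Can't open $ref_file: $!";
                while (my $ref_line = <$rfh>) {
                    if (index($ref_line, $string) >= 0) {
                        # ... process the match ...
                    }
                }
                close $rfh;
            }
        }
    }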
For a big real-life data set, the process can take about 6 hours.
Basically, I have a large set of strings that the program uses to search the same set of files (for example, 10 in one instance of the program and maybe 3 in the next). Since the reference data files are subject to change, I don't think it is wise to build a permanent index of them.
I'm fairly new to this and don't know of any faster search methods for unsorted data.
Since the same searches repeat after a while, I wondered: is it possible to pre-build an index of the locations of certain lines in the data files, once the file list is built (the files are known), without using any external Perl modules? This script will be ported to a server that may have only standard Perl installed.
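To make the idea concrete, here is a minimal sketch of the kind of index I have in mind, using only core Perl: one pass over each reference file records, for every line, a key plus the file name and byte offset (via tell), so later lookups become a hash access plus a seek instead of rescanning every file. Taking the first whitespace-separated field as the key is just an assumption about my line format:

    use strict;
    use warnings;

    my @reference_files = ('ref1.txt', 'ref2.txt');  # placeholder names

    # One pass per file: map key => list of [file, byte offset] entries.
    my %index;
    for my $ref_file (@reference_files) {
        open my $fh, '<', $ref_file or die "Can't open $ref_file: $!";
        my $offset = tell $fh;          # offset of the line about to be read
        while (my $line = <$fh>) {
            my ($key) = split /\s+/, $line;
            push @{ $index{$key} }, [ $ref_file, $offset ]
                if defined $key && length $key;
            $offset = tell $fh;
        }
        close $fh;
    }

    # Each lookup is now a hash access plus a seek.
    my @strings = ('some_key');         # placeholder search strings
    for my $string (@strings) {
        next unless exists $index{$string};
        for my $entry (@{ $index{$string} }) {
            my ($file, $off) = @$entry;
            open my $fh, '<', $file or die "Can't open $file: $!";
            seek $fh, $off, 0;          # jump straight to the indexed line
            my $line = <$fh>;
            close $fh;
            # ... process the matched line ...
        }
    }

If memory allows, storing the whole line in the hash instead of the offset would avoid the reopen/seek entirely.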
I figure building an index like this would take about 3-5 minutes before the actual processing starts.
Is there a specific indexing / search concept that applies to my situation?
Thanks everyone!