I have several event log files (one event per line). The logs may overlap. The logs are generated on separate client machines, potentially in several time zones (but I assume I know each file's time zone). Each event has a timestamp that has been normalized to a common time (by instantiating each log parser's Calendar with the time zone corresponding to that log file, and then using getTimeInMillis to obtain the UTC time). The logs are already sorted by timestamp. Several events can occur at the same time, but they are by no means identical events.
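That normalization step can be sketched roughly like this. The zone ID, date fields, and method name `toUtcMillis` below are illustrative assumptions, not part of my actual parser:

```java
import java.util.Calendar;
import java.util.TimeZone;

// Sketch: convert a local timestamp parsed from a log line into UTC
// epoch millis, assuming the log file's zone ID is known up front.
public class TimestampNormalizer {
    public static long toUtcMillis(String zoneId,
                                   int year, int month, int day,
                                   int hour, int minute, int second) {
        // One Calendar per log file, created with that file's time zone.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(zoneId));
        cal.clear();                       // drop leftover sub-second fields
        cal.set(year, month, day, hour, minute, second);
        return cal.getTimeInMillis();      // millis since epoch, i.e. UTC
    }

    public static void main(String[] args) {
        // Noon in New York (UTC-5 in January) and 17:00 UTC are the same instant.
        long ny  = toUtcMillis("America/New_York", 2020, Calendar.JANUARY, 15, 12, 0, 0);
        long utc = toUtcMillis("UTC",              2020, Calendar.JANUARY, 15, 17, 0, 0);
        System.out.println(ny == utc);
    }
}
```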
These files can be relatively large (for example, 500,000 events or more in a single log), so reading the entire contents of every log into a simple Event[] is not feasible.
What I'm trying to do is combine the events from all the logs into a single log. This is similar to the merge step of mergesort: each log is already sorted, I just need to interleave them. The second wrinkle is that the same event may be witnessed in multiple individual log files, and I want to drop such repeated events in the output log.
Can I do this "in place", that is, by working sequentially over small buffers of each log file? I can't simply read all the files into an Event[], sort the list, and remove duplicates, but so far that is the only solution my limited programming experience lets me see. Is there a more sophisticated approach I could use that reads events from all of the logs simultaneously?
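One way to answer my own question as far as I understand it: a streaming k-way merge with a priority queue, holding only one pending line per input. This is a sketch under two assumptions that may not match my real format: each line begins with the normalized UTC millis as its first field (the `parseMillis` helper below is hypothetical), and duplicated events are byte-identical lines. The heap tie-breaks equal timestamps on the full line text so identical events from different logs come out adjacent and can be skipped:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Streaming k-way merge with duplicate removal: only k buffered lines
// (one per input log) are in memory at any time.
public class LogMerger {

    // Hypothetical key extractor: assumes the UTC millis are the first
    // whitespace-delimited field of each line.
    static long parseMillis(String line) {
        return Long.parseLong(line.split("\\s+", 2)[0]);
    }

    private static final class Head implements Comparable<Head> {
        final long ts;
        final String line;
        final BufferedReader source;
        Head(String line, BufferedReader source) {
            this.ts = parseMillis(line);
            this.line = line;
            this.source = source;
        }
        // Tie-break on the whole line so identical events from
        // different logs pop adjacently and can be deduplicated.
        public int compareTo(Head o) {
            int c = Long.compare(ts, o.ts);
            return c != 0 ? c : line.compareTo(o.line);
        }
    }

    public static List<String> merge(List<? extends Reader> inputs) {
        try {
            PriorityQueue<Head> heap = new PriorityQueue<>();
            for (Reader r : inputs) {
                BufferedReader br = new BufferedReader(r);
                String first = br.readLine();
                if (first != null) heap.add(new Head(first, br));
            }
            List<String> out = new ArrayList<>(); // in practice: a Writer
            String previous = null;
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                if (!h.line.equals(previous)) {   // skip repeated events
                    out.add(h.line);
                    previous = h.line;
                }
                String next = h.source.readLine();
                if (next != null) heap.add(new Head(next, h.source));
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This runs in O(n log k) for n total events across k logs, and memory stays proportional to k (plus the reader buffers), not to n, so a 500,000-event log never needs to be loaded whole.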
Josh