I'm writing code that analyzes log files, with the caveat that the files are compressed and have to be decompressed on the fly. The code is fairly performance-sensitive, so I'm trying out several approaches to find the best one. I have essentially as much RAM as the program needs, no matter how many threads I use.
I found a method that seems to work quite well, and I'm trying to figure out why it offers the best performance.
Both methods have a reader thread that reads from the gunzip process over a pipe and writes into a large buffer. The buffer is then lazily parsed when the next log line is requested, returning what is essentially a structure of pointers to where the different fields sit in the buffer.
The code is in D, but it is very similar to C or C++.
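To make the "structure of pointers" idea concrete, here is a minimal sketch of what such a parsed-line type might look like (the field names are mine, not from the original code); each field is just a D slice into the shared buffer, so nothing is copied:

struct LogLine
{
    const(char)[] timestamp;  // slice into the decompressed buffer
    const(char)[] level;      // a slice only records pointer + length, no copy
    const(char)[] message;
}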
Shared variables:
shared(bool) _stream_empty = false;
shared(ulong) upper_bound = 0;
shared(ulong) curr_index = 0;
Parsing code:
// Lazily parse the buffer
void construct_next_elem()
{
    while(1)
    {
        // Spin to stop us from getting ahead of the reader thread
        buffer_empty = curr_index >= upper_bound - 1 && _stream_empty;
        if(curr_index >= upper_bound && !_stream_empty)
        {
            continue;
        }
        // Parsing logic .....
    }
}
Method 1: malloc a buffer large enough to hold the uncompressed file.
char[] buffer;                  // Same as vector<char> in C++
buffer.length = buffer_length;  // Same as vector reserve in C++, or malloc
Method 2: Use an anonymous memory map as the buffer
MmFile buffer;
buffer = new MmFile(null, MmFile.Mode.readWrite,  // PROT_READ | PROT_WRITE
                    buffer_length,
                    null);                        // MAP_ANON | MAP_PRIVATE
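For reference, MmFile exposes the mapping through its slice operator, so presumably the reader and parser treat it as a char buffer roughly like this (the cast is my assumption, not code from the question):

char[] data = cast(char[]) buffer[];  // MmFile.opSlice() returns the whole mapping as void[]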
Reader thread:
ulong buffer_length = get_gzip_length(file_path);
pipe = pipeProcess(["gunzip", "-c", file_path], Redirect.stdout);
stream = pipe.stdout();

static void stream_data()
{
    while(!l.stream.eof())
    {
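The body of stream_data() is elided above; for context, here is a minimal sketch of what such a reader loop might look like for the char[] buffer case. The chunk size, the rawRead usage, the atomic helpers, and the assumption that buffer, stream, and buffer_length are reachable (e.g. module-level) are all mine, not the original code:

import core.atomic : atomicLoad, atomicOp, atomicStore;
import std.algorithm : min;

static void stream_data()
{
    enum chunk_size = 64 * 1024;                   // assumed read granularity
    while (!stream.eof())
    {
        auto start = atomicLoad(upper_bound);
        auto end   = min(start + chunk_size, buffer_length);
        // File.rawRead fills as much of the slice as it can and returns the filled part
        auto chunk = stream.rawRead(buffer[start .. end]);
        atomicOp!"+="(upper_bound, chunk.length);  // publish the new bytes to the parser
    }
    atomicStore(_stream_empty, true);              // tell the parser no more data is coming
}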
I get significantly better performance from method 1, sometimes by as much as an order of magnitude:
User time (seconds): 112.22
System time (seconds): 38.56
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:39.40
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3784992
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5463
Voluntary context switches: 90707
Involuntary context switches: 2838
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
vs.
User time (seconds): 275.92
System time (seconds): 73.92
Percent of CPU this job got: 117%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:58.73
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777336
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 944779
Voluntary context switches: 89305
Involuntary context switches: 9836
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Note the far larger number of minor page faults with method 2.
Can someone shed some light on why there is such a dramatic drop in performance when using mmap?
And if anyone knows of a better solution to this problem, I would love to hear it.
EDIT -----
Method 2, changed to call mmap directly:
char* buffer = cast(char*) mmap(cast(void*) null,
                                buffer_length,
                                PROT_READ | PROT_WRITE,
                                MAP_ANON | MAP_PRIVATE,
                                -1, 0);
now gives roughly a 3x performance boost over using MmFile. I'm trying to figure out what could cause such a dramatic difference in performance when MmFile is essentially just a wrapper around mmap.
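For completeness, the raw-mmap version needs the POSIX bindings and, presumably, a matching munmap when the buffer is done with; a minimal sketch of that setup (the import and error check are my additions for the Linux case, not code from the question):

import core.sys.posix.sys.mman;   // mmap, munmap and the PROT_/MAP_ flags on Linux

void* p = mmap(null, buffer_length,
               PROT_READ | PROT_WRITE,
               MAP_ANON | MAP_PRIVATE,
               -1, 0);
assert(p != MAP_FAILED, "mmap failed");
char* buffer = cast(char*) p;

// ... use buffer exactly as in method 1 ...

munmap(p, buffer_length);         // release the mapping when finished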
Perf numbers for the direct char* mmap vs MmFile; the direct mmap path also shows far fewer page faults:
User time (seconds): 109.99
System time (seconds): 36.11
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:36.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777896
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2771
Voluntary context switches: 90827
Involuntary context switches: 2999
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0