I have a performance problem that I cannot get my head around. I am writing a Java application that parses huge (> 20 million lines) text files and stores certain information in a set. I measure performance in seconds per million lines. Since I need a lot of memory, I usually run the program with -Xmx6000m and -Xms4000m.
If I just run the program, it parses 1 million lines in about 6 seconds. However, after some performance investigation, I realized that if I add the following code before the actual parsing routine, the time drops to less than 3 seconds per 1 million lines:
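For reference, a seconds-per-million-lines figure like this can be obtained with a small timing harness along the following lines. This is only a minimal sketch, not the actual measurement code: the class name LineRateTimer, its method names, and the in-memory stand-in input are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LineRateTimer {

    // Converts an elapsed time and a line count into seconds per million lines.
    static double secondsPerMillion(long elapsedNanos, long lines) {
        if (lines <= 0) {
            throw new IllegalArgumentException("lines must be positive");
        }
        double seconds = elapsedNanos / 1_000_000_000.0;
        return seconds * 1_000_000.0 / lines;
    }

    // Reads every line from the reader and returns how many lines were seen.
    static long countLines(Reader source) throws IOException {
        long lines = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input (assumption); the real program would read the file being parsed.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("line ").append(i).append('\n');
        }

        long start = System.nanoTime();
        long lines = countLines(new StringReader(sb.toString()));
        long elapsed = System.nanoTime() - start;

        System.out.printf("%d lines, %.3f s per million lines%n",
                lines, secondsPerMillion(elapsed, lines));
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is intended for measuring elapsed time.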
    BufferedReader br = new BufferedReader(new FileReader("graphs.nt"));
    HashMap<String, String> foo = new HashMap<String, String>();
    String line;
    while ((line = br.readLine()) != null) {
        foo.put(line, "foo");
    }
    foo = null;
    br.close();
    br = null;
The graphs.nt file is about 9 million lines long. The performance improvement persists even if I do not set foo to null; that assignment is only there to demonstrate that the map is not actually used by the program.
The rest of the code is completely unrelated. I use the parser from openrdf sesame to read another file (not graphs.nt) and save the extracted information in a new HashSet created by another object. In the rest of the code, I create a Parser object to which I pass a Handler object.
This really puzzles me. My guess is that this somehow makes the JVM allocate more memory for my program, and I can see hints of this when I run top. Without the HashMap, the process allocates about 1 GB of memory. If I initialize the HashMap, it allocates more than 2 GB.
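Since top reports process memory rather than the Java heap itself, the heap could also be checked from inside the program. The following is a minimal sketch using the standard Runtime methods maxMemory(), totalMemory(), and freeMemory(); the class name HeapSnapshot, the helper names, and the 64 MB test allocation are hypothetical and only stand in for the HashMap warm-up.

```java
public class HeapSnapshot {

    private static final long MB = 1024L * 1024L;

    // Converts a byte count to whole mebibytes.
    static long toMb(long bytes) {
        return bytes / MB;
    }

    // Returns a one-line summary of the current heap state.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper limit the JVM will try to use (bounded by -Xmx)
        long committed = rt.totalMemory();  // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();
        return String.format("used=%d MB, committed=%d MB, max=%d MB",
                toMb(used), toMb(committed), toMb(max));
    }

    public static void main(String[] args) {
        System.out.println("before: " + describe());

        // Hypothetical stand-in for the large HashMap: allocate 64 blocks of 1 MB each.
        byte[][] blocks = new byte[64][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[(int) MB];
        }

        System.out.println("after:  " + describe());
        // Use the array afterwards so it stays reachable during the second snapshot.
        System.out.println("blocks held: " + blocks.length);
    }
}
```

Printing such a snapshot before the warm-up code, after it, and again during parsing would show whether the committed heap really stays larger once the map has been created.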
My question is: does this sound at all reasonable? Is it possible that creating such a large object causes the JVM to allocate more memory for the rest of the program's run? Shouldn't -Xmx and -Xms control the memory allocation, or are there additional arguments that might play a role here?
I know this may seem like a strange question with too little information, but this is everything I have found that seems related to the problem. If any additional information would be useful, I am more than happy to provide it.