Does OutputStream.write (buf, offset, size) have a memory leak in Linux?

I am writing a piece of Java code to create 500K small files (on average 40K each) on CentOS. The source code is as follows:

    package MyTest;

    import java.io.*;

    public class SimpleWriter {
        public static void main(String[] args) {
            String dir = args[0];
            int fileCount = Integer.parseInt(args[1]);
            String content = "@#$% SDBSDGSDF ASGSDFFSAGDHFSDSAWE^@$^HNFSGQW%#@&$%^J#%@#^$#UHRGSDSDNDFE$T#@ $UERDFASGWQR!@ % !@ ^$#@YEGEQW% !@ %!!GSDHWET!^";

            // Build a 40 KB buffer by repeating the content string.
            StringBuilder sb = new StringBuilder();
            int count = 40 * 1024 / content.length();
            int remainder = (40 * 1024) % content.length();
            for (int i = 0; i < count; i++) {
                sb.append(content);
            }
            if (remainder > 0) {
                sb.append(content.substring(0, remainder));
            }
            byte[] buf = sb.toString().getBytes();

            // Write the same 40 KB buffer into fileCount files.
            for (int j = 0; j < fileCount; j++) {
                String path = String.format("%s%sTestFile_%d.txt", dir, File.separator, j);
                try {
                    BufferedOutputStream fs = new BufferedOutputStream(new FileOutputStream(path));
                    fs.write(buf);
                    fs.close();
                } catch (FileNotFoundException fe) {
                    System.out.printf("Hit file not found exception %s", fe.getMessage());
                } catch (IOException ie) {
                    System.out.printf("Hit IO exception %s", ie.getMessage());
                }
            }
        }
    }

You can run this by running the following command: java -jar SimpleWriter.jar my_test_dir 500000

I thought this was simple code, but then I noticed that it uses up to 14 GB of memory. I know this because when I check with free -m, the free memory keeps dropping until only 70 MB of my 15 GB is left. I compiled this in Eclipse against JDK 1.6 and then JDK 1.7; the result is the same. The funny thing is that if I comment out fs.write() and just open and close the stream, memory stabilizes at some point. Once I put fs.write() back, the memory consumption goes wild. 500K files of 40 KB each is about 20 GB in total. It looks as if the Java stream writer never frees its buffer during the run.

At first I thought the Java GC simply didn't have time to clean up, but that doesn't make sense, since I close the file stream for every file. I even ported the code to C# and ran it under Windows: the same code produces 500K 40 KB files with memory stabilizing at a certain point, instead of eating 14 GB as it does under CentOS. At least C#'s behavior is what I expected, but I couldn't believe Java behaves this way. I asked colleagues who are experienced in Java; they did not see anything wrong in the code but could not explain why this happens, and they admitted that nobody had tried creating 500K files in a loop without stopping.

I also searched online, and everyone says the only thing to pay attention to is closing the stream, which I do.

Can someone help me figure out what happened?

Can someone try and tell me what you see?

By the way, some people in this community have tried the code on Windows, and it seems to work fine there. I have not tried it on Windows myself; I only tried it on Linux, since that is where I expected people to run Java. So it looks like this problem shows up on Linux.

I also tried restricting the JVM heap as follows, but it had no effect: java -Xmx2048m -jar SimpleWriter.jar my_test_dir 500000

3 answers

I tried your program on Win XP, JDK 1.7.25. It immediately threw OutOfMemoryExceptions.

While debugging, with only 3000 for the file count (args[1]), the count variable from this code:

    int count = 40 * 1024 * 1024 / content.length();
    int remainder = (40 * 1024 * 1024) % content.length();
    for (int i = 0; i < count; i++) {
        sb.append(content);
    }

count is 355449. So the String you are trying to create will be 355449 * content.length() characters long, or, as you calculated, about 40 MB. I ran out of memory when i was 266587 and sb was 31457266 characters long. At that point every file I got was about 30 MB.
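As a sanity check on those numbers, here is the rough arithmetic (the 118-character content length is not measured, it is inferred from count = 355449):

    long target = 40L * 1024 * 1024;             // 41,943,040 characters intended per file
    int contentLen = 118;                        // inferred: 41,943,040 / 355,449 is about 118
    int count = (int) (target / contentLen);     // 355449 repetitions of the content string
    // So the StringBuilder grows to roughly 40 MB of characters before a single file is written.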

The problem is not with memory or the GC, but with how you build the string.

Did you actually get any files created, or did you run out of memory before any file was written?

I think your main problem is the line:

  int count = 40 * 1024 * 1024 / content.length(); 

it should be:

  int count = 40 * 1024 / content.length(); 

so that you create 40 KB files, not 40 MB files.


[Edit 2: The original answer is left at the end of this post.]

After your explanation in the comments, I ran your code on a Windows machine (Java 1.6), and here are my findings (heap numbers are from VisualVM, OS memory as seen in the Task Manager):

  • Your example with a 40 KB buffer, writing 500K files (no JVM parameters): heap used: ~4 MB, total heap: 16 MB, OS memory: ~16 MB

  • Your example with a 40 MB buffer, writing 500 files (JVM parameters -Xms128m -Xmx512m; without them I get an OutOfMemory error while building the StringBuilder): heap used: ~265 MB, total heap: ~365 MB, OS memory: ~365 MB

Especially from the second example you can see that my initial explanation still holds. Yes, one would expect most of that memory to be freed, since the byte[] arrays from the BufferedOutputStreams live in the young generation (short-lived objects), but a) this does not happen immediately, and b) when the GC does decide to kick in (which it actually did in my case), it will try to reclaim memory, but it may reclaim only as much as it sees fit, not necessarily all of it. The GC does not give you any guarantees you can count on.
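If you want to take similar readings without VisualVM, a quick generic sketch (not part of my original test run) is to sample the heap from inside the JVM around the write loop:

    Runtime rt = Runtime.getRuntime();
    long usedBefore = rt.totalMemory() - rt.freeMemory();

    // ... run the file-writing loop here ...

    long usedAfter = rt.totalMemory() - rt.freeMemory();
    System.out.printf("Heap used: %d MB -> %d MB, max heap: %d MB%n",
            usedBefore / (1024 * 1024), usedAfter / (1024 * 1024),
            rt.maxMemory() / (1024 * 1024));

Comparing these numbers with what the OS reports makes the distinction between JVM heap usage and process memory explicit.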

So, in general, you should give the JVM as much memory as you are comfortable with. If you need to keep memory usage low for a particular reason, you should try a strategy like the code example I gave below in my original answer, i.e. just don't create all those byte[] objects in the first place.

Now, in your case with CentOS, it does look like the JVM is behaving strangely; perhaps we could call it a buggy or poor implementation. To classify it as a leak/bug, you should try using -Xmx to limit the heap. Also try what Peter Laurie suggested: don't create a BufferedOutputStream at all (for such a small file), since you write all the bytes in one go anyway.
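A minimal sketch of that suggestion, Java 6 compatible (path and buf are the same variables as in the question's loop):

    FileOutputStream fs = null;
    try {
        fs = new FileOutputStream(path);
        fs.write(buf);   // one write of the whole 40 KB payload; no extra buffering layer
    } catch (IOException ie) {
        System.out.printf("Hit IO exception %s", ie.getMessage());
    } finally {
        if (fs != null) {
            try { fs.close(); } catch (IOException ignored) { }
        }
    }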

If it still exceeds the memory limit, then you do have a leak and should probably file a bug. (You could still complain about it, and they may optimize it in the future.)


[Edit 1: The answer below assumed that the OP's code performed as many read operations as write operations, so the memory usage was justified. The OP has explained that this is not the case, so it does not answer his question.

"... my virtual memory is 15 GB ..." If you give the JVM so much memory, why should it try to start the GC? As for the JVM, it is allowed to receive so much memory from the system and run the GC only when it believes that it is suitable. Each execution of a BufferedOutputStream by default allocate an 8 KB buffer. The JVM will try to return this memory only when necessary. This is the expected behavior. Do not confuse the memory that you see is free from the system point of view and from the point of view of the JVM. As far as the system is concerned, memory is allocated and will be released when the JVM shuts down. As for the JVM, all byte[] arrays allocated from the BufferedOutputStream are no longer used, this is "free" memory and will be fixed if necessary. If for some reason you do not want this behavior, you can try the following: Extend the BufferedOutputStream class (for example, create the ReusableBufferedOutputStream class) and add a new method, for example. reUseWithStream(OutputStream os) . Then this method will clear the internal byte[] , close and close the previous stream, reset any variables used and set a new stream. Then your code will be as follows:

    // Initialize once.
    ReusableBufferedOutputStream fs = new ReusableBufferedOutputStream();
    for (int i = 0; i < fileCount; i++) {
        String path = String.format("%s%sTestFile_%d.txt", dir, File.separator, i);
        // Point the buffered stream at the new file.
        fs.reUseWithStream(new FileOutputStream(path));
        fs.write(this._buf, 0, this._buf.length); // this._buf was allocated once, 40K long, contains the text
    }
    fs.close(); // close the stream after we are done
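For reference, here is one way such a ReusableBufferedOutputStream could be written. It is only a sketch of the idea described above, not tested code: it relies on the protected buf/count fields of BufferedOutputStream and the protected out field of FilterOutputStream, and the placeholder stream in the no-arg constructor is my own assumption.

    import java.io.BufferedOutputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    // Sketch: keeps a single internal 8 KB buffer and re-points it at a new
    // underlying stream for each file instead of allocating a new
    // BufferedOutputStream on every iteration.
    public class ReusableBufferedOutputStream extends BufferedOutputStream {

        public ReusableBufferedOutputStream() {
            // Harmless placeholder target until reUseWithStream() is first called.
            super(new ByteArrayOutputStream(0));
        }

        // Flushes and closes the previous underlying stream, resets the buffer
        // position, and switches to the new stream.
        public void reUseWithStream(OutputStream os) throws IOException {
            flush();      // push any bytes still sitting in the inherited buf to the old stream
            out.close();  // close the previous stream (out is the protected field from FilterOutputStream)
            count = 0;    // reset the write position in the inherited byte[] buf
            out = os;     // subsequent writes go to the new stream
        }
    }

Whether reusing the wrapper is worth it here is debatable, since each file is written with a single write() call anyway.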

With the above approach you avoid creating all those byte[] arrays. However, I do not see any problem with the expected behavior, and you do not mention any problem other than "I see that it takes too much memory"; you gave it that memory to use, so it used it.]


I think this is because you are using a BufferedOutputStream. All that buffering can easily use 15 GB of memory.


Source: https://habr.com/ru/post/1492548/

