byte[] arrays are effectively passed by reference

So, this question is about processing a large number of objects while trying to minimize the number of byte[] instances created. Basically, I am getting OutOfMemoryExceptions, and I suspect it is because we are creating too many byte arrays. The program works great when processing a handful of files, but it needs to scale, and currently it cannot.

In a nutshell, I have a loop that retrieves documents from a database. Currently it pulls one document at a time and then processes it. Documents can range from less than a megabyte to 400+ megabytes (hence why I process one at a time). Below is the pseudocode before optimization.

So, the steps that I am doing are:

  • Make a call to the database to find the largest file size (and then multiply it by 1.1)

    var maxDataSize = new BiztalkBinariesData().GetMaxFileSize();
    maxDataSize = (maxDataSize != null && maxDataSize > 0)
        ? (long)(maxDataSize * 1.1)
        : 0;
    var FileToProcess = new byte[maxDataSize];
  • Then I make another database call, pulling all the documents (without their data) from the database and putting them into an IEnumerable.

     UnprocessedDocuments =
         claimDocumentData.Select(StatusCodes.CurrentStatus.WaitingToBeProcessed);
     foreach (var currentDocument in UnprocessedDocuments)
     {
         // all of the following code goes here
     }
  • Then I populate the byte[] array from an external source:

     FileToProcess = new BiztalkBinariesData()
         .Get(currentDocument.SubmissionSetId, currentDocument.FullFileName);
  • Here is the question: it would be much easier to pass the current document (IClaimDocument) to the other processing methods. So, if I set the Data property of the current document to the pre-allocated array, will it reuse the existing reference? Or will it create a new array on the large object heap?

     currentDocument.Data = FileToProcess; 
  • At the end of the loop, I clear FileToProcess:

     Array.Clear(FileToProcess, 0, FileToProcess.Length);
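Putting the steps together, the whole loop reads roughly as follows (a sketch using the question's own names; `claimDocumentData` and `UnprocessedDocuments` are assumed to be declared elsewhere, and error handling is omitted):

```csharp
// Step 1: pre-allocate a buffer sized to the largest expected file.
var maxDataSize = new BiztalkBinariesData().GetMaxFileSize();
maxDataSize = (maxDataSize != null && maxDataSize > 0)
    ? (long)(maxDataSize * 1.1)
    : 0;
var FileToProcess = new byte[maxDataSize];

// Step 2: pull the document list (without data).
UnprocessedDocuments =
    claimDocumentData.Select(StatusCodes.CurrentStatus.WaitingToBeProcessed);

foreach (var currentDocument in UnprocessedDocuments)
{
    // Step 3: note this RE-ASSIGNS FileToProcess rather than filling it,
    // so the pre-allocated buffer from step 1 is never actually reused.
    FileToProcess = new BiztalkBinariesData()
        .Get(currentDocument.SubmissionSetId, currentDocument.FullFileName);

    // Step 4: copies the reference only, not the array contents.
    currentDocument.Data = FileToProcess;

    // ... process the document ...

    // Step 5: zero the buffer at the end of the iteration.
    Array.Clear(FileToProcess, 0, FileToProcess.Length);
}
```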

Does that make sense? If not, I will try to clarify.

+4
5 answers

Step 1:

 var FileToProcess = new byte[maxDataSize]; 

Step 3:

 FileToProcess = new BiztalkBinariesData()
     .Get(currentDocument.SubmissionSetId, currentDocument.FullFileName);

Your step 1 is completely unnecessary, because you re-assign the variable in step 3: you create a new array, you do not fill the existing one. So step 1 essentially just creates extra work for the GC, and if you run it frequently (and it is not optimized away by the compiler, which is quite possible), that may explain some of the memory pressure you are seeing.
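The difference between re-assigning the variable and filling the existing array can be shown with a minimal stand-alone snippet (illustrative only, plain arrays instead of the question's classes):

```csharp
using System;

class ReassignVsFill
{
    static void Main()
    {
        var buffer = new byte[8];   // "step 1": pre-allocate a buffer
        var original = buffer;

        // "step 3": re-assignment discards the pre-allocated array and
        // points the variable at a brand-new one.
        buffer = new byte[8];
        Console.WriteLine(ReferenceEquals(buffer, original)); // False

        // Filling the existing array instead keeps the same instance alive.
        buffer = original;
        Array.Copy(new byte[] { 1, 2, 3 }, buffer, 3);
        Console.WriteLine(ReferenceEquals(buffer, original)); // True
    }
}
```

For the pre-allocation in step 1 to pay off, the data source would have to offer a way to copy bytes into a caller-supplied buffer rather than returning a freshly allocated array.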

+6

Arrays are reference types, so you will pass a copy of the reference, not a copy of the array itself. Copying the contents would only happen with value types.

This simple snippet illustrates how arrays behave like reference types:

 public void Test()
 {
     var intArray = new[] { 1, 2, 3, 4 };
     EditArray(intArray);
     Console.WriteLine(intArray[0].ToString()); // output will be 0
 }

 public void EditArray(int[] intArray)
 {
     intArray[0] = 0;
 }
+3

It will use the existing reference, don't worry. The contents of the array are not copied.
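This is easy to confirm with a small sketch (using a stand-in Document class, since the question's IClaimDocument interface is not shown):

```csharp
using System;

class Document
{
    public byte[] Data { get; set; }   // stand-in for IClaimDocument.Data
}

class Program
{
    static void Main()
    {
        var fileToProcess = new byte[] { 1, 2, 3 };
        var doc = new Document();

        doc.Data = fileToProcess;      // copies the reference only

        Console.WriteLine(ReferenceEquals(doc.Data, fileToProcess)); // True

        doc.Data[0] = 42;              // writes through to the same array
        Console.WriteLine(fileToProcess[0]);                         // 42
    }
}
```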

+2

Your problem may lie in the implementation and use of the BiztalkBinariesData class.

I'm not sure how it is implemented, but I see that you create a new instance every time:

 new BiztalkBinariesData() 

Something to think about.
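As a sketch, the construction could be hoisted out of the loop and the instance reused (this assumes BiztalkBinariesData is safe to reuse across calls, which depends on its implementation; names are from the question):

```csharp
// Create the data-access object once, outside the loop, instead of
// newing one up on every iteration (and again for GetMaxFileSize).
var binariesData = new BiztalkBinariesData();

foreach (var currentDocument in UnprocessedDocuments)
{
    FileToProcess = binariesData.Get(
        currentDocument.SubmissionSetId, currentDocument.FullFileName);

    // ... process the document ...
}
```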

+2

I have OutOfMemoryExceptions, and I feel that this is because we are creating too many byte arrays

No, it is because you are allocating LARGE arrays. Limit them to 48 KB or 64 KB and "combine" them in a custom container. With 64 KB chunks, you can use the bytes of the index above the lower two to determine which array to use; the container holds an array of arrays. Allocating very large objects fragments the large object heap and eventually makes it impossible to allocate one large contiguous array.
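A minimal sketch of that chunked-container idea (the class name and API are illustrative, not from the answer): each chunk is 64 KB, so it stays below the ~85,000-byte large object heap threshold; the lower 16 bits of an index are the offset within a chunk, and the remaining upper bits select the chunk.

```csharp
using System;

// Stores bytes in 64 KB chunks so that no single allocation lands on
// the large object heap, avoiding LOH fragmentation.
class ChunkedBytes
{
    private const int ChunkSize = 64 * 1024;   // 2^16 bytes per chunk
    private readonly byte[][] _chunks;         // array of arrays

    public ChunkedBytes(long length)
    {
        Length = length;
        int chunkCount = (int)((length + ChunkSize - 1) / ChunkSize);
        _chunks = new byte[chunkCount][];
        for (int i = 0; i < chunkCount; i++)
            _chunks[i] = new byte[ChunkSize];
    }

    public long Length { get; }

    public byte this[long index]
    {
        // Upper bits pick the chunk, lower 16 bits pick the offset.
        get { return _chunks[index >> 16][index & 0xFFFF]; }
        set { _chunks[index >> 16][index & 0xFFFF] = value; }
    }
}

class Program
{
    static void Main()
    {
        // 200,000 bytes would otherwise be one large-object-heap array.
        var data = new ChunkedBytes(200_000);
        data[130_000] = 7;                 // chunk 1, offset 64,464
        Console.WriteLine(data[130_000]);  // 7
    }
}
```

Reading or writing ranges would need a copy method that walks chunk boundaries, but the indexing scheme above is the core of the trick the answer describes.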

+1

Source: https://habr.com/ru/post/1394008/
