Performance of processing large arrays in a .NET application

I am not sure whether this is an elementary problem, but I have not been able to figure it out for several days...

I am currently writing a .NET library in C# (targeting .NET 3.5). Among other things, it includes functions for writing a triplet of Int32 arrays (several MB each) to a remote cache (a memcached server) and for reading those arrays back. The problem is that every implementation I could come up with took ~1.2 seconds to write 10 MB of data and another ~1.2 seconds to read the same data, even when the memcached server was running on the local machine.

But then, to compare performance, I replaced the write to the cache server with a write to the clipboard and noticed that it still takes 1.2 seconds. I surround the test method call (shown below) with Stopwatch start/stop calls for benchmarking. The method:

    public void writeCachedImageData(int[,] atoms, int[,] noAtoms, int[,] dark, int cameraID, int runID, int seqID)
    {
        Clipboard.SetData(DataFormats.Serializable, atoms);
        Clipboard.SetData(DataFormats.Serializable, noAtoms);
        Clipboard.SetData(DataFormats.Serializable, dark);
    }
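For reference, the timing harness around this call might look like the following (a minimal sketch; the exact harness is not shown in the question, so the variable names here are assumed):

    // Hypothetical benchmark wrapper around the method under test.
    var sw = System.Diagnostics.Stopwatch.StartNew();
    writeCachedImageData(atoms, noAtoms, dark, cameraID, runID, seqID);
    sw.Stop();
    Console.WriteLine("Write took {0} ms", sw.ElapsedMilliseconds);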

For benchmarking, each array was 1000×1000. Basically my questions are:

1) Did I correctly conclude from this test that my bottleneck is somehow the speed of transferring data from the application to something external? and

2) If so, what can I do to improve the speed of data transfer from the application to the clipboard or, ultimately, to the memcached server?

+6
3 answers

Try using 1D arrays instead; they are much faster. To allocate an N×M matrix, do var array = new Int16[N*M]; and to access element (i,j), use array[M*i+j].

In my testing, there is a significant improvement over 2D arrays.

If you need one, a nearly as fast (albeit slightly slower) alternative is jagged arrays. You allocate var array = new Int16[N][]; and then, for each row, array[i] = new Int16[M];. You access the contents with array[i][j].
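Allocation and access for the jagged variant, as a short sketch (N and M stand for the matrix dimensions, as above):

    // Allocate N rows of M elements each.
    var array = new Int16[N][];
    for (int i = 0; i < N; i++)
    {
        array[i] = new Int16[M];
    }

    // Element (i, j) is accessed with two indexers:
    array[2][3] = 42;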

See a similar answer: fooobar.com/questions/985195/...

If your matrices are symmetric, you can speed things up further with this: fooobar.com/questions/364605/...

Code:

    static class Program
    {
        [STAThread()]
        static void Main(string[] args)
        {
            const int N = 10000;
            double t1, t2;
            {
                var array = new Int16[N, N];
                t1 = ClockIt(() =>
                {
                    for (int i = 0; i < N; i++)
                    {
                        for (int j = 0; j < N; j++)
                        {
                            array[i, j] = 32767;
                        }
                    }
                    var bytes = new byte[sizeof(Int16) * array.Length];
                    Buffer.BlockCopy(array, 0, bytes, 0, bytes.Length);
                    Clipboard.SetData(DataFormats.Serializable, bytes);
                });
            }
            {
                var array = new Int16[N * N];
                t2 = ClockIt(() =>
                {
                    for (int i = 0; i < N; i++)
                    {
                        for (int j = 0; j < N; j++)
                        {
                            array[N * i + j] = 32767;
                        }
                    }
                    var bytes = new byte[sizeof(Int16) * array.Length];
                    Buffer.BlockCopy(array, 0, bytes, 0, bytes.Length);
                    Clipboard.SetData(DataFormats.Serializable, bytes);
                });
            }
            Console.WriteLine(string.Format("t1={0}, t2={1}", t1, t2));
        }

        public static double ClockIt(this Action test)
        {
            var sw = Stopwatch.StartNew();
            test();
            sw.Stop();
            return sw.Elapsed.TotalSeconds;
        }
    }

Results (in seconds)

 t1=1.110093, t2=0.6908793 (61% faster) 

I built the console application in Release mode and ran it from a command window. The results are very consistent. With larger arrays, the speedup is even greater.

+1

Try removing serialization from your test code; it is very likely what is causing the problem. But first, make sure serialization really is the bottleneck by profiling your application. If it is, you can try a few tricks:

  • A faster serializer, such as protobuf
  • Compress the input before saving, e.g. with snappy / lz4 / gzip
  • Serialize in parallel with user actions: as soon as you know the user wants to save, you can prepare the data to save while he / she is still looking at the destination folder.
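As an illustration of the compression idea: gzip is available in the framework itself (System.IO.Compression), whereas snappy and lz4 would require third-party libraries. A minimal sketch:

    using System.IO;
    using System.IO.Compression;

    static byte[] CompressBytes(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // GZipStream must be closed before reading the result,
            // which the inner using block guarantees.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

Whether this helps depends on how compressible the arrays are; image-like data with many repeated values usually compresses well, at the cost of some CPU time.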
0

Following the advice above, I tried to avoid serialization and did so with Buffer.BlockCopy:

    public void writeCachedImageData(Int16[,] atoms, int cameraID, int runID, int seqID)
    {
        byte[] bytesAtoms = new byte[2 * 1000 * 1000];
        Buffer.BlockCopy(atoms, 0, bytesAtoms, 0, bytesAtoms.Length);
        Clipboard.SetData(DataFormats.Serializable, bytesAtoms);
        Clipboard.SetData(DataFormats.Serializable, bytesAtoms);
        Clipboard.SetData(DataFormats.Serializable, bytesAtoms);
    }

Note that I changed the data type to Int16: I realized that my application does not need the upper two bytes of each int. The combination of these two changes led to a more than tenfold speedup!
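For the read path mentioned in the question, the same trick works in reverse: copy the raw bytes back into an Int16[,] with Buffer.BlockCopy. This is a sketch only; the method name readCachedImageData is hypothetical, and the 1000×1000 dimensions are assumed from the benchmark:

    // Hypothetical read-side counterpart, assuming a fixed 1000x1000 matrix.
    public Int16[,] readCachedImageData(byte[] bytesAtoms)
    {
        var atoms = new Int16[1000, 1000];
        Buffer.BlockCopy(bytesAtoms, 0, atoms, 0, bytesAtoms.Length);
        return atoms;
    }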

0

Source: https://habr.com/ru/post/985194/

