C#, C++ and Java performance comparison (weird C# behavior)

I am implementing the Floyd-Warshall algorithm in C++, C# and Java. In each language I have a sequential and a parallel implementation. After testing, the results are below.
(Elapsed time is measured only for the core function; reading the input file, initializing variables, and so on are not measured.)
Download the sources here: SourceCodes
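For reference, here is a minimal sketch of how that kind of measurement can be taken in C# with System.Diagnostics.Stopwatch. The Timing.Measure helper below is only an illustration, not part of the original sources:

    using System;
    using System.Diagnostics;

    static class Timing {
        // Hypothetical helper: times only the algorithm call itself,
        // matching the measurement described above (no file I/O, no initialization).
        public static double Measure(Action algorithm) {
            var sw = Stopwatch.StartNew();
            algorithm();
            sw.Stop();
            return sw.Elapsed.TotalSeconds;
        }
    }

It would be called, for example, as Timing.Measure(floyd_warshallSingle) once the arrays are already filled.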

C++

  • IDE: NetBeans
  • Compiler: MinGW 4.8.1
  • Sequential time: 9.333000
  • Parallel time: 2.539000
  • Parallelized with OpenMP

If NumOfThreads == 1 the run is sequential, otherwise it is parallel.

variables

    #define n 1000 /* The number of nodes */

    double dist[n][n];

    void floyd_warshall(int NumOfThreads) {
        int i, j, k;
        omp_set_num_threads(NumOfThreads);
        for (k = 0; k < n; ++k)
    #pragma omp parallel for private(i,j)
            for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j)
                    if ((dist[i][k] * dist[k][j] != 0) && (i != j))
                        if ((dist[i][k] + dist[k][j] < dist[i][j]) || (dist[i][j] == 0))
                            dist[i][j] = dist[i][k] + dist[k][j];
    }

Java

  • IDE: NetBeans
  • Compiler: NetBeans default
  • Sequential time: 11.632
  • Parallel time: 3.089
  • JVM options: -Xms512m -Xmx1024m
  • import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

Java variables

    public final int numNodes = 1000;
    public final double[][] costs = new double[numNodes][numNodes];

I did not include the Java code here because it works correctly (I think).

C#

  • IDE: Visual Studio 2012
  • Compiler: Visual Studio 2012 default
  • Sequential time: 31.312
  • Parallel time: 8.920
  • using System.Threading.Tasks;

variables

    const int n = 1000;
    static double[,] dist = new double[n, n];

parallel code

    static void floyd_warshall(ParallelOptions pOp) {
        int k;
        for (k = 0; k < n; ++k)
            Parallel.For<int>(0, n, pOp, () => 0, (i, loop, j) =>
            {
                for (j = 0; j < n; ++j)
                    if ((dist[i, k] * dist[k, j] != 0) && (i != j))
                        if ((dist[i, k] + dist[k, j] < dist[i, j]) || (dist[i, j] == 0))
                            dist[i, j] = dist[i, k] + dist[k, j];
                return 0;
            }, (j) => { });
    }

sequential code

    static void floyd_warshallSingle() {
        int i, j, k;
        for (k = 0; k < n; ++k)
            for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j)
                    if ((dist[i, k] * dist[k, j] != 0) && (i != j))
                        if ((dist[i, k] + dist[k, j] < dist[i, j]) || (dist[i, j] == 0))
                            dist[i, j] = dist[i, k] + dist[k, j];
    }
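As a side note, the Parallel.For<int> overload used above carries a thread-local value (the () => 0 initializer and the trailing (j) => { } action) that is never actually used. A simpler sketch of the same loop with the plain Parallel.For(int, int, ParallelOptions, Action<int>) overload could look like this (only an illustration, not the code that was benchmarked; the method name is made up):

    static void floyd_warshallParallel(ParallelOptions pOp) {
        for (int k = 0; k < n; ++k)
            // One parallel pass over the rows for each intermediate node k.
            Parallel.For(0, n, pOp, i => {
                for (int j = 0; j < n; ++j)
                    if ((dist[i, k] * dist[k, j] != 0) && (i != j))
                        if ((dist[i, k] + dist[k, j] < dist[i, j]) || (dist[i, j] == 0))
                            dist[i, j] = dist[i, k] + dist[k, j];
            });
    }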

What is wrong with my C# implementation?
All implementations read the same input file.
My question is: why does this algorithm take so much longer in C#? Java and C++ take almost the same time, but I suspect my C# implementation is wrong, because such a gap between C# and C++ is strange!
Please help me improve my C# implementation, or suggest some reasons for the difference. Thank you!

Edit 1


I changed my arrays to jagged arrays, and the results changed to:

  • Sequential time: 19.22
  • Parallel time: 4.903

There is still a huge difference between C# and C++/Java! Any idea why?
new variables

    const int n = 1000;
    static double[][] dist = new double[n][];
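Note that new double[n][] only allocates the array of row references; each row still has to be allocated before use (in the real program this presumably happens while the input file is read). A minimal sketch of that initialization, with a made-up method name:

    static void initRows() {
        // Allocate each row of the jagged array; without this every dist[i] is null.
        for (int i = 0; i < n; ++i)
            dist[i] = new double[n];
    }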

new code:

    static void floyd_warshallSingle() {
        int i, j, k;
        for (k = 0; k < n; ++k)
            for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j)
                    if ((dist[i][k] * dist[k][j] != 0) && (i != j))
                        if ((dist[i][k] + dist[k][j] < dist[i][j]) || (dist[i][j] == 0))
                            dist[i][j] = dist[i][k] + dist[k][j];
    }

    static void floyd_warshall(ParallelOptions pOp) {
        int k;
        for (k = 0; k < n; ++k)
            Parallel.For<int>(0, n, pOp, () => 0, (i, loop, j) =>
            {
                for (j = 0; j < n; ++j)
                    if ((dist[i][k] * dist[k][j] != 0) && (i != j))
                        if ((dist[i][k] + dist[k][j] < dist[i][j]) || (dist[i][j] == 0))
                            dist[i][j] = dist[i][k] + dist[k][j];
                return 0;
            }, (j) => { });
    }
1 answer

One way to determine whether array bounds checking is the problem, or at least whether it is part of the problem, would be to hoist some of the index calculations out of the inner loop. For example:

    static void floyd_warshallSingle() {
        int i, j, k;
        for (k = 0; k < n; ++k) {
            var distk = dist[k];
            for (i = 0; i < n; ++i) {
                var disti = dist[i];
                for (j = 0; j < n; ++j)
                    if ((i != j) && (disti[k] * distk[j] != 0))
                        if ((disti[j] == 0) || disti[k] + distk[j] < disti[j])
                            disti[j] = disti[k] + distk[j];
            }
        }
    }

Here, all I did was use distk and disti as references to the rows dist[k] and dist[i]. I suspect the compiler is already doing this optimization, which most likely accounts for the performance gain you got from switching from a rectangular array to a jagged array. But it is worth checking.
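The same row-hoisting idea can also be applied to the parallel version. A sketch along those lines, assuming the jagged dist array from the question (again just an illustration with a made-up method name, not tested against the original sources):

    static void floyd_warshallParallelHoisted(ParallelOptions pOp) {
        for (int k = 0; k < n; ++k) {
            var distk = dist[k];                 // hoist row k once per outer iteration
            Parallel.For(0, n, pOp, i => {
                var disti = dist[i];             // hoist row i inside each parallel body
                for (int j = 0; j < n; ++j)
                    if ((i != j) && (disti[k] * distk[j] != 0))
                        if ((disti[j] == 0) || disti[k] + distk[j] < disti[j])
                            disti[j] = disti[k] + distk[j];
            });
        }
    }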

In addition, you said that you are running without the debugger attached. I assume you are also running release builds? And are all three programs (C++, Java and C#) running in the same mode, i.e. all as 64-bit executables or all as 32-bit executables? Be careful with C#, because the "Prefer 32-bit" flag can be enabled in the project options; it makes an "Any CPU" build run in 32-bit mode even on a 64-bit system.
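One quick way to verify what the C# build actually runs as is to print the process bitness at startup, for example:

    using System;

    class BitnessCheck {
        static void Main() {
            // Prints whether the current process and the operating system are 64-bit.
            Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
            Console.WriteLine("64-bit OS: " + Environment.Is64BitOperatingSystem);
        }
    }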


Source: https://habr.com/ru/post/978548/

