Why is the throughput of this C# data processing application much lower than the raw capabilities of the server?

I put together a small test harness to diagnose why the throughput of my C# data processing application (its core function selects records in batches of 100 from a remote database server using non-blocking I/O and performs simple processing on them) is much lower than it could be. While it runs, the application hits no bottleneck in CPU (<3%), network, disk I/O, or RAM, and it does not stress the database server either (the data set on the database side fits almost entirely in RAM). If I run several instances of the application in parallel, I can scale up to ~45 instances with only about a 10% degradation in latency but a 45-fold increase in throughput, before CPU load on the database server becomes the bottleneck (and at that point there are still no bottlenecks on the client).

My question is: why doesn't the TPL increase the number of tasks in flight, or otherwise increase throughput, when the client machine clearly has the capacity to push throughput much higher?

Simplified code excerpt:

    public static async Task ProcessRecordsAsync()
    {
        int max = 10000;
        var s = new Stopwatch();
        s.Start();
        Parallel.For(0, max, async x => 
        {
            await ProcessFunc();
        });
        s.Stop();
        Console.WriteLine("{2} Selects completed in {0} ms ({1} ms per select).", s.ElapsedMilliseconds, ((float)s.ElapsedMilliseconds) / max, max);
    }

    public static async Task ProcessFunc()
    {
        string sql = "select top 100 MyTestColumn from MyTestTable order by MyTestColumn desc;";
        string connStr = "<blah>...";

        using (SqlConnection conn = new SqlConnection(connStr))
        {
            try
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(sql, conn);
                DbDataReader rdr = await cmd.ExecuteReaderAsync();

                while (rdr.Read())
                {
                    // do simple processing here
                }
                rdr.Close();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }
    }

Parallel.For does not try to squeeze every last drop out of your processor by maximizing the number of parallel threads working for you. It uses the number of cores as a starting point and may ramp up depending on the nature of the workload. See this question.
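
As an illustration (a sketch, not code from the original answer): you can cap the degree of parallelism explicitly with ParallelOptions. The value 8 below is an arbitrary example, and the loop body blocks on the task so that each iteration actually completes its work before Parallel.For counts it as finished.

    // Sketch: explicitly bounding Parallel.For's degree of parallelism.
    // MaxDegreeOfParallelism is only an upper limit; the scheduler still
    // decides how many iterations actually run at once.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

    Parallel.For(0, max, options, x =>
    {
        // Block on the task so each iteration finishes its work inside the loop body;
        // an async lambda here would return immediately and Parallel.For would not wait for it.
        ProcessFunc().GetAwaiter().GetResult();
    });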

On top of that, you are actually blocking on I/O when you open the connection and when you read the rows. Instead, you can try:

//....
using (var conn = new SqlConnection(connStr))
{
  await conn.OpenAsync();   // open the connection asynchronously instead of blocking on conn.Open()
  SqlCommand cmd = new SqlCommand(sql, conn);
  try
  {
    using ( var rdr = await cmd.ExecuteReaderAsync())
    { 
      while (await rdr.ReadAsync())   // read each row asynchronously instead of blocking on rdr.Read()
      {
        // do simple processing here
      }
    }
  }
  catch (Exception ex)
  {
    Console.WriteLine(ex.ToString());
  }
}
//...
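
As a further sketch (again, not code from the answer, and assuming ProcessFunc has been made fully asynchronous as shown above), one way to get more tasks in flight is to start many calls and await them together, throttled with a SemaphoreSlim so the number of open connections stays bounded. The concurrency value of 45 simply mirrors the instance count mentioned in the question.

    // Sketch only: fan out many async calls and await them all,
    // throttled so we do not open an unbounded number of connections.
    public static async Task ProcessRecordsConcurrentlyAsync(int max = 10000, int concurrency = 45)
    {
        using (var throttle = new SemaphoreSlim(concurrency))
        {
            var tasks = new List<Task>();
            for (int x = 0; x < max; x++)
            {
                await throttle.WaitAsync();          // wait for a free slot
                tasks.Add(Task.Run(async () =>
                {
                    try { await ProcessFunc(); }
                    finally { throttle.Release(); }  // free the slot when this call finishes
                }));
            }
            await Task.WhenAll(tasks);               // wait for every in-flight call to complete
        }
    }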


Source: https://habr.com/ru/post/1665354/
