We want to transfer a huge amount of data between Oracle 11g R2 and SQL Server 2014 in as little time as possible ... we are talking about 20+ TB, thousands of tables and billions of rows (this is 5 years of a data warehouse).
SSIS is not an option because the number of tables and partitions we need to transfer is huge: about 40 thousand tables and partitions. We have several marketing applications, campaign managers, mining models, etc. working in different schemas ... some of them create about 150 new tables daily and drop about 100 daily ... some of these tables have indexes, some of them ... these volatile schemas are the reason we can't just use SSIS, because we would need to translate all of that DDL as well. (Edited to include more information on why SSIS does not look like a viable option.)
So, we developed a multi-threaded application in C# (Visual Studio 2013, .NET 4.5.1) ... each thread reads a table/partition in Oracle, creates the same table/partition schema in SQL Server, then selects the data from Oracle and bulk inserts it into SQL Server, and finally creates each pending constraint and index ...
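Roughly, each worker thread does the following (a minimal sketch only; sourceCS, createDestinationTable and createConstraintsAndIndexes are hypothetical placeholders for our internal helpers, and the actual write routine saveDataBlock is shown further below):

    using System.Data;
    using Oracle.ManagedDataAccess.Client;

    private static void transferTable(string oracleTable, string sqlTable, int batchSize)
    {
        // 1) Translate the Oracle DDL and create the destination table/partition first.
        createDestinationTable(oracleTable, sqlTable);            // hypothetical helper

        using (OracleConnection src = new OracleConnection(getConnString(sourceCS)))
        using (OracleCommand cmd = new OracleCommand("SELECT * FROM " + oracleTable, src))
        {
            src.Open();
            cmd.CommandTimeout = 0;                               // long-running reads

            // 2) Stream the rows straight from the Oracle reader into SqlBulkCopy.
            using (IDataReader reader = cmd.ExecuteReader())
            {
                saveDataBlock(reader, sqlTable, batchSize);
            }
        }

        // 3) Constraints and indexes are deferred until the data is in.
        createConstraintsAndIndexes(sqlTable);                    // hypothetical helper
    }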
One of the main problems we are facing is data transfer speed ... Comparing SSIS against our C# application on one fact table with 30 partitions (a monthly fact table with daily partitions, about 30 million rows per partition, 60+ columns), we found that our application never saturates the network, while SSIS uses 100% of it (we use the SSIS Attunity connectors for Oracle ... probably one of its transfer speed advantages, I don't know), and we want to improve that if we can ...
This is the code block responsible for the writes:
    //private static async Task saveDataBlock(IDataReader reader, string destinationTable, int batchSize)
    private static void saveDataBlock(IDataReader reader, string destinationTable, int batchSize)
    {
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS));
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS),
        //    System.Data.SqlClient.SqlBulkCopyOptions.KeepIdentity & System.Data.SqlClient.SqlBulkCopyOptions.KeepNulls & System.Data.SqlClient.SqlBulkCopyOptions.TableLock);
        using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS)))
        {
            bc.BulkCopyTimeout = 0;
            bc.DestinationTableName = destinationTable;
            bc.BatchSize = batchSize; //2500, 5000, 10000 .. best so far, 5000
            //bc.BatchSize = 0;
            bc.NotifyAfter = batchSize;
            bc.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
            //bc.EnableStreaming = true;

            bc.ColumnMappings.Clear();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                bc.ColumnMappings.Add(reader.GetName(i), reader.GetName(i));
            }

            bc.WriteToServer(reader);
            //await bc.WriteToServerAsync(reader);
            //bc.Close();
        }
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS), SqlBulkCopyOptions.UseInternalTransaction);
    }
Does anyone have suggestions for options we could check or configure, either in SQL Server or on the SqlBulkCopy object in the C# application (the commented-out code above shows previously tested options)?
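One thing worth pointing out about those commented-out options: SqlBulkCopyOptions is a flags enum, so combining the values with & (as in the commented constructor above) evaluates to 0, i.e. SqlBulkCopyOptions.Default, and effectively tests no options at all. A minimal sketch of the same body with the flags actually OR'd together and streaming turned on (EnableStreaming needs .NET 4.5+, which we have):

    using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS),
            SqlBulkCopyOptions.KeepIdentity |
            SqlBulkCopyOptions.KeepNulls |
            SqlBulkCopyOptions.TableLock))   // note: TableLock interacts with parallel writers, see below
    {
        bc.BulkCopyTimeout = 0;
        bc.DestinationTableName = destinationTable;
        bc.BatchSize = batchSize;
        bc.EnableStreaming = true;           // stream rows from the IDataReader instead of buffering them
        bc.WriteToServer(reader);
    }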
PS: some information about our environment ... the Oracle server has a 2 Gbps uplink and the SQL Server a 1 Gbps downlink; Oracle runs on a 32-core system, and our test SQL Server is a 16-core Windows Server 2012 R2 box ... SSIS reaches a network transfer rate of 1 Gbps, while the C# app tops out at around 70 Mbps with 16 threads ...
Edit: more information about transfer rates.
More about our application tests:
2 threads = 15-25 Mbps
4 threads = 30-40 Mbps
8 threads = 60-65 Mbps
16 threads = 65-70 Mbps
Anything above 16 threads (one application thread per system core) drops performance back to 30-50 Mbps.
Our SAN is easily capable of speeds above 500 MB/s at high I/O rates.
We got our best batch sizes with values between 2,500 and 5,000 rows per batch (about 15 million rows in 5 minutes with 8-16 threads).
Right now, our application is transferring data from one table/partition in Oracle to its counterpart in SQL Server ... we have several offline tables with 100+ million rows ... for these tables we tested several threads reading the same table ... the parallel reads worked, but we could not get the bulk inserts to run in parallel.
Each thread reads its own slice of the same table by applying the MOD operation to a numeric column (see the sketch after the query):
    select * from schema.table where MOD(NUMERIC_COLUMN, N) = i   -- one value of i per thread, i = 0 .. N-1
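For illustration, this is roughly how the slices get fanned out over threads (a hypothetical sketch; buildSliceQuery and the plumbing around it are made-up names). One caveat we are aware of: the MOD predicate is not sargable, so N threads typically means N full scans on the Oracle side:

    using System.Threading.Tasks;

    private static string buildSliceQuery(string schemaTable, string numericColumn, int n, int i)
    {
        // Slice i of n: every row whose numeric key is congruent to i modulo n.
        return string.Format("SELECT * FROM {0} WHERE MOD({1}, {2}) = {3}",
                             schemaTable, numericColumn, n, i);
    }

    // Usage: one bulk-copy pipeline per slice.
    int n = 8; // number of reader threads
    Parallel.For(0, n, i =>
    {
        string query = buildSliceQuery("schema.table", "NUMERIC_COLUMN", n, i);
        // open an Oracle reader on 'query' and hand it to saveDataBlock(...)
    });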
N is the number of threads we run ... We tried to recreate some of SSIS's behavior to maximize thread utilization while reading from Oracle and writing to SQL Server. In SSIS we can enable the table lock option on each destination and it worked flawlessly ... but doing the same in our application, each thread locks the table during its insert, and in the end our parallel design ends up serialized :( (This is not the main point of the question, but if anyone has suggestions about this, they will be appreciated.)
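A note on what we believe is happening, plus a sketch (destinationHasIndexes is a hypothetical flag, and the lock behavior in the comments is our reading of the SQL Server bulk load documentation): with TABLOCK, a bulk load into a plain heap takes a bulk update (BU) lock, and BU locks are compatible with each other, so parallel loads can proceed; once the destination has an index, TABLOCK takes an exclusive lock instead, which would explain the serialization we see.

    // Per-thread writer: request TableLock only when the destination is a bare heap.
    SqlBulkCopyOptions opts = destinationHasIndexes
        ? SqlBulkCopyOptions.Default      // row/page locks; concurrent writers stay parallel
        : SqlBulkCopyOptions.TableLock;   // BU lock on a heap; also allows parallel bulk loads

    using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS), opts))
    {
        bc.BulkCopyTimeout = 0;
        bc.DestinationTableName = destinationTable;
        bc.WriteToServer(reader);
    }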