We want to transfer a huge amount of data between Oracle 11g R2 and SQL Server 2014 in as little time as possible ... we are talking about 20+ TB, thousands of tables and billions of rows (this is 5 years of a data warehouse).
SSIS is not an option because the number of tables and partitions we need to transfer is huge: about 40 thousand tables and partitions. We have several marketing applications, campaign managers, mining models, etc. working in different schemas ... some of them create about 150 new tables daily and drop about 100 daily ... some of these tables have indexes, some of them ... these volatile schemas are the reason we can't just use SSIS, because we would need to translate all of that DDL as well. (Edited to include more information on why SSIS does not look like a viable option.)
So, we developed a multi-threaded application in C# (Visual Studio 2013, .NET 4.5.1) ... each thread reads a table/partition in Oracle, creates the same table/partition schema in SQL Server, then selects the data from Oracle and bulk inserts it into SQL Server, and finally creates each pending constraint and index ...
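Roughly, each worker thread does the following (a minimal sketch only; sourceCS, createDestinationTable and createConstraintsAndIndexes are hypothetical placeholders for our internal helpers, and the actual write routine saveDataBlock is shown further below):

    using System.Data;
    using Oracle.ManagedDataAccess.Client;

    private static void transferTable(string oracleTable, string sqlTable, int batchSize)
    {
        // 1) Translate the Oracle DDL and create the destination table/partition first.
        createDestinationTable(oracleTable, sqlTable);            // hypothetical helper

        using (OracleConnection src = new OracleConnection(getConnString(sourceCS)))
        using (OracleCommand cmd = new OracleCommand("SELECT * FROM " + oracleTable, src))
        {
            src.Open();
            cmd.CommandTimeout = 0;                               // long-running reads

            // 2) Stream the rows straight from the Oracle reader into SqlBulkCopy.
            using (IDataReader reader = cmd.ExecuteReader())
            {
                saveDataBlock(reader, sqlTable, batchSize);
            }
        }

        // 3) Constraints and indexes are deferred until the data is in.
        createConstraintsAndIndexes(sqlTable);                    // hypothetical helper
    }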
One of the main problems we are facing is data transfer speed ... Comparing SSIS against our C# application on one fact table with 30 partitions (a monthly fact table with daily partitions, about 30 million rows per partition, 60+ columns), we found that our application never saturates the network, while SSIS uses 100% of it (we use the SSIS Attunity connectors for Oracle ... probably one of its transfer speed advantages, I don't know), and we want to improve that if we can ...
This is the code block responsible for the writes:
    //private static async Task saveDataBlock(IDataReader reader, string destinationTable, int batchSize)
    private static void saveDataBlock(IDataReader reader, string destinationTable, int batchSize)
    {
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS));
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS),
        //    System.Data.SqlClient.SqlBulkCopyOptions.KeepIdentity & System.Data.SqlClient.SqlBulkCopyOptions.KeepNulls & System.Data.SqlClient.SqlBulkCopyOptions.TableLock);
        using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS)))
        {
            bc.BulkCopyTimeout = 0;
            bc.DestinationTableName = destinationTable;
            bc.BatchSize = batchSize; //2500, 5000, 10000 .. best so far, 5000
            //bc.BatchSize = 0;
            bc.NotifyAfter = batchSize;
            bc.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
            //bc.EnableStreaming = true;

            bc.ColumnMappings.Clear();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                bc.ColumnMappings.Add(reader.GetName(i), reader.GetName(i));
            }

            bc.WriteToServer(reader);
            //await bc.WriteToServerAsync(reader);
            //bc.Close();
        }
        //System.Data.SqlClient.SqlBulkCopy bc = new System.Data.SqlClient.SqlBulkCopy(getConnString(destinationCS), SqlBulkCopyOptions.UseInternalTransaction);
    }
Does anyone have suggestions for options we could check or configure, either in SQL Server or on the SqlBulkCopy object in the C# application (the commented-out code above shows previously tested options)?
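One thing worth pointing out about those commented-out options: SqlBulkCopyOptions is a flags enum, so combining the values with & (as in the commented constructor above) evaluates to 0, i.e. SqlBulkCopyOptions.Default, and effectively tests no options at all. A minimal sketch of the same body with the flags actually OR'd together and streaming turned on (EnableStreaming needs .NET 4.5+, which we have):

    using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS),
            SqlBulkCopyOptions.KeepIdentity |
            SqlBulkCopyOptions.KeepNulls |
            SqlBulkCopyOptions.TableLock))   // note: TableLock interacts with parallel writers, see below
    {
        bc.BulkCopyTimeout = 0;
        bc.DestinationTableName = destinationTable;
        bc.BatchSize = batchSize;
        bc.EnableStreaming = true;           // stream rows from the IDataReader instead of buffering them
        bc.WriteToServer(reader);
    }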
PS: some information about our environment ... the Oracle server has a 2 Gbps uplink and the SQL Server a 1 Gbps downlink; Oracle runs on a 32-core system, and our test SQL Server is a 16-core Windows Server 2012 R2 box ... SSIS reaches a network transfer rate of 1 Gbps, while the C# app tops out at around 70 Mbps with 16 threads ...
Edit: more information about transfer rates.
More about our application tests:
2 threads = 15-25 Mbps
4 threads = 30-40 Mbps
8 threads = 60-65 Mbps
16 threads = 65-70 Mbps
Anything above 16 threads (one application thread per system core) drops performance back to 30-50 Mbps.
Our SAN is easily capable of speeds above 500 MB/s at high I/O rates.
We got our best batch sizes with values between 2,500 and 5,000 rows per batch (about 15 million rows in 5 minutes with 8-16 threads).
Right now, our application is transferring data from one table/partition in Oracle to its counterpart in SQL Server ... we have several offline tables with 100+ million rows ... for these tables we tested several threads reading the same table ... the parallel reads worked, but we could not get the bulk inserts to run in parallel.
Each thread reads its own slice of the same table by applying the MOD operation to a numeric column (see the sketch after the query):
    select * from schema.table where MOD(NUMERIC_COLUMN, N) = i   -- one value of i per thread, i = 0 .. N-1
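For illustration, this is roughly how the slices get fanned out over threads (a hypothetical sketch; buildSliceQuery and the plumbing around it are made-up names). One caveat we are aware of: the MOD predicate is not sargable, so N threads typically means N full scans on the Oracle side:

    using System.Threading.Tasks;

    private static string buildSliceQuery(string schemaTable, string numericColumn, int n, int i)
    {
        // Slice i of n: every row whose numeric key is congruent to i modulo n.
        return string.Format("SELECT * FROM {0} WHERE MOD({1}, {2}) = {3}",
                             schemaTable, numericColumn, n, i);
    }

    // Usage: one bulk-copy pipeline per slice.
    int n = 8; // number of reader threads
    Parallel.For(0, n, i =>
    {
        string query = buildSliceQuery("schema.table", "NUMERIC_COLUMN", n, i);
        // open an Oracle reader on 'query' and hand it to saveDataBlock(...)
    });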
N is the number of threads we run ... We tried to recreate some of SSIS's behavior to maximize thread utilization while reading from Oracle and writing to SQL Server. In SSIS we can enable the table lock option on each destination and it worked flawlessly ... but doing the same in our application, each thread locks the table during its insert, and in the end our parallel design ends up serialized :( (This is not the main point of the question, but if anyone has suggestions about this, they will be appreciated.)
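A note on what we believe is happening, plus a sketch (destinationHasIndexes is a hypothetical flag, and the lock behavior in the comments is our reading of the SQL Server bulk load documentation): with TABLOCK, a bulk load into a plain heap takes a bulk update (BU) lock, and BU locks are compatible with each other, so parallel loads can proceed; once the destination has an index, TABLOCK takes an exclusive lock instead, which would explain the serialization we see.

    // Per-thread writer: request TableLock only when the destination is a bare heap.
    SqlBulkCopyOptions opts = destinationHasIndexes
        ? SqlBulkCopyOptions.Default      // row/page locks; concurrent writers stay parallel
        : SqlBulkCopyOptions.TableLock;   // BU lock on a heap; also allows parallel bulk loads

    using (SqlBulkCopy bc = new SqlBulkCopy(getConnString(destinationCS), opts))
    {
        bc.BulkCopyTimeout = 0;
        bc.DestinationTableName = destinationTable;
        bc.WriteToServer(reader);
    }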