Batch Processing: Visual Studio

I am working in C#, where I read a huge table from a database and load it into a DataTable.

Since the table contains a huge number of rows (1,800,000+) and I keep getting an out-of-memory error, I tried to break the work up: copy 100,000 rows at a time, free the memory, and repeat until all the data from the source table is loaded into my DataTable.

Could you take a look at my code and tell me whether I am on the right track? As far as I can tell, I keep reading the first 100,000 rows over and over again, and my program runs endlessly.

Do I need to add some kind of counter to my DataTable so that it appends the next set of rows? Here is a snippet of the code:

    public IoSqlReply GetResultSet(String directoryName, String userId, String password, String sql)
    {
        IoSqlReply ioSqlReply = new IoSqlReply();
        DataTable dtResultSet = new DataTable();
        IoMsSQL ioMsSQL = null;
        int chunkSize = 100000;
        try
        {
            using (OdbcConnection conn = new OdbcConnection(cs))
            {
                conn.Open();
                using (OdbcCommand cmd = new OdbcCommand(sql, conn))
                {
                    using (OdbcDataReader reader = cmd.ExecuteReader())
                    {
                        for (int col = 0; col < reader.FieldCount; col++)
                        {
                            String colName = reader.GetName(col);
                            String colDataType = reader.GetFieldType(col).ToString();
                            dtResultSet.Columns.Add(reader.GetName(col), reader.GetFieldType(col));
                        }

                        // now copy each row/column to the datatable
                        while (reader.Read()) // loop round all rows in the source table
                        {
                            DataRow row = dtResultSet.NewRow();
                            for (int ixCol = 0; ixCol < reader.FieldCount; ixCol++) // loop round all columns in each row
                            {
                                row[ixCol] = reader.GetValue(ixCol);
                            }
                            // -------------------------------------------------------------
                            // finished processing the row, add it to the datatable
                            // -------------------------------------------------------------
                            dtResultSet.Rows.Add(row);
                            GC.Collect(); // free up memory
                        } // closing while

                        ioSqlReply.DtResultSet = dtResultSet; // return the data table
                        ioSqlReply.RowCount = dtResultSet.Rows.Count;
                        Console.WriteLine("DTRESULTSET:ROW COUNT FINAL : " + dtResultSet.Rows.Count);
                        ioSqlReply.Rc = 0;
                    }
                }
            }
        }
+4
3 answers

You need to limit the number of rows in your SQL, for example...

 SELECT TOP 10000 * FROM SomeTable; 

If you do not, and your query pulls back 1.8M rows, hardly any system will be able to handle that in memory at once.

But that will make your application process only the first 10,000 rows... if you need to process all of the rows, then you have to re-execute this SQL block repeatedly until no more rows come back... for example:

    public IoSqlReply GetResultSet(String directoryName, String userId, String password, String sql)
    {
        IoSqlReply ioSqlReply = new IoSqlReply();
        DataTable dtResultSet = new DataTable();
        IoMsSQL ioMsSQL = null;
        bool keepProcessing = true;
        try
        {
            using (OdbcConnection conn = new OdbcConnection(cs))
            {
                conn.Open();
                while (keepProcessing)
                {
                    using (OdbcCommand cmd = new OdbcCommand(sql, conn))
                    {
                        using (OdbcDataReader reader = cmd.ExecuteReader())
                        {
                            if (reader.HasRows)
                            {
                                for (int col = 0; col < reader.FieldCount; col++)
                                {
                                    String colName = reader.GetName(col);
                                    String colDataType = reader.GetFieldType(col).ToString();
                                    dtResultSet.Columns.Add(reader.GetName(col), reader.GetFieldType(col));
                                }

                                // now copy each row/column to the datatable
                                while (reader.Read()) // loop round all rows in the source table
                                {
                                    DataRow row = dtResultSet.NewRow();
                                    for (int ixCol = 0; ixCol < reader.FieldCount; ixCol++) // loop round all columns in each row
                                    {
                                        row[ixCol] = reader.GetValue(ixCol);
                                    }
                                    // -------------------------------------------------------------
                                    // finished processing the row, add it to the datatable
                                    // -------------------------------------------------------------
                                    dtResultSet.Rows.Add(row);
                                    GC.Collect(); // free up memory
                                } // closing while

                                ioSqlReply.DtResultSet = dtResultSet; // return the data table
                                ioSqlReply.RowCount = dtResultSet.Rows.Count;
                                Console.WriteLine("DTRESULTSET:ROW COUNT FINAL : " + dtResultSet.Rows.Count);
                                ioSqlReply.Rc = 0;
                            }
                            else
                            {
                                keepProcessing = false;
                            }
                        }
                    }
                }
            }
        }

This is a very crude example ... It can be improved, but I think it is an easy solution to your problem.
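
A sketch of how the repeated execution could be turned into real paging is shown below. It is not from the original answer: it assumes a SQL Server style backend that supports ORDER BY ... OFFSET ... FETCH (SQL Server 2012+) and a sortable key column, here called Id, both of which are assumptions about the schema.

    // Sketch only: pages through the source table so each query returns at most one chunk.
    // Assumes an "Id" column to order by and OFFSET/FETCH support on the server.
    // Requires System.Data and System.Data.Odbc.
    public DataTable LoadInPages(string cs, string baseSql, int pageSize)
    {
        DataTable result = new DataTable();
        int offset = 0;

        using (OdbcConnection conn = new OdbcConnection(cs))
        {
            conn.Open();
            while (true)
            {
                string pagedSql = baseSql
                    + " ORDER BY Id OFFSET " + offset
                    + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";

                int rowsThisPage = 0;
                using (OdbcCommand cmd = new OdbcCommand(pagedSql, conn))
                using (OdbcDataReader reader = cmd.ExecuteReader())
                {
                    if (result.Columns.Count == 0)                       // build the schema once
                    {
                        for (int col = 0; col < reader.FieldCount; col++)
                            result.Columns.Add(reader.GetName(col), reader.GetFieldType(col));
                    }

                    while (reader.Read())                                // copy the current page
                    {
                        DataRow row = result.NewRow();
                        for (int c = 0; c < reader.FieldCount; c++)
                            row[c] = reader.GetValue(c);
                        result.Rows.Add(row);
                        rowsThisPage++;
                    }
                }

                if (rowsThisPage < pageSize)
                    break;                                               // last page reached
                offset += pageSize;
            }
        }
        return result;
    }

Note that this still accumulates every page into a single DataTable, so it only bounds the size of each query, not the total memory footprint.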

+1

1) Are you running on a 64-bit machine?

2) 1,800,000 rows. Assume an average of 1 KB per row: that is roughly 1.8 GB of memory.

3) Is there a reason you need to load everything into memory? Could you stream the data and work with it one row at a time?

4) Why not just let the DB process large tables instead of your client program?

If you work with data tables this large, you will probably have to take a different approach than simply loading everything into memory. You will need a different design.
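
To make point 3 concrete, here is a minimal sketch of streaming the result set instead of materialising it: each row is handed to a callback as it is read, so only the current row is held in memory. The StreamRows name and the processRow callback are placeholders for illustration, not something from the question.

    // Minimal streaming sketch (requires System and System.Data.Odbc).
    public static long StreamRows(string cs, string sql, Action<OdbcDataReader> processRow)
    {
        long count = 0;
        using (OdbcConnection conn = new OdbcConnection(cs))
        {
            conn.Open();
            using (OdbcCommand cmd = new OdbcCommand(sql, conn))
            using (OdbcDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    processRow(reader);   // only the current row is ever in memory
                    count++;
                }
            }
        }
        return count;
    }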

Edit: it would be useful to know more about what you are trying to do and how much data you are working with.

+1

Try the following:

  • Use some kind of buffering. For example, use a yield-return enumerator and hand back a few thousand items at a time (a usage sketch for consuming the chunks follows after this list). Example for reading:

      private const int BUFFER = 5000;   // default chunk size

      public IEnumerable<string[][]> ReadFileByLines(string cs, string sql, int Buffer = BUFFER)
      {
          using (OdbcConnection conn = new OdbcConnection(cs))
          using (OdbcCommand cmd = new OdbcCommand(sql, conn))
          {
              conn.Open();
              using (OdbcDataReader reader = cmd.ExecuteReader())
              {
                  List<string[]> bufferList = new List<string[]>(Buffer);
                  while (reader.Read())
                  {
                      bufferList.Add(new[] { reader["something"].ToString() }); // or add a custom class here
                      if (bufferList.Count == Buffer)
                      {
                          yield return bufferList.ToArray();   // hand back one full chunk
                          bufferList = new List<string[]>(Buffer);
                      }
                  }
                  yield return bufferList.ToArray();           // the final, partial chunk
              }
          }
      }

    ... or create a stored procedure that does the paging on the server side, though that will generate many database calls. Either way, the reading problem remains: with 1.8 million rows, you have to read them all at some point.

  • Use BulkInsert to insert large amounts of data into the database at once.
  • Remove the GC.Collect() calls.
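
As a usage sketch tying the first two bullets together (and assuming it lives in the same class as ReadFileByLines above): each chunk yielded by the enumerator is copied into a small DataTable and written to a target table with SqlBulkCopy. The single Value column, the dbo.TargetTable name and the separate connection strings are placeholders, not details from the question.

    // Sketch only: consume the enumerator chunk by chunk and bulk-insert each chunk.
    // Requires System.Data, System.Data.Odbc and System.Data.SqlClient.
    public void CopyInChunks(string sourceCs, string sql, string targetCs)
    {
        foreach (string[][] chunk in ReadFileByLines(sourceCs, sql))
        {
            DataTable dt = new DataTable();
            dt.Columns.Add("Value", typeof(string));            // placeholder single-column layout
            foreach (string[] row in chunk)
                dt.Rows.Add(row[0]);

            using (SqlBulkCopy bulk = new SqlBulkCopy(targetCs))
            {
                bulk.DestinationTableName = "dbo.TargetTable";  // placeholder
                bulk.WriteToServer(dt);                         // one round trip per chunk
            }
        }
    }

Because only one chunk is materialised at a time, the memory footprint stays bounded no matter how many rows the source table has.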
0

Source: https://habr.com/ru/post/1384749/

