VB6 / MS Access database: editing a large number of records

I need to process hundreds of thousands of records using VB6 and an MS Access database. I iterate over the recordset and edit each record, but this takes a lot of time. Creating a database with the same number of records using the AddNew and Update methods is much faster.

I would really appreciate it if someone would show me some sample code or just a strategy.

Here is the code

    Data1(1).RecordSource = "Select * from TABLE order by Field_A ASC"
    Data1(1).Refresh
    If Data1(1).Recordset.RecordCount > 0 Then
        Data1(1).Recordset.MoveFirst
        Do
            Data1(1).Recordset.Edit
            Data1(1).Recordset.Fields("FIELD") = Sort_Value
            Data1(1).Recordset.Update
            Data1(1).Recordset.MoveNext
        Loop Until Data1(1).Recordset.EOF = True
    End If

It is really quite simple. Actually, I forgot to mention that the computer's hard drive is grinding away constantly, writing the whole time. That really is the problem: under such a heavy disk load, performance is bound to suffer.

At first I thought the problem was the recordset generated by the query (remember, we are talking about 1-2 million records). I assume it lives in some temporary place on the hard drive rather than entirely in RAM, so every .Edit and .Update first has to position the cursor in the right place on disk and then write.

I do not know for sure. Hopefully an expert here can show me the way out.

Btw, I also tried replacing the Loop Until Data1(1).Recordset.EOF = True check with a fixed-length loop, because I read that testing Recordset.EOF on every pass also hurts performance.

Thank you in advance!

+6
4 answers

I created a table called test with fields n and f(n)

I timed three different update routines: a recordset without a transaction, a recordset with a transaction, and an update query.

    Sub updateFunction_noTrans()
        Dim rs As Recordset
        Set rs = CurrentDb.OpenRecordset("test")
        rs.MoveFirst
        Do Until rs.EOF
            rs.Edit
            rs("f(n)") = rs("n") + 1
            rs.Update
            rs.MoveNext
        Loop
    End Sub

This is basically what you do: a plain recordset loop that edits the field.

    Sub updateFunction_yesTrans()
        Dim i As Long
        Dim commitSize As Long
        Dim rs As Recordset

        commitSize = 5000
        Set rs = CurrentDb.OpenRecordset("test")
        DBEngine.Workspaces(0).BeginTrans
        rs.MoveFirst
        Do Until rs.EOF
            rs.Edit
            rs("f(n)") = rs("n") + 1
            rs.Update
            rs.MoveNext
            i = i + 1
            If i = commitSize Then
                DBEngine.Workspaces(0).CommitTrans
                DBEngine.Workspaces(0).BeginTrans
                i = 0
            End If
        Loop
        DBEngine.Workspaces(0).CommitTrans
    End Sub

This is the same idea, but with transactions. I commit 5,000 records at a time, since I ran into some limit between 9k and 10k records per commit. I believe you can change that limit in the registry.
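If the limit you hit is Jet's lock count (the default MaxLocksPerFile is 9500, which matches that 9k-10k ceiling), DAO also exposes it at runtime, so you don't have to touch the registry. A minimal sketch; the value shown is only an illustration:

    Sub raiseJetLockLimit()
        ' Jet's default MaxLocksPerFile is 9500; raise it for this
        ' session only so bigger commits don't run out of locks.
        DBEngine.SetOption dbMaxLocksPerFile, 100000
    End Sub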

    Sub updateFunction_updateQuery()
        CurrentDb.Execute ("UPDATE test SET test.[f(n)] = [n]+1;")
    End Sub

This is faster than either of the recordset methods. For instance, about 2 million records took ~20 seconds without transactions, ~18-19 seconds with transactions, and ~14 seconds with the update query.

All of this assumes the updated field depends only on values calculated from other fields within the same record.

What it takes to really speed things up depends on the situation, so more details would be needed if none of this applies.

Edit: this was run on an old Core 2 Duo, with indexes on the fields.

+2

My only suggestion, which may not work in your case, is to do a bulk update with an UPDATE query.

Three cases where this might work:

If Sort_Value can be calculated from other fields, this is a simple UPDATE query, but I'm sure you have already seen this.

If Sort_Value can be computed from other records (for example, the previous record), you can write a more complex UPDATE query (I saw several rather complex queries posted here).

Finally, if the same Sort_Value applies to a large number of records, you can issue one UPDATE query per value. So if Sort_Value can take 10 different values, all your updates can be done with 10 UPDATE queries; see the sketch below.
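A sketch of that third case, assuming a hypothetical mapping table tblSortMap(KeyValue, SortValue) that says which records get which value; the table, field, and names are all placeholders for wherever your Sort_Value really comes from, and SortValue is assumed numeric:

    Sub updatePerSortValue()
        Dim rs As Recordset
        ' One UPDATE per distinct SortValue instead of one Edit per record.
        Set rs = CurrentDb.OpenRecordset("SELECT DISTINCT SortValue FROM tblSortMap")
        Do Until rs.EOF
            CurrentDb.Execute _
                "UPDATE MyTable SET [FIELD] = " & rs!SortValue & _
                " WHERE KeyValue IN (SELECT KeyValue FROM tblSortMap" & _
                " WHERE SortValue = " & rs!SortValue & ")"
            rs.MoveNext
        Loop
        rs.Close
    End Sub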


If you tell us where Sort_Value comes from, we can help you further.


Here are some things that do NOT speed up the Edit/Update loop, according to my testing. All tests were done on a table of 10,000 records with 1,000,000 updates.

  • RS(1) instead of RS("name"). This was suggested on another site but actually increased the time by 20%. (25 s)
  • BeginTrans/CommitTrans made no difference on a non-indexed field and was 1% faster on an indexed field. (non-indexed: 11 s [w/ trans] / 11 s; indexed: 23 s [w/ trans] / 25 s) *
  • Separate SQL statements. (86 s)
  • A parameterized QueryDef. (43 s)

* Corrected result.


Code for the BeginTrans/CommitTrans test:

    Sub CommitTest()
        Dim C As String
        Dim I As Long
        Dim J As Long
        Dim RS As Recordset
        Dim BegTime As Date
        Dim EndTime As Date

        BegTime = Now()
        Set RS = CurrentDb.OpenRecordset("tblTest")
        For J = 1 To 200
            RS.MoveFirst
            DBEngine.Workspaces(0).BeginTrans
            For I = 1 To 5000
                C = Chr(Int(Rnd() * 26 + 66))
                RS.Edit
                RS("coltest") = C
                RS.Update
                RS.MoveNext
            Next I
            DBEngine.Workspaces(0).CommitTrans
        Next J
        EndTime = Now()
        Debug.Print DateDiff("s", BegTime, EndTime)
    End Sub
+1

Although it may be necessary in some cases, iterating over a recordset to update a field is something you should avoid.

The smarter and much more efficient approach is to write a SQL UPDATE query.

If your table is that large, you should choose your indexes carefully, especially the primary key.

Then you can partition the data by PK and update all the records in the first range, then the second, the third...

    UPDATE super_big_table
    SET Field_A = some_function_to_make_it_sort_value()
    WHERE myPrimaryKey BETWEEN left_boundary AND right_boundary

You repeat this (in code) for every partition you have made of your table; see the sketch below.
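A sketch of that driver loop, reusing the names from the query above and assuming a numeric primary key; the chunk size is arbitrary:

    Sub updateInChunks()
        Dim lo As Long, maxKey As Long
        Const CHUNK As Long = 50000  ' arbitrary partition size

        maxKey = CurrentDb.OpenRecordset( _
            "SELECT MAX(myPrimaryKey) FROM super_big_table")(0)

        For lo = 0 To maxKey Step CHUNK
            CurrentDb.Execute _
                "UPDATE super_big_table" & _
                " SET Field_A = some_function_to_make_it_sort_value()" & _
                " WHERE myPrimaryKey BETWEEN " & lo & " AND " & (lo + CHUNK - 1)
            DoEvents  ' let the app stay responsive between chunks
        Next lo
    End Sub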

Now the question is: can you write an Access function that produces the required Sort_Value?

Please note that if Field_A is your primary key, you should not update it. Otherwise the whole table gets physically reordered every time you update a few records, which is a lot of work for your disk and CPU. In that case, use a different PK and create the index on Field_A instead.
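Creating that index on Field_A is a one-off DDL statement; a minimal sketch using the table name from the example above (the index name is arbitrary):

    Sub indexFieldA()
        ' Build the index once so sorted reads on Field_A don't scan the table.
        CurrentDb.Execute "CREATE INDEX idxFieldA ON super_big_table (Field_A)"
    End Sub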

+1

To improve performance, you can use the UpdateBatch method of an ADODB Recordset. Using this feature requires:

  • an adOpenStatic CursorType, and
  • an adLockBatchOptimistic LockType

on the recordset object.
In addition, you can set the CursorLocation to adUseClient so the work is done on the client instead of the server during the operation.

Also, refrain from testing rec.EOF on every pass; instead use a For loop from 1 to rec.RecordCount.

Worth mentioning: rec.RecordCount is not reliable until ADODB has traversed all the records, so do a MoveLast followed by a MoveFirst to make sure the count is correct.

Use the following code as a hint:

    Set con = Server.CreateObject("ADODB.Connection")
    con.Provider = "Microsoft.Jet.OLEDB.4.0"
    con.Open (Server.Mappath("MyShopDB.mdb"))

    Set rec = Server.CreateObject("ADODB.Recordset")
    sql = "SELECT * FROM Employees"
    rec.CursorLocation = adUseClient
    rec.CursorType = adOpenStatic
    rec.LockType = adLockBatchOptimistic
    rec.Open sql, con

    If Not rec.EOF Then  ' rescue the no-records situation
        rec.MoveLast     ' force ADODB to see all records
        rec.MoveFirst
    End If

    cnt = rec.RecordCount  ' read once; avoid re-reading in the loop test
    If cnt > 0 Then
        For i = 1 To cnt
            rec.Fields("FIELD").Value = Sort_Value
            '...
            rec.MoveNext
        Next i
        rec.UpdateBatch
    End If

    rec.Close
    con.Close

More than 3 years have passed since I switched from VB to PHP, so some details may be off. Please note that I have not executed this code; it may contain minor problems, but it should be enough to show the intent.

You can also try splitting the batch, as sketched below, to see the impact on performance.
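A minimal sketch of that splitting, in the same style as the code above; the only change is flushing every 5,000 records instead of once at the end (5,000 is an arbitrary choice):

    If cnt > 0 Then
        For i = 1 To cnt
            rec.Fields("FIELD").Value = Sort_Value
            ' flush a partial batch every 5,000 records
            If i Mod 5000 = 0 Then rec.UpdateBatch
            rec.MoveNext
        Next i
        rec.UpdateBatch  ' flush whatever remains
    End If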

0
