How can I optimize this duplicate-removal VBA code?

I have a project and I have successfully got it working with VBA (the Stack Overflow developers helped with most of it, thanks for that). But today I got feedback: it deleted 2 unique entries from the records, and I do not know why it is removing them.

The algorithm I applied:

I used the COUNTIF function that I found on Google:

="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes 

It returns FALSE if there is a duplicate above in column A, and TRUE if the value is unique. What I understood about COUNTIF is that it checks all the column values above the current cell. Take A4: it checks A1, A2, A3 for duplicates. Similarly, for A10 it checks A1 through A9 and returns TRUE or FALSE. And it works, mostly. But I don't know what went wrong: the formula fails for some entries, and sometimes it shows FALSE for unique entries.
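For example, with the data starting in A1 (no header row, which is my layout) and the flags in column B filled down from B2, the pattern looks like this:

    B2:  =COUNTIF(A$1:A1,A2)=0     'TRUE, nothing above A2 matches
    B3:  =COUNTIF(A$1:A2,A3)=0
    ...
    B10: =COUNTIF(A$1:A9,A10)=0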

And applying this formula takes longer and longer as my data grows. I am trying to make it cleaner and faster. People told me that this is not C or some other language that can be heavily optimized, but I still need to make my code faster.

I need code for the conditions below. Can someone help me? My own attempt could not do it, and I am a bit helpless here.

1) I have a column, and I have to check for duplicates in that column and delete the row if it is a duplicate.

2) I have 35,000 old records in the column, and about 2,000 new records are added each time. I need to check those 2,000 records against the total of 37,000 (35,000 + 2,000). The delete operation should only be performed on the newly added 2,000 records, but the duplicate check must cover the entire column.

Let me explain: only the recently added 2,000 records should be checked for duplicates, both against the 35,000 old records and against themselves (the 2,000 records), and deleted if they are duplicates. No delete operation should ever be performed on the 35,000 records of old data.

I found several code samples, but they delete duplicates from the 35,000 old records too. I set the range, but it still didn't work. Can someone help me with better code that takes less time? Thanks.

Updating my question with the sample data I have:

    A    B           F  G  H    I  Y
    PTY  39868.5     4  2  540  3  PTY39868.5425403
    GTY  34446.1234  2  1  230  1  GTY34446.1234212301
    PTY  3945.678    2  2          PTY3945.67822
    GTY  34446.1234  2  1  230  1  GTY34446.1234212301

(let us say these are the old 35,000 entries)

Explanation of the above example.

Above are the 35,000 old records. I have to check columns A, B, F, G, H, I for duplicates: if all of them are the same, I have to delete the row. I should not worry about the other columns (C, D, etc.). So I used one unused column, Y, and combined these 6 columns into it using this formula:

  = A2 & B2 & F2 & G2 & H2 &I2 with the respective columns 

Now I check column Y for duplicates and delete the entire row, since Excel 2003 only supports checking one column as far as I know.

Please note that even the 35,000 old records may contain duplicates, but I should not delete those. For example, the 2nd and the last row in my sample above are duplicates of each other, but I must not delete either, because they are old data.

    A    B           F  G  H    I  Y
    PTY  39868.5     4  2  540  3  PTY39868.5425403       'old
    GTY  34446.1234  2  1  230  1  GTY34446.1234212301    'old
    PTY  3945.678    2  2          PTY3945.67822          'old
    GTY  34446.1234  2  1  230  1  GTY34446.1234212301    'old
    PTY  3945.678    1  1  230  2  PTY3945.678112302      'new
    PTY  39868.5     4  2  540  3  PTY39868.5425403       'new
    PTY  3945.678    1  1  230  2  PTY3945.678112302      'new

Now notice that the second-to-last record (PTY, new) is a duplicate of an old record (the first PTY row), so I have to remove it. And the last new record is a duplicate of another new record (the fifth row), so I have to delete that one too. So in the sample above I need to remove only the last 2 rows: one is a duplicate of an old record, the other a duplicate of a new one. But I must not delete the duplicate GTY row, because both copies are in the old data.

I think I have made it clear now. Is combining the columns into one cell the best approach? Concatenating 40,000 records takes only about 2 seconds, so I think that part hardly matters, but any significantly better algorithm is welcome.

I have also heard that numbers like 45.00 and 45.000 can end up as different strings when concatenated, even though they are the same value. Could that be a problem here, since I have decimal points in my data? I also wondered if I should have done:

  = I2 & H2 & G2 & F2 & A2 & B2 

Which concatenation is better: this one or the one I wrote before?
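One idea I had (just a sketch; it assumes the "|" character never appears in my data) is to put a separator between the parts so they cannot run together:

    = A2 & "|" & B2 & "|" & F2 & "|" & G2 & "|" & H2 & "|" & I2

That way a row with 12 and 3 and a row with 1 and 23 no longer produce the same key.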

+4
7 answers

Ok, now that we have more information, here is the solution. It should run almost instantly.

The code works by populating column Y with your concatenation formula. It then adds the whole of column Y to a dictionary and, using the dictionary, marks each first occurrence in column Z (duplicates are left blank). It then deletes every duplicate found after row 35000, and finally clears both columns Y and Z to remove the leftover data.

    Sub RemoveDuplicates()
        Dim vData As Variant, vArray As Variant
        Dim lRow As Long

        '// Get used range of column A (excluding header) and offset to get column Y
        With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
            '// Add the concatenate formula to the sheet column (Y)
            .FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
            '// Add the formula results to an array
            vData = .Resize(, 1).Value
        End With

        '// Re-dimension the array to the correct size
        ReDim vArray(1 To UBound(vData, 1), 0)

        '// Create a dictionary object using late binding
        With CreateObject("Scripting.Dictionary")
            '// Loop through each row in the array
            For lRow = 1 To UBound(vData, 1)
                '// Check if the value already exists in the dictionary
                If Not .Exists(vData(lRow, 1)) Then
                    '// Value does not exist: mark as non-duplicate
                    vArray(lRow, 0) = "x"
                    '// Add value to dictionary
                    .Add vData(lRow, 1), Nothing
                End If
            Next lRow
        End With

        '// Turn off screen updating to speed up code and prevent screen flicker
        Application.ScreenUpdating = False

        With ActiveSheet
            '// Populate column Z with the array
            .Range("Z2").Resize(UBound(vArray, 1)) = vArray
            '// Use error handling as SpecialCells throws an error when none exist
            On Error Resume Next
            '// Delete all rows after row 35000 whose column Z cell is blank (duplicates)
            .Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
            '// Remove error handling
            On Error GoTo 0
            '// Clear columns Y and Z
            .Columns(25).Resize(, 2).ClearContents
        End With

        '// Turn screen updating back on
        Application.ScreenUpdating = True
    End Sub

NOTE: You can change all the ActiveSheet references to your sheet's code name if you wish.

NOTE 2: It assumes you have headers, and that they occupy only row 1.

I used your columns and made the test data match yours as closely as possible. Here is the test fill I used:

    Sub TestFill()
        For i = 1 To 37000
            With Range("A" & i)
                .Value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
                .Offset(, 1).Value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
                .Offset(, 5).Value = Int(4 * Rnd + 1)
                .Offset(, 6).Value = Int(2 * Rnd + 1)
                .Offset(, 7).Value = Choose(Int(2 * Rnd + 1), "230", "540")
                .Offset(, 8).Value = Int(3 * Rnd + 1)
            End With
        Next i
    End Sub
+3

This is also a response to some comments and solutions given by other members, so apologies if it does not directly answer your question.

First, I believe that using Excel in a database scenario means separating the raw data from the presentation data. That usually means a single worksheet with the raw data and several other worksheets with presentation data; delete or archive the raw data as necessary.

When speed-testing, it is very difficult to get a level playing field in Excel, since many things affect the results: computer specs, available RAM, and so on. The code must also be compiled before running any of the procedures. The test data matters too when considering duplicates: how many duplicates versus how many rows? Try loading some test data while changing the number of rows and the range of the random numbers (i.e. the number of duplicates); you will get very different results for your code. I don't know what your data looks like, so we are working blind here, and your results may differ.

    '// This is still not very good test data, but should suffice for this situation.
    Sub TestFill()
        '// 300000 rows
        For i = 1 To 300000
            '// Populate a random number; adjust the range to control how many duplicates you get
            Cells(i, "A").Value = Int((100000 + 1) * Rnd + 1)
        Next
    End Sub

If we are comparing the advanced filter against the array-and-dictionary method, the advanced filter will be faster with fewer rows, but once you pass a certain number of rows the array method wins. Then see what happens when you change the number of duplicates... :) As a general rule, Excel's built-in functions will be faster, and I recommend you always try to use them first; but there are often exceptions, like deleting duplicates above. :)
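If you want to time the two approaches yourself, a rough harness like this is enough (a sketch: it just wraps one of the routines below with VBA's Timer; run each routine on a fresh copy of the data):

    Sub TimeIt()
        Dim t As Single
        t = Timer
        RemoveDuplicatesB    'or RemoveDuplicatesA / RemoveDuplicatesC
        Debug.Print "Elapsed: " & Format(Timer - t, "0.000") & "s"
    End Sub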

Deleting rows in a loop can be slow when done incorrectly. If you use a loop, it is important to minimize interaction between the code and the workbook inside the loop. This usually means reading the data into an array, looping through the array, then writing the data from the array back to the presentation worksheet, which in effect deletes the unwanted rows.

    Sub RemoveDuplicatesA()
        '// Copy raw data to presentation sheet
        Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
            Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
    End Sub

This will be the fastest way:

    Sub RemoveDuplicatesB()
        Dim vData As Variant, vArray As Variant
        Dim lCnt As Long, lRow As Long

        vData = ActiveSheet.UsedRange.Columns(1).Value
        ReDim vArray(0 To UBound(vData, 1), 0)
        lCnt = 0

        With CreateObject("Scripting.Dictionary")
            For lRow = 1 To UBound(vData, 1)
                If Not .Exists(vData(lRow, 1)) Then
                    vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
                    .Add vData(lRow, 1), Nothing
                End If
            Next lRow
        End With

        '// Copy raw data to presentation sheet
        Sheet2.Range("B1").Resize(lCnt).Value = vArray
    End Sub

Application.Transpose has a limit of 65,536 elements, but as you are using Excel 2003 (which has only 65,536 rows) you should be fine, so you can simplify the code above:

    Sub RemoveDuplicatesC()
        Dim vData As Variant
        Dim lRow As Long

        vData = ActiveSheet.UsedRange.Columns(1).Value

        With CreateObject("Scripting.Dictionary")
            For lRow = 1 To UBound(vData, 1)
                If Not .Exists(vData(lRow, 1)) Then
                    .Add vData(lRow, 1), Nothing
                End If
            Next lRow

            '// Copy raw data to presentation sheet or replace raw data
            Sheet2.Columns(2).ClearContents
            Sheet2.Columns(2).Resize(.Count).Value = Application.Transpose(.Keys)
        End With
    End Sub

EDIT

Ok, so @Issun mentioned that you want to delete the entire row. My suggestion was to improve your spreadsheet layout by having a raw-data sheet and a presentation sheet, which means you never need to delete anything, so that would be the fastest method. If you do not want to do that and would rather modify the source data directly, try this:

    Sub RemoveDuplicatesD()
        Dim vData As Variant, vArray As Variant
        Dim lRow As Long

        vData = ActiveSheet.UsedRange.Columns(1).Value
        ReDim vArray(1 To UBound(vData, 1), 0)

        With CreateObject("Scripting.Dictionary")
            For lRow = 1 To UBound(vData, 1)
                If Not .Exists(vData(lRow, 1)) Then
                    '// First occurrence: flag it; duplicates stay blank
                    vArray(lRow, 0) = "x"
                    .Add vData(lRow, 1), Nothing
                End If
            Next lRow
        End With

        Application.ScreenUpdating = False

        '// Modify the raw data
        With ActiveSheet
            .Columns(2).Insert
            .Range("B1").Resize(UBound(vArray, 1)).Value = vArray
            .Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
            .Columns(2).Delete
        End With

        Application.ScreenUpdating = True
    End Sub
+5

FURTHER UPDATE:

I think the initial question threw me off; maybe there is a problem with the logic in the question. The following assumes you want to delete only the cell, not the entire row, for duplicate entries.

  • If the 35,000 old records do not contain duplicates, then all you have to do is remove duplicates from the entire column: as long as you start from row 1, you do not run the risk of deleting any of the "old" rows, because they have no duplicates.

Here is one way:

    Sub UniqueList()
        Application.ScreenUpdating = False
        Dim vArray As Variant
        Dim i As Long, j As Long, lastrow As Long
        Dim dictionary As Object

        Set dictionary = CreateObject("scripting.dictionary")

        lastrow = Range("A" & Rows.Count).End(xlUp).Row
        vArray = Range("A1:A" & lastrow).Value

        On Error Resume Next
        For i = 1 To UBound(vArray, 1)
            For j = 1 To UBound(vArray, 2)
                If Len(vArray(i, j)) <> 0 Then
                    dictionary(vArray(i, j)) = 1
                End If
            Next
        Next

        Columns("A:A").ClearContents
        Range("A1").Resize(dictionary.Count).Value = _
            Application.Transpose(dictionary.keys)

        Application.ScreenUpdating = True
    End Sub
  • If, for some odd reason, the 35,000 old records DO include dupes and only those records are allowed to keep them, then you can use 2 dictionaries. But that would be an unusual case, since you would be treating the old data differently from the new...
    Sub RemoveNewDupes()
        Application.ScreenUpdating = False

        Dim lastRow As Long, i As Long
        Dim varray As Variant
        Dim oldDict As Object, newDict As Object

        Set oldDict = CreateObject("scripting.dictionary")
        Set newDict = CreateObject("scripting.dictionary")
        On Error Resume Next
        lastRow = Range("A" & Rows.Count).End(xlUp).Row

        'Add old entries to dictionary
        varray = Range("A1:A35000").Value
        For i = 1 To UBound(varray, 1)
            oldDict.Add varray(i, 1), 1
        Next

        'Check for dupes
        varray = Range("A35001:A" & lastRow).Value
        For i = 1 To UBound(varray, 1)
            If oldDict.Exists(varray(i, 1)) = False Then
                newDict.Add varray(i, 1), 1
            End If
        Next

        'Delete and slap back on the unique list
        Range("A35001", "A" & Rows.Count).ClearContents
        Range("A35001").Resize(newDict.Count).Value = _
            Application.Transpose(newDict.keys)

        Application.ScreenUpdating = True
    End Sub

Thanks to Reafidy for the advice and making me change my mind.

+5

Before rewriting all of your code from scratch, here are a few things you can try:

Optimize your VBA. There are plenty of tips for optimizing VBA on the Internet. In particular, you can:

    'turn off some Excel functionality so your code runs faster
    'these two are especially effective
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    'use these if you really need to
    Application.DisplayStatusBar = False
    Application.EnableEvents = False

    'code goes here

    'at the end, restore the default behavior
    'and recalculate the formulas
    Application.Calculate
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    Application.DisplayStatusBar = True
    Application.EnableEvents = True

See here for more details.
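One caveat worth adding (my own sketch, not from the linked tips): if the macro raises an error while these settings are off, Excel is left in manual-calculation mode. Routing everything through an error handler guarantees the settings are restored:

    Sub RunOptimized()
        On Error GoTo CleanUp
        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual

        'code goes here

    CleanUp:
        'restore the default behavior even if an error occurred
        Application.Calculation = xlCalculationAutomatic
        Application.ScreenUpdating = True
        If Err.Number <> 0 Then MsgBox "Error: " & Err.Description
    End Sub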

Optimize the algorithm. For the COUNTIF formula especially, you can try writing the formula to the whole range in one operation instead of inserting it row by row. For example:
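Here is a sketch of that idea (assuming the data sits in A1:A37000 and the flags go into column B): a single Range call writes all the formulas at once instead of looping.

    Sub FillFlagsAtOnce()
        'R1C1 notation: for B2 this produces =COUNTIF($A$1:$A1,$A2)=0, and so on down
        ActiveSheet.Range("B2:B37000").FormulaR1C1 = "=COUNTIF(R1C1:R[-1]C1,RC1)=0"
    End Sub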

For the row-deletion part, you should try the solution I gave you in your previous thread (Delete duplicate entries in a column in Excel 2003 VBA): filter on the flag values first and then delete only the visible cells. It is probably the fastest way.
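A sketch of that filter-then-delete idea (my assumptions: row 1 is a header, the data block starts at A1, and helper column B holds FALSE for duplicates, as your COUNTIF formula produces):

    Sub DeleteFlaggedDupes()
        With ActiveSheet
            .AutoFilterMode = False
            '// Field 2 = column B, the TRUE/FALSE helper column
            .UsedRange.AutoFilter Field:=2, Criteria1:="FALSE"
            On Error Resume Next    'SpecialCells errors when nothing matches
            .UsedRange.Offset(1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
            On Error GoTo 0
            .AutoFilterMode = False
        End With
    End Sub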

[EDIT] It seems that Doc Brown's answer is probably the best way to handle this (hey, it's a dictionary solution that was not written by Issun :)). Anyway, the VBA optimization tips are still relevant, because VBA is a rather slow language.

+4

OK, here is the AdvancedFilter method. I do not know if it is faster than the dictionary method; it would be interesting to know, so let me know after you try it. I also included the deletion part, so you will need to take that part out if you want a true comparison. Also, you can make this a Function instead of a Sub and pass in your variables, however you want to change it.

    Sub DeleteRepeats()
        Dim d1 As Double
        Dim r1 As Range, rKeepers As Range
        Dim wks As Worksheet

        d1 = Timer
        Set wks = ActiveSheet
        Application.EnableEvents = False
        Application.ScreenUpdating = False

        'Make sure all rows are visible
        On Error Resume Next
        wks.ShowAllData
        wks.UsedRange.Rows.Hidden = False
        wks.UsedRange.Columns.Hidden = False
        On Error GoTo 0

        'Get concerned range
        Set r1 = wks.Range("A1:A35000")

        'Filter
        r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True

        'Get range of cells not to be deleted
        Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
        On Error Resume Next
        wks.ShowAllData
        On Error GoTo 0
        rKeepers.EntireRow.Hidden = True

        'Delete all undesirables
        r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete

        'Show all rows again
        On Error Resume Next
        wks.UsedRange.Rows.Hidden = False
        On Error GoTo 0

        'Restore events and screen updating
        Application.EnableEvents = True
        Application.ScreenUpdating = True

        Debug.Print Timer() - d1
    End Sub

OK, here is a take on Doc's and Issun's dictionary approach. I was not convinced before, but after looking at it, testing it, and comparing it with AdvancedFilter, I am convinced that dictionaries are better suited for this application. I don't know why Excel is not faster here, since it should be using faster algorithms internally; it is not the hiding and unhiding of the rows, because that happens very quickly. So if anyone knows, let me know. This procedure takes about 1 second on my slow computer:

    Sub FindDupesAndDelete()
        Dim d1 As Double
        Dim dict As Object
        Dim sh As Worksheet
        Dim v1 As Variant
        ' Dim s1() As String
        Dim rDelete As Range
        Dim bUnion As Boolean

        d1 = Timer()
        bUnion = False
        Set dict = CreateObject("Scripting.Dictionary")
        Set sh = ActiveSheet
        v1 = Application.Transpose(sh.Range("A1", "A" _
            & sh.Cells.SpecialCells(xlCellTypeLastCell).row))
        ' ReDim s1(1 To UBound(v1))

        Dim row As Long, value As String ', newEntry As Boolean
        For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
            value = v1(row)
            If dict.Exists(value) Then
                ' newEntry = False
                If bUnion Then
                    Set rDelete = Union(rDelete, sh.Range("A" & row))
                Else
                    Set rDelete = sh.Range("A" & row)
                    bUnion = True
                End If
            Else
                ' newEntry = True
                dict.Add value, 1
            End If
            ' s1(row) = newEntry
        Next

        rDelete.EntireRow.Delete
        ' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)

        Debug.Print Timer() - d1
    End Sub
+4

Suppose you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:

    Option Explicit

    Sub FindDupes()
        Dim dict As Object
        Dim sh As Worksheet

        Set dict = CreateObject("Scripting.Dictionary")
        Set sh = ActiveSheet

        Dim row As Long, value As String
        For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
            value = sh.Range("A" & row).Text
            If dict.Exists(value) Then
                sh.Range("B" & row) = "False"
            Else
                sh.Range("B" & row) = "True"
                dict.Add value, 1
            End If
        Next
    End Sub

(Using a dictionary here gives almost linear running time, which should be a matter of seconds for 35,000 rows, whereas the original formula has quadratic running-time complexity.)

Edit: regarding your comment: the macro must first fill the dictionary by reading every entry at least once, and you cannot easily avoid that. What you can avoid is re-filling the cells of column B when they are already filled:

    Option Explicit

    Sub FindDupes()
        Dim dict As Object
        Dim sh As Worksheet

        Set dict = CreateObject("Scripting.Dictionary")
        Set sh = ActiveSheet

        Dim row As Long, value As String, newEntry As Boolean
        For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
            value = sh.Range("A" & row).Text
            If dict.Exists(value) Then
                newEntry = False
            Else
                newEntry = True
                dict.Add value, 1
            End If
            If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
        Next
    End Sub

But I suspect this will not be much faster than my first solution. If the per-cell writes to column B turn out to be the bottleneck, see the array variant sketched below.
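Here is that variant (my sketch, not part of the original answer): the same dictionary logic, but reading column A and writing column B as whole arrays, so the sheet is not touched inside the loop. It assumes the data is in column A with the flags going to column B.

    Option Explicit

    Sub FindDupesArray()
        Dim dict As Object, sh As Worksheet
        Dim vIn As Variant, vOut As Variant
        Dim row As Long, lastRow As Long

        Set dict = CreateObject("Scripting.Dictionary")
        Set sh = ActiveSheet
        lastRow = sh.Range("A" & sh.Rows.Count).End(xlUp).row

        vIn = sh.Range("A1:A" & lastRow).Value    'one read
        ReDim vOut(1 To lastRow, 1 To 1)

        For row = 1 To lastRow
            If dict.Exists(vIn(row, 1)) Then
                vOut(row, 1) = False
            Else
                vOut(row, 1) = True
                dict.Add vIn(row, 1), 1
            End If
        Next row

        sh.Range("B1:B" & lastRow).Value = vOut    'one write
    End Sub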

+2

Now that you have clarified that you want to delete the entire rows, and that the first 35,000 rows are allowed to contain duplicates, here is a routine that will do it for you. I think I came up with a clever method, and it is blazing fast too:

    Sub RemoveNewDupes()
        Application.ScreenUpdating = False

        Dim lastRow As Long, i As Long
        Dim varray As Variant
        Dim oldDict As Object, newDict As Object

        Set oldDict = CreateObject("scripting.dictionary")
        Set newDict = CreateObject("scripting.dictionary")
        On Error Resume Next
        lastRow = Range("A" & Rows.Count).End(xlUp).Row

        'Add old entries to dictionary
        varray = Range("A1:A35000").Value
        For i = 1 To UBound(varray, 1)
            oldDict.Add varray(i, 1), 1
        Next

        'Check for dupes, looping backwards so deleting a row
        'does not shift the rows still to be checked
        varray = Range("A35001:A" & lastRow).Value
        For i = 35000 + UBound(varray, 1) To 35001 Step -1
            If oldDict.Exists(varray(i - 35000, 1)) = True Or _
               newDict.Exists(varray(i - 35000, 1)) = True Then
                Range("A" & i).EntireRow.Delete
            Else
                newDict.Add varray(i - 35000, 1), 1
            End If
        Next

        Application.ScreenUpdating = True

        'A status message at the end for a finishing touch
        MsgBox UBound(varray, 1) - newDict.Count & _
            " duplicate row(s) found and deleted."
    End Sub

How it works:

First, I store the 35,000 old cells in a dictionary. Then I load everything from cell A35001 down into a variant array and loop through it backwards, checking each value against the 35k dictionary and against the values we have already met in the new data. If it finds a duplicate, it deletes the row.

A cool (so to speak) side effect of deleting this way: when you create a varray from, say, A35001:A37000, it stores the values as (1, 1), (2, 1), (...). So if you set "i" to the UBound of the array plus 35000 and step back down to 35001, you walk the entries from A37000 back to A35001. Then, when you want to delete a row, "i" is exactly the row number where the value was found, so you can delete it directly. And since it goes backwards, the deletions do not throw off the loop!

+1
