Sort() and BinarySearch() performance comparison with integers/strings

Initially, I wanted to ask whether sorting integers is faster than sorting strings. I answered that question myself, but I was surprised by how big the difference is. Why are Sort and BinarySearch so much faster on integers than on strings?

Test (VB.NET) with 1,000,000 Int32 values / strings:

Private Function CheckIntBinarySearch() As TimeSpan
    Dim watch As New System.Diagnostics.Stopwatch()
    Dim rnd As New Random(Date.Now.Millisecond)
    Dim intCol1 As New List(Of Int32)
    Dim intCol2 As New List(Of Int32)
    Dim contains As Int32
    For i As Int32 = 1 To 1000000
        intCol1.Add(rnd.Next(1, 1000000))
    Next
    For i As Int32 = 1 To 1000000
        intCol2.Add(rnd.Next(1, 1000000))
    Next
    Me.output.WriteLine("Integers sorting ...")
    watch.Start()
    intCol1.Sort()
    watch.Stop()
    Me.output.WriteLine("Sorting finished: " & watch.Elapsed.TotalSeconds & " seconds elapsed.")

    Me.output.WriteLine("Integers BinarySearch ...")
    ' Stopwatch.Start() resumes timing without resetting, so this figure includes the sort time.
    watch.Start()
    For Each Val As Int32 In intCol2
        If intCol1.BinarySearch(Val) > -1 Then contains += 1
    Next
    watch.Stop()
    Me.output.WriteLine("BinarySearch finished(contains " & contains & "): " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Return watch.Elapsed
End Function

Private Function CheckStringBinarySearch() As TimeSpan
    Dim watch As New System.Diagnostics.Stopwatch()
    Dim rnd As New Random(Date.Now.Millisecond)
    Dim stringCol1 As New List(Of String)
    Dim stringCol2 As New List(Of String)
    Dim contains As Int32
    For i As Int32 = 1 To 1000000
        stringCol1.Add(rnd.Next(1, 1000000).ToString)
    Next
    For i As Int32 = 1 To 1000000
        stringCol2.Add(rnd.Next(1, 1000000).ToString)
    Next
    Me.output.WriteLine("Strings sorting ...")
    watch.Start()
    stringCol1.Sort()
    watch.Stop()
    Me.output.WriteLine("Sorting finished: " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Me.output.WriteLine("Strings BinarySearch ...")
    ' Stopwatch.Start() resumes timing without resetting, so this figure includes the sort time.
    watch.Start()
    For Each Val As String In stringCol2
        If stringCol1.BinarySearch(Val) > -1 Then contains += 1
    Next
    watch.Stop()
    Me.output.WriteLine("BinarySearch finished(contains " & contains & "): " & watch.Elapsed.TotalSeconds & " seconds elapsed.")
    Return watch.Elapsed
End Function

Running each check 5 times:

Dim intChecks As New List(Of TimeSpan)
Dim stringChecks As New List(Of TimeSpan)
For i As Int32 = 1 To 5
   intChecks.Add(CheckIntBinarySearch())
Next
For i As Int32 = 1 To 5
   stringChecks.Add(CheckStringBinarySearch())
Next
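In the functions above, the second watch.Start() resumes the same Stopwatch, so each "BinarySearch finished" figure also contains the sort time. A minimal sketch of timing the two phases separately could look like this; PhaseTiming and TimePhases are hypothetical names, and the sketch assumes the List(Of T).Sort(IComparer(Of T)) and List(Of T).BinarySearch(T, IComparer(Of T)) overloads:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics

Public Module PhaseTiming
    ' Hypothetical helper: sorts `data`, then looks up every probe with BinarySearch.
    ' Each phase gets its own Stopwatch.StartNew(), so the search time does not
    ' include the sort time (unlike resuming a single stopwatch with Start()).
    ' Returns (sort time, search time, number of probes found).
    Public Function TimePhases(Of T)(data As List(Of T), probes As List(Of T),
                                     comparer As IComparer(Of T)) As Tuple(Of TimeSpan, TimeSpan, Integer)
        Dim sortWatch As Stopwatch = Stopwatch.StartNew()
        data.Sort(comparer)
        sortWatch.Stop()

        Dim found As Integer = 0
        Dim searchWatch As Stopwatch = Stopwatch.StartNew()
        For Each probe As T In probes
            ' Search with the same comparer the list was sorted with.
            If data.BinarySearch(probe, comparer) >= 0 Then found += 1
        Next
        searchWatch.Stop()

        Return Tuple.Create(sortWatch.Elapsed, searchWatch.Elapsed, found)
    End Function
End Module
```

For the integer test you would pass Comparer(Of Int32).Default; for strings, whichever StringComparer you want to measure.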

Results:

    1.)Integers sorting ...
    Sorting finished: 0,2292863 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 630857): 0,9365744 seconds elapsed.
    2.)Integers sorting ...
    Sorting finished: 0,2287382 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632600): 0,9053134 seconds elapsed.
    3.)Integers sorting ...
    Sorting finished: 0,2318829 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 631475): 0,9038594 seconds elapsed.
    4.)Integers sorting ...
    Sorting finished: 0,2308994 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632346): 0,9011047 seconds elapsed.
    5.)Integers sorting ...
    Sorting finished: 0,2266423 seconds elapsed.
    Integers BinarySearch ...
    BinarySearch finished(contains 632982): 0,893541 seconds elapsed.
    1.)Strings sorting ...
    Sorting finished: 6,5661916 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 632579): 12,9545657 seconds elapsed.
    2.)Strings sorting ...
    Sorting finished: 6,5641975 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631478): 13,0184132 seconds elapsed.
    3.)Strings sorting ...
    Sorting finished: 6,4281382 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631775): 12,7684214 seconds elapsed.
    4.)Strings sorting ...
    Sorting finished: 6,9455087 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 631478): 13,7057234 seconds elapsed.
    5.)Strings sorting ...
    Sorting finished: 6,6707111 seconds elapsed.
    Strings BinarySearch ...
    BinarySearch finished(contains 632346): 13,0493649 seconds elapsed.

Averages over the 5 runs:
  • Int32 sort: 0.22948982 seconds
  • String sort: 6.63494942 seconds
  • Int32 BinarySearch: 0.90807858 seconds
  • String BinarySearch: 13.09929772 seconds

Conclusion:

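  • Sorting Int32 values is about 29 times faster than sorting strings.
  • BinarySearch is about 14.4 times faster on Int32 values than on strings (the measured BinarySearch times also include the sort, because the Stopwatch is resumed rather than reset).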
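The two ratios follow directly from the averages. Because the code never resets the Stopwatch between the two phases, the BinarySearch averages also contain the sort time; subtracting the sort averages gives a rough search-only ratio (an estimate derived from the numbers above, not a separate measurement):

```latex
\frac{6.63494942}{0.22948982} \approx 28.9, \qquad
\frac{13.09929772}{0.90807858} \approx 14.4, \qquad
\frac{13.09929772 - 6.63494942}{0.90807858 - 0.22948982}
  = \frac{6.46434830}{0.67858876} \approx 9.5
```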

Note that the strings are "string integers" ("1", "2", "3", ...). Is such a big difference normal, and why does it occur?


There are several reasons why comparing strings is slower than comparing integers:

  • Comparing two integers is a single CPU instruction. Comparing two strings requires a loop over their characters.
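  • For similar strings such as "111112" and "111113", you have to compare 6 characters (in an ordinal comparison) before you find the difference.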
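  • Strings are reference types, so every comparison follows references into memory, which is much less friendly to the CPU cache than comparing values stored next to each other, etc.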
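  • By default, string comparison is culture-sensitive, so characters like "ß" are treated as equal to "ss", "æ" as equal to "ae", and so on.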
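To make the "111112" / "111113" point concrete, here is a simplified ordinal comparison that also counts how many character positions it examines. CompareCounting is a hypothetical helper for illustration only; the real String.CompareOrdinal is heavily optimized, and the default culture-sensitive comparison does considerably more work per character.

```vbnet
Imports System

Public Module OrdinalCompareSketch
    ' Hypothetical helper: a naive ordinal comparison that reports, via `examined`,
    ' how many character positions it looked at. It only illustrates the
    ' per-character loop that an integer comparison does not need.
    Public Function CompareCounting(a As String, b As String, ByRef examined As Integer) As Integer
        examined = 0
        Dim n As Integer = Math.Min(a.Length, b.Length)
        For i As Integer = 0 To n - 1
            examined += 1
            If a(i) <> b(i) Then
                ' First differing character decides the order.
                Return If(a(i) < b(i), -1, 1)
            End If
        Next
        ' All shared positions are equal: the shorter string sorts first.
        Return a.Length.CompareTo(b.Length)
    End Function
End Module
```

For "111112" vs "111113" it examines 6 positions, whereas comparing the Int32 values 111112 and 111113 is a single comparison.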


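Note also that numbers stored as strings are sorted lexicographically, so "12" < "2" (the first characters are compared first); the resulting order is not the numeric one.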
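A small sketch of that difference, assuming an ordinal string comparer so the result does not depend on the current culture: the same numbers sorted as Int32 and as strings end up in different orders. SortOrderSketch, IntOrder and StringOrder are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SortOrderSketch
    ' Hypothetical helper: sorts the numbers as Int32 values (numeric order).
    Public Function IntOrder(values As Integer()) As List(Of Integer)
        Dim ints As New List(Of Integer)(values)
        ints.Sort()
        Return ints
    End Function

    ' Hypothetical helper: sorts the same numbers as strings with an ordinal
    ' comparer (lexicographic order, independent of the current culture).
    Public Function StringOrder(values As Integer()) As List(Of String)
        Dim strings As New List(Of String)
        For Each v As Integer In values
            strings.Add(v.ToString())
        Next
        strings.Sort(StringComparer.Ordinal)
        Return strings
    End Function
End Module
```

For {2, 12, 1, 100}, IntOrder gives 1, 2, 12, 100, while StringOrder gives "1", "100", "12", "2".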


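Comparing 2 strings is simply much more work than comparing 2 integers.

Roughly, comparing two integers means:

  • load A
  • load B
  • compare them, which for integers is a single instruction.

Comparing two strings means:

  • load A
  • load B
  • compare A and B character by character until the first difference or the end of the shorter string. (With the default culture-sensitive comparison, each of those steps is more expensive still.)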




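By default, .NET compares strings using culture-sensitive rules, which is considerably more expensive than a plain character comparison: it has to take into account, for example, that "ae" and "æ" are considered equal.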

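If you only need a consistent order rather than a linguistically correct one, use an ordinal comparison: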

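 stringCol1.Sort(StringComparer.Ordinal)
 If stringCol1.BinarySearch(Val, StringComparer.Ordinal) > -1 Then contains += 1 ' search with the same comparer used for Sort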
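A minimal sketch for measuring this yourself, assuming the List(Of String).Sort(IComparer(Of String)) and BinarySearch(String, IComparer(Of String)) overloads; OrdinalBenchmark and SortAndSearch are hypothetical names. It mirrors the question's test (random numeric strings, sort, then one BinarySearch per probe) with whichever StringComparer you pass, so StringComparer.Ordinal and StringComparer.CurrentCulture can be compared on your own machine. Sort and BinarySearch must use the same comparer; otherwise BinarySearch can miss items that are present.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics

Public Module OrdinalBenchmark
    ' Hypothetical helper: builds `count` random numeric strings plus `count`
    ' probes, sorts the data with `comparer` and looks up each probe with the
    ' same comparer. Returns (elapsed time for sort + search, number found).
    Public Function SortAndSearch(count As Integer, seed As Integer,
                                  comparer As StringComparer) As Tuple(Of TimeSpan, Integer)
        Dim rnd As New Random(seed)
        Dim data As New List(Of String)(count)
        Dim probes As New List(Of String)(count)
        For i As Integer = 1 To count
            data.Add(rnd.Next(1, count).ToString())
            probes.Add(rnd.Next(1, count).ToString())
        Next

        Dim watch As Stopwatch = Stopwatch.StartNew()
        data.Sort(comparer)
        Dim found As Integer = 0
        For Each probe As String In probes
            ' Same comparer as the Sort call above.
            If data.BinarySearch(probe, comparer) >= 0 Then found += 1
        Next
        watch.Stop()
        Return Tuple.Create(watch.Elapsed, found)
    End Function
End Module
```

With the same seed, both comparers should report the same number of hits for digit-only strings; only the elapsed time is expected to differ.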

( , ), smirkingman - .

For the details of culture-sensitive ordering, see the Unicode Collation Algorithm.




Source: https://habr.com/ru/post/1774910/

