F # Performance Impact of Verified Calcs?

Question

F # Performance Impact of Verified Calcs?

Does the effect of using the module under test arise? I tested it with sequences of type int and I see no noticeable difference. Sometimes the verified version is faster, and sometimes it is not checked faster, but usually it is not.

Seq.initInfinite (fun x-> x) |> Seq.item 1000000000;;

 Real: 00:00:05.272, CPU: 00:00:05.272, GC gen0: 0, gen1: 0, gen2: 0 val it : int = 1000000000

 open Checked Seq.initInfinite (fun x-> x) |> Seq.item 1000000000;;

 Real: 00:00:04.785, CPU: 00:00:04.773, GC gen0: 0, gen1: 0, gen2: 0 val it : int = 1000000000

Basically I'm trying to figure out if there are any flaws to always open Checked. (I ran into an overflow that didn’t immediately become apparent, so now I play the role of an instigated lover who does not want another broken heart.) The only unreasonable reason I can come up with, not always using Checked if there are some performance hits, but I not seen them yet.

+5

performance f # integer-overflow

Talmage Mar 30 '17 at 17:27

source share

1 answer

FuleSnabel · Accepted Answer · 2017-04-01T17:44:05+0000

When measuring performance, it is usually not recommended to include Seq , because Seq adds a lot of overhead (at least compared to int operations), so you risk that most of the time is spent on Seq , and not in the code that you want to test.

I wrote a small test program for (+) :

 let clock = let sw = System.Diagnostics.Stopwatch () sw.Start () fun () -> sw.ElapsedMilliseconds let dbreak () = System.Diagnostics.Debugger.Break () let time a = let b = clock () let r = a () let n = clock () let d = n - b d, r module Unchecked = let run c () = let rec loop ai = if i < c then loop (a + 1) (i + 1) else a loop 0 0 module Checked = open Checked let run c () = let rec loop ai = if i < c then loop (a + 1) (i + 1) else a loop 0 0 [<EntryPoint>] let main argv = let count = 1000000000 let testCases = [| "Unchecked" , Unchecked.run "Checked" , Checked.run |] for nm, a in testCases do printfn "Running %s ..." nm let ms, r = time (a count) printfn "... it took %d ms, result is %A" ms r 0

The performance results are as follows:

 Running Unchecked ... ... it took 561 ms, result is 1000000000 Running Checked ... ... it took 1103 ms, result is 1000000000

Thus, it seems that some overhead has been added using Verified. The int add cost should be less than the cycle overhead, so Checked overhead is higher than 2x , possibly closer to 4x .

Out of curiosity, we can test the IL code with tools like ILSpy :

Overflow:

  IL_0000: nop IL_0001: ldarg.2 IL_0002: ldarg.0 IL_0003: bge.s IL_0014 IL_0005: ldarg.0 IL_0006: ldarg.1 IL_0007: ldc.i4.1 IL_0008: add IL_0009: ldarg.2 IL_000a: ldc.i4.1 IL_000b: add IL_000c: starg.si IL_000e: starg.sa IL_0010: starg.sc IL_0012: br.s IL_0000

Checked by:

  IL_0000: nop IL_0001: ldarg.2 IL_0002: ldarg.0 IL_0003: bge.s IL_0014 IL_0005: ldarg.0 IL_0006: ldarg.1 IL_0007: ldc.i4.1 IL_0008: add.ovf IL_0009: ldarg.2 IL_000a: ldc.i4.1 IL_000b: add.ovf IL_000c: starg.si IL_000e: starg.sa IL_0010: starg.sc IL_0012: br.s IL_0000

The only difference is that Unchecked uses add and Checked uses add.ovf . add.ovf added with overflow checking.

We can dig even deeper by looking at the jitted x86_64 code.

Overflow:

 ; if i < c then 00007FF926A611B3 cmp esi,ebx 00007FF926A611B5 jge 00007FF926A611BD ; i + 1 00007FF926A611B7 inc esi ; a + 1 00007FF926A611B9 inc edi ; loop (a + 1) (i + 1) 00007FF926A611BB jmp 00007FF926A611B3

Checked by:

 ; if i < c then 00007FF926A62613 cmp esi,ebx 00007FF926A62615 jge 00007FF926A62623 ; a + 1 00007FF926A62617 add edi,1 ; Overflow? 00007FF926A6261A jo 00007FF926A6262D ; i + 1 00007FF926A6261C add esi,1 ; Overflow? 00007FF926A6261F jo 00007FF926A6262D ; loop (a + 1) (i + 1) 00007FF926A62621 jmp 00007FF926A62613

Now the reason for Checked service messages is visible. After each operation, the jitter inserts a jo conditional statement that jumps to code that raises an OverflowException if the overflow flag is set.

This chart shows us that the cost of an integer addition is less than 1 clock cycle. The reason this is less than 1 clock cycle is because a modern processor can execute certain instructions in parallel.

The diagram also shows that the branch that was correctly predicted by the processor takes about 1-2 clock cycles.

Thus, assuming a throughput of at least 2, the cost of two whole additions in the Unchecked example should be 1 clock cycle.

In the tested example, we execute add, jo, add, jo . Most likely, the CPU cannot be parallelized in this case, and the cost of this should be about 4-6 cycles.

Another interesting difference is that the order of additions has changed. With proven add-ons, the order of operations matters, but with uncontrolled jitter (and a processor) it has more flexibility, moving operations, possibly improving performance.

Shortly speaking; for low-cost operations such as (+) Checked overhead should be approximately 4x-6x compared to Unchecked .

This eliminates the overflow exception. The cost of a .NET exception is probably around 100,000x times more expensive than an integer add.

F # Performance Impact of Verified Calcs?

More articles: