Performance of seq<int> vs Lazy<LazyList<int>> in F#

There is a well-known technique for generating the infinite stream of Hamming numbers (i.e., all natural numbers n where n = 2^i * 3^j * 5^k). I implemented it in two ways in F#. The first uses seq<int>: the solution is elegant, but the performance is terrible. The second uses a custom type whose tail is wrapped in Lazy<LazyList<int>>: the solution is inconvenient, but the performance is excellent.

Can someone explain why the performance of the seq<int> version is so bad, and whether there is a way to fix it? Thanks.

Method 1, using seq<int>:

    // 2-way merge with deduplication
    let rec (-|-) (xs: seq<int>) (ys: seq<int>) =
        let x = Seq.head xs
        let y = Seq.head ys
        let xstl = Seq.skip 1 xs
        let ystl = Seq.skip 1 ys
        if x < y then seq { yield x; yield! xstl -|- ys }
        elif x > y then seq { yield y; yield! xs -|- ystl }
        else seq { yield x; yield! xstl -|- ystl }

    let rec hamming: seq<int> =
        seq {
            yield 1
            let xs = Seq.map ((*) 2) hamming
            let ys = Seq.map ((*) 3) hamming
            let zs = Seq.map ((*) 5) hamming
            yield! xs -|- ys -|- zs
        }

    [<EntryPoint>]
    let main argv =
        Seq.iter (printf "%d, ") <| Seq.take 100 hamming
        0

Method 2, using Lazy<LazyList<int>>:

    type LazyList<'a> = Cons of 'a * Lazy<LazyList<'a>>

    // Map `f` over an infinite lazy list
    let rec inf_map f (Cons(x, g)) =
        Cons(f x, lazy(inf_map f (g.Force())))

    // 2-way merge with deduplication
    let rec (-|-) (Cons(x, f) as xs) (Cons(y, g) as ys) =
        if x < y then Cons(x, lazy(f.Force() -|- ys))
        elif x > y then Cons(y, lazy(xs -|- g.Force()))
        else Cons(x, lazy(f.Force() -|- g.Force()))

    let rec hamming =
        Cons(1, lazy(
            let xs = inf_map ((*) 2) hamming
            let ys = inf_map ((*) 3) hamming
            let zs = inf_map ((*) 5) hamming
            xs -|- ys -|- zs))

    [<EntryPoint>]
    let main args =
        let a = ref hamming
        let i = ref 0
        while !i < 100 do
            match !a with
            | Cons(x, f) ->
                printf "%d, " x
                a := f.Force()
                i := !i + 1
        0
+6
3 answers

Ganesha is right that you are evaluating the sequence several times. Seq.cache will improve performance, but you will get much better performance from LazyList, because the underlying sequence is evaluated only once and then cached, so it can be traversed much faster. This is actually a good example of where LazyList should be used instead of a plain seq.

There also seems to be significant overhead associated with using Seq.map; I believe the compiler allocates a closure on each call there. When I replaced the Seq.map calls with sequence expressions, the result was about 1/3 faster than the original for the first 40 numbers in the sequence:

    let rec hamming: seq<int> =
        seq {
            yield 1
            let xs = seq { for x in hamming do yield x * 2 }
            let ys = seq { for x in hamming do yield x * 3 }
            let zs = seq { for x in hamming do yield x * 5 }
            yield! xs -|- ys -|- zs
        }

My ExtCore library includes a lazyList computation expression that works just like seq, so you can simplify your code like this:

    // Requires the ExtCore library for LazyList and the lazyList builder.
    open System.Diagnostics

    // 2-way merge with deduplication
    let rec (-|-) (xs: LazyList<'T>) (ys: LazyList<'T>) =
        let x = LazyList.head xs
        let y = LazyList.head ys
        let xstl = LazyList.skip 1 xs
        let ystl = LazyList.skip 1 ys
        if x < y then lazyList { yield x; yield! xstl -|- ys }
        elif x > y then lazyList { yield y; yield! xs -|- ystl }
        else lazyList { yield x; yield! xstl -|- ystl }

    let rec hamming : LazyList<uint64> =
        lazyList {
            yield 1UL
            let xs = LazyList.map ((*) 2UL) hamming
            let ys = LazyList.map ((*) 3UL) hamming
            let zs = LazyList.map ((*) 5UL) hamming
            yield! xs -|- ys -|- zs
        }

    [<EntryPoint>]
    let main argv =
        let watch = Stopwatch.StartNew ()
        hamming
        |> LazyList.take 2000
        |> LazyList.iter (printf "%d, ")
        watch.Stop ()
        printfn ""
        printfn "Elapsed time: %.4fms" watch.Elapsed.TotalMilliseconds
        System.Console.ReadKey () |> ignore
        0 // Return an integer exit code

(NOTE: I also made your (-|-) operator generic and changed hamming to use 64-bit unsigned ints, because 32-bit signed ints overflow after a while.) This code produces the first 2000 elements of the sequence on my machine in ~450 ms; the first 10,000 elements take ~3,500 ms.

+8

Your seq for hamming is re-evaluated from the beginning on every recursive call. Seq.cache helps:

    let rec hamming: seq<int> =
        seq {
            yield 1
            let xs = Seq.map ((*) 2) hamming
            let ys = Seq.map ((*) 3) hamming
            let zs = Seq.map ((*) 5) hamming
            yield! xs -|- ys -|- zs
        }
        |> Seq.cache

However, as you point out, LazyList is still much better on large inputs, even if each individual sequence is cached.

I'm not quite sure why they differ by more than a small constant factor, but it may be more productive to focus on making the LazyList version less ugly. Writing a small function to convert it to a seq makes it much nicer:

    module LazyList =
        let rec toSeq l =
            match l with
            | Cons(x, xs) ->
                seq {
                    yield x
                    yield! toSeq xs.Value
                }

Then you can use your simple main directly. Also, there is no need for mutation when consuming a LazyList; you could just do it recursively.
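For instance, here is a minimal sketch of a recursive, mutation-free way to print the first n elements of the LazyList from the question (printFirst is my name, not from the answer):

```fsharp
// Hypothetical helper: walk the list recursively instead of using ref cells.
// Assumes the LazyList type and hamming value defined in the question.
let rec printFirst n (Cons(x, f)) =
    if n > 0 then
        printf "%d, " x
        printFirst (n - 1) (f.Force())

// usage: printFirst 100 hamming
```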

The definition doesn't look so bad, although lazy and Force() clutter it up a bit. It reads slightly better if you use .Value instead of .Force(). You could also define a computation expression builder for LazyList, analogous to the one for seq, to recover really nice syntax, although I'm not sure it's worth the effort.
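As a rough, untested sketch of what such a builder might look like for the LazyList type from the question (all names here are mine, and it supports only the yield/yield! forms used above, on infinite lists):

```fsharp
// A partial result is modelled as a function from a lazily-computed
// "rest of the list" to a list, so yields can be chained without
// forcing the tail.
type LazyListBuilder() =
    member _.Yield(x) = fun rest -> Cons(x, rest)
    // An infinite tail never reaches the continuation, so it is ignored.
    member _.YieldFrom(xs: LazyList<'a>) = fun (_: Lazy<LazyList<'a>>) -> xs
    member _.Combine(first, second) = fun rest -> first (lazy (second rest))
    member _.Delay(f) = fun rest -> f () rest
    // Only safe for infinite lists: the terminator is never forced.
    member _.Run(f) = f (lazy (failwith "finite lazyList not supported"))

let lazyList = LazyListBuilder()

// usage: let rec ones = lazyList { yield 1; yield! ones }
```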

+3

Here is a basic sequence-based version with very good performance, using a Set as a priority queue of candidates:

    let hamming =
        let rec loop nextHs =
            seq {
                let h = nextHs |> Set.minElement
                yield h
                yield! nextHs
                       |> Set.remove h
                       |> Set.add (h * 2)
                       |> Set.add (h * 3)
                       |> Set.add (h * 5)
                       |> loop
            }
        Set.empty<int> |> Set.add 1 |> loop
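A usage sketch (mine, not part of the answer): on each step the smallest candidate h is popped from the set and its multiples by 2, 3, and 5 are pushed, so elements come out in order with duplicates removed by the set itself.

```fsharp
// Assumes the Set-based hamming defined above.
hamming |> Seq.take 15 |> Seq.iter (printf "%d, ")
// prints: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24,
```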
0

Source: https://habr.com/ru/post/971839/

