Solving Knapsack Problem in F #: Performance

Question

Solving Knapsack Problem in F #: Performance

I found an article:
Solving backpack problem 0-1 using continue transmission style with memoization in F #

about the knapsack problem implemented in F #. When I learn this language, I found it really interesting and tried to investigate it a bit. Here is the code I created:

open System open System.IO open System.Collections.Generic let parseToTuple (line : string) = let parsedLine = line.Split(' ') |> Array.filter(not << String.IsNullOrWhiteSpace) |> Array.map Int32.Parse (parsedLine.[0], parsedLine.[1]) let memoize f = let cache = Dictionary<_, _>() fun x -> if cache.ContainsKey(x) then cache.[x] else let res = fx cache.[x] <- res res type Item = { Value : int Size : int } type ContinuationBuilder() = member b.Bind(x, f) = fun k -> x (fun x -> fxk) member b.Return x = fun k -> kx member b.ReturnFrom x = x let cont = ContinuationBuilder() let set1 = [ (4, 11) (8, 4) (10, 5) (15, 8) (4, 3) ] let set2 = [ (50, 341045); (1906, 4912); (41516, 99732); (23527, 56554); (559, 1818); (45136, 108372); (2625, 6750); (492, 1484) (1086, 3072); (5516, 13532); (4875, 12050); (7570, 18440); (4436, 10972); (620, 1940); (50897, 122094); (2129, 5558) (4265, 10630); (706, 2112); (2721, 6942); (16494, 39888); (29688, 71276); (3383, 8466); (2181, 5662); (96601, 231302) (1795, 4690); (7512, 18324); (1242, 3384); (2889, 7278); (2133, 5566); (103, 706); (4446, 10992); (11326, 27552) (3024, 7548); (217, 934); (13269, 32038); (281, 1062); (77174, 184848); (952, 2604); (15572, 37644); (566, 1832) (4103, 10306); (313, 1126); (14393, 34886); (1313, 3526); (348, 1196); (419, 1338); (246, 992); (445, 1390) (23552, 56804); (23552, 56804); (67, 634) ] [<EntryPoint>] let main args = // prepare list of items from a file args.[0] let header, items = set1 |> function | h::t -> h, t | _ -> raise (Exception("Wrong data format")) let N, K = header printfn "N = %d, K = %d" NK let items = List.map (fun x -> {Value = fst x ; Size = snd x}) items |> Array.ofList let rec combinations = let innerSolver key = cont { match key with | (i, k) when i = 0 || k = 0 -> return 0 | (i, k) when items.[i-1].Size > k -> return! combinations (i-1, k) | (i, k) -> let item = items.[i-1] let! v1 = combinations (i-1, k) let! beforeItem = combinations (i-1, k-item.Size) let v2 = beforeItem + item.Value return max v1 v2 } memoize innerSolver let res = combinations (N, K) id printfn "%d" res 0

However, the problem with this implementation is that it is veeeery slow (in practice, I cannot solve the problem with 50 elements and a capacity of ~ 300000, which are solved by my naive C # implementation in less than 1 second).

Could you tell me that I was mistaken somewhere? Or maybe the implementation is correct, and this is simply an inefficient way to solve this problem.

+6

performance optimization f # knapsack-problem

Alojzy Leszcz Jul 03 '13 at 10:40

source share

2 answers

When you naively apply a generic memoizer, like this one, and use a transfer continuation, the values in your cache memoization continue , not the regular "final" results. Thus, when you get a cache hit, you do not return the final result, you return some function that promises calculates the result when it is called. This call can be costly, it can cause various other continuations, it can end up in the memoization cache again, etc.

It is effective to memorize transfer continuation functions, such that: a) caching works completely, and b) the function remains tail recursive, it is quite difficult. Read this discussion and come back when you fully understand it all .; -)

The author of the blog post you are associated with uses a more complex, less general memoizer specifically designed for this problem. Admittedly, I still do not fully understand it (the code on the blog is incomplete / broken, it is so difficult to try it), but I believe that the essence of it is that it “forces” the chain of sequels to cache the final integer result.

To illustrate this point, here is a quick refactoring of your code, which is completely autonomous and tracks relevant information:

 open System open System.Collections.Generic let mutable cacheHits = 0 let mutable cacheMisses = 0 let memoize f = let cache = Dictionary<_, _>() fun x -> match cache.TryGetValue(x) with | (true, v) -> cacheHits <- cacheHits + 1 printfn "Hit for %A - Result is %A" xv v | _ -> cacheMisses <- cacheMisses + 1 printfn "Miss for %A" x let res = fx cache.[x] <- res res type Item = { Value : int; Size : int } type ContinuationBuilder() = member b.Bind(x, f) = fun k -> x (fun x -> fxk) member b.Return x = fun k -> kx member b.ReturnFrom x = x let cont = ContinuationBuilder() let genItems n = [| for i = 1 to n do let size = i % 5 let value = (size * i) yield { Value = value; Size = size } |] let N, K = (5, 100) printfn "N = %d, K = %d" NK let items = genItems N let rec combinations_cont = memoize ( fun key -> cont { match key with | (0, _) | (_, 0) -> return 0 | (i, k) when items.[i-1].Size > k -> return! combinations_cont (i - 1, k) | (i, k) -> let item = items.[i-1] let! v1 = combinations_cont (i-1, k) let! beforeItem = combinations_cont (i-1, k - item.Size) let v2 = beforeItem + item.Value return max v1 v2 } ) let res = combinations_cont (N, K) id printfn "Answer: %d" res printfn "Memo hits: %d" cacheHits printfn "Memo misses: %d" cacheMisses printfn "" let rec combinations_plain = memoize ( fun key -> match key with | (i, k) when i = 0 || k = 0 -> 0 | (i, k) when items.[i-1].Size > k -> combinations_plain (i-1, k) | (i, k) -> let item = items.[i-1] let v1 = combinations_plain (i-1, k) let beforeItem = combinations_plain (i-1, k-item.Size) let v2 = beforeItem + item.Value max v1 v2 ) cacheHits <- 0 cacheMisses <- 0 let res2 = combinations_plain (N, K) printfn "Answer: %d" res2 printfn "Memo hits: %d" cacheHits printfn "Memo misses: %d" cacheMisses

As you can see, the CPS version is the caching of continuations (not integers), and there are many additional actions that come to an end when continuations are called.

If you increase the size of the problem to let (N, K) = (20, 100) (and remove the printfn in memoizer), you will see that the CPS version finishes executing more than 1 million cache requests, compared to a simple version that only performs a few one hundred.

+7

latkin Jul 04 '13 at 0:02

source share

Zachbray · Accepted Answer · 2013-07-04T07:01:01+0000

From running this code in FSI:

 open System open System.Diagnostics open System.Collections.Generic let time f = System.GC.Collect() let sw = Stopwatch.StartNew() let r = f() sw.Stop() printfn "Took: %f" sw.Elapsed.TotalMilliseconds r let mutable cacheHits = 0 let mutable cacheMisses = 0 let memoize f = let cache = Dictionary<_, _>() fun x -> match cache.TryGetValue(x) with | (true, v) -> cacheHits <- cacheHits + 1 //printfn "Hit for %A - Result is %A" xv v | _ -> cacheMisses <- cacheMisses + 1 //printfn "Miss for %A" x let res = fx cache.[x] <- res res type Item = { Value : int; Size : int } type ContinuationBuilder() = member b.Bind(x, f) = fun k -> x (fun x -> fxk) member b.Return x = fun k -> kx member b.ReturnFrom x = x let cont = ContinuationBuilder() let genItems n = [| for i = 1 to n do let size = i % 5 let value = (size * i) yield { Value = value; Size = size } |] let N, K = (80, 400) printfn "N = %d, K = %d" NK let items = genItems N //let rec combinations_cont = // memoize ( // fun key -> // cont { // match key with // | (0, _) | (_, 0) -> return 0 // | (i, k) when items.[i-1].Size > k -> return! combinations_cont (i - 1, k) // | (i, k) -> let item = items.[i-1] // let! v1 = combinations_cont (i-1, k) // let! beforeItem = combinations_cont (i-1, k - item.Size) // let v2 = beforeItem + item.Value // return max v1 v2 // } // ) // // //cacheHits <- 0 //cacheMisses <- 0 //let res = time(fun () -> combinations_cont (N, K) id) //printfn "Answer: %d" res //printfn "Memo hits: %d" cacheHits //printfn "Memo misses: %d" cacheMisses //printfn "" let rec combinations_plain = memoize ( fun key -> match key with | (i, k) when i = 0 || k = 0 -> 0 | (i, k) when items.[i-1].Size > k -> combinations_plain (i-1, k) | (i, k) -> let item = items.[i-1] let v1 = combinations_plain (i-1, k) let beforeItem = combinations_plain (i-1, k-item.Size) let v2 = beforeItem + item.Value max v1 v2 ) cacheHits <- 0 cacheMisses <- 0 printfn "combinations_plain" let res2 = time (fun () -> combinations_plain (N, K)) printfn "Answer: %d" res2 printfn "Memo hits: %d" cacheHits printfn "Memo misses: %d" cacheMisses printfn "" let recursivelyMemoize f = let cache = Dictionary<_, _>() let rec memoizeAux x = match cache.TryGetValue(x) with | (true, v) -> cacheHits <- cacheHits + 1 //printfn "Hit for %A - Result is %A" xv v | _ -> cacheMisses <- cacheMisses + 1 //printfn "Miss for %A" x let res = f memoizeAux x cache.[x] <- res res memoizeAux let combinations_plain2 = let combinations_plain2Aux combinations_plain2Aux key = match key with | (i, k) when i = 0 || k = 0 -> 0 | (i, k) when items.[i-1].Size > k -> combinations_plain2Aux (i-1, k) | (i, k) -> let item = items.[i-1] let v1 = combinations_plain2Aux (i-1, k) let beforeItem = combinations_plain2Aux (i-1, k-item.Size) let v2 = beforeItem + item.Value max v1 v2 let memoized = recursivelyMemoize combinations_plain2Aux fun x -> memoized x cacheHits <- 0 cacheMisses <- 0 printfn "combinations_plain2" let res3 = time (fun () -> combinations_plain2 (N, K)) printfn "Answer: %d" res3 printfn "Memo hits: %d" cacheHits printfn "Memo misses: %d" cacheMisses printfn "" let recursivelyMemoizeCont f = let cache = Dictionary HashIdentity.Structural let rec memoizeAux xk = match cache.TryGetValue(x) with | (true, v) -> cacheHits <- cacheHits + 1 //printfn "Hit for %A - Result is %A" xv kv | _ -> cacheMisses <- cacheMisses + 1 //printfn "Miss for %A" x f memoizeAux x (fun y -> cache.[x] <- y ky) memoizeAux let combinations_cont2 = let combinations_cont2Aux combinations_cont2Aux key = cont { match key with | (0, _) | (_, 0) -> return 0 | (i, k) when items.[i-1].Size > k -> return! combinations_cont2Aux (i - 1, k) | (i, k) -> let item = items.[i-1] let! v1 = combinations_cont2Aux (i-1, k) let! beforeItem = combinations_cont2Aux (i-1, k - item.Size) let v2 = beforeItem + item.Value return max v1 v2 } let memoized = recursivelyMemoizeCont combinations_cont2Aux fun x -> memoized x id cacheHits <- 0 cacheMisses <- 0 printfn "combinations_cont2" let res4 = time (fun () -> combinations_cont2 (N, K)) printfn "Answer: %d" res4 printfn "Memo hits: %d" cacheHits printfn "Memo misses: %d" cacheMisses printfn ""

I get the following results:

 N = 80, K = 400 combinations_plain Took: 7.191000 Answer: 6480 Memo hits: 6231 Memo misses: 6552 combinations_plain2 Took: 6.310800 Answer: 6480 Memo hits: 6231 Memo misses: 6552 combinations_cont2 Took: 17.021200 Answer: 6480 Memo hits: 6231 Memo misses: 6552

combinations_plain is latkin's answer.
combinations_plain2 explicitly returns a recursive memoization step.
combinations_cont2 adapts the recursive memoization function to a function that remembers the results of the continuation.
combinations_cont2 works by intercepting the result in a continuation before passing it to the actual continuation. Subsequent calls on the same key provide a continuation, and this continuation receives a response that we intercepted initially.

This shows that we can:

Memorize using the style of continuing the passage.
Reach similar characteristics (ish) for the vanilla memoized version.

Hope this makes things a bit easier. Sorry, my blog code snippet was incomplete (I think I might have lost it when reformatting recently).

Solving Knapsack Problem in F #: Performance

More articles: