Haskell: List v. Array, performance difference

Question

Haskell: List v. Array, performance difference

Another question is from Haskell n00b.

I compare the effectiveness of various methods used to solve Problem 14 on the Project Euler website. In particular, I hope to better understand the factors that influence the difference in assessment time for four (slightly) different approaches to solving the problem.

(Description of problem No. 14 and various approaches below).

Firstly, a brief overview of problem number 14. It relates to “Collatz numbers” (i.e., the same programming exercise as the previous post, which explored another aspect of Haskell). The number of Collatz for a given integer is equal to the length of the Collatz sequence for that integer. The Collatz sequence for an integer is calculated as follows: the first number ("n0") in the sequence is an integer; if n0 is even, the next number in the sequence ("n1") is n / 2; if n0 is odd, then n1 is 3 * n0 + 1. We continue to expand the sequence recursively until we get 1, after which the sequence is completed. For example, the collatz sequence for 5 is: {5, 16, 8, 4, 2, 1} (because 16 = 3 * 5 + 1, 8 = 16/2, 4 = 8/2, ...).

Problem 14 requires us to find the integer below 1,000,000, which has the largest number of Collatz. To do this, we can consider the function "collatz", which when passing the integer "n" as an argument returns the integer n with the largest number of Collatz. In other words, p 1,000,000 gives us the answer to problem number 14.

For the purposes of this exercise (that is, understanding the differences in the timing of the assessment), we can consider versions of Halkell 'collatz' that vary in two dimensions:

(1) Implementation: Do we save the data set of Collatz numbers (which will be generated for all integers 1..n) as a list or an array? I call this the "implementation" of measurement, i.e. The implementation of the function is either a list or an array.

(2) Algorithm: can we compute the Collatz number for any given integer n by expanding the Collatz sequence until it is complete (i.e., until it reaches 1)? Or do we just expand the sequence until we reach a number k that is less than n (at which point can we just use the Collatz k number that we have already calculated)? I call this the “algorithmic” dimension, i.e. The functional algorithm is either “complete” (calculating the Collatz number for each integer) or “partial”. The latter, obviously, requires fewer operations.

The following are four possible versions of the collatz function: array / partial, list / partial, array / complete and list / complete:

import Data.Array ( (!) , listArray , assocs ) import Data.Ord ( comparing ) import Data.List ( maximumBy ) --array implementation; partial algorithm (FEWEST OPERATIONS) collatzAP x = maximumBy (comparing snd) $ assocs a where a = listArray (0,x) (0:1:[cnn | n <- [2..x]]) cni = let z = if even i then div i 2 else 3*i+1 in if i < n then a ! i else 1 + cnz --list implementation; partial algorithm collatzLP x = maximum a where a = zip (0:1:[cnn | n <- [2..x]]) [0..x] cni = let z = if even i then div i 2 else 3*i+1 in if i < n then fst (a!!i) else 1 + cnz --array implementation, complete algorithm collatzAC x = maximumBy (comparing snd) $ assocs a where a = listArray (0,x) (0:1:[cnn | n <- [2..x]]) cni = let z = if even i then div i 2 else 3*i+1 in if i == 1 then 1 else 1 + cnz --list implementation, complete algorithm (MOST OPERATIONS) collatzLC x = maximum a where a = zip (0:1:[cnn | n <- [2..x]]) [0..x] cni = let z = if even i then div i 2 else 3*i+1 in if i == 1 then 1 else 1 + cnz

As for the evaluation speed: I know that arrays are much faster than lists (i.e. access time O (1) versus O (n) for a given index n), so I expected that the implementation of the collatz array would be faster than the implementation of "list", ceteris paribus. In addition, I expected the “partial” algorithm to be faster than the “full” algorithm (ceteris paribus), since it needs to perform fewer operations to build a data set of Collatz numbers.

Testing our four functions on different inputs of different sizes, we observe the following evaluation time (comments below):

It is true that the version of "array / partial" is the fastest version of "collatz" (with a good margin). However, I find it a bit counter-intuitive that "list / complete" is not the slowest version. This honor belongs to "list / partial", which is more than 20 times slower than "list / complete"!

My question is: is the complete difference in the evaluation time between "list / partial" and "list / complete" (compared to "array / partial" and "array / complete") due to the difference in access efficiency between lists and arrays in Haskell ? Or am I not doing a “controlled experiment” (i.e. are there other factors)?

+5

performance arrays list haskell

iceman Oct 3 '14 at 18:26

source share

1 answer

Arunasr · Accepted Answer · 2014-10-03T18:40:23+0000

I do not understand how the question of the relative performance of the two algorithms that work with lists is generally related to arrays ... but here's my trick:

Try to avoid indexing lists, especially long lists, if performance is of any concern. Indexing is really a workaround (as you know). List / partial indexes / traverses a lot. List / full no. Therefore, the difference between Array / complete and List / complete is negligible, and the difference between "list / partial" and the rest is huge.

Haskell: List v. Array, performance difference

More articles: