Optimizing the performance of numeric arrays in Haskell

I am working on a terrain generation algorithm for a world like MineCraft. I am currently using simplex noise based on the implementation in the article "Simplex Noise Demystified" [PDF] , because it is assumed that simplex noise is faster and have fewer artifacts than Perlin noise. It looks pretty decent (see image), but so far it is also pretty slow.

enter image description here

Launching the noise function 10 times (I need noise with different wavelengths for things like surface height, temperature, location of trees, etc.) with 3 octaves of noise for each block in a piece (16x16x128 blocks) or about 1 million calls per noisy function in general, takes about 700-800 ms. This is at least an order of magnitude slower in order to create a terrain at any decent speed, despite the fact that the algorithm does not have obvious expensive operations (at least for me). Just gender, modulo, some search engines and basic arithmetic. The algorithm (written in Haskell) is given below. SCC comments are for profiling. I omitted the 2D noise functions, as they work the same.

g3 :: (Floating a, RealFrac a) => a g3 = 1/6 {-# INLINE int #-} int :: (Integral a, Num b) => a -> b int = fromIntegral grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a) grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: Num a => (a, a, a) -> a -> a -> a -> a dot3 (a,b,c) xyz = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: RealFrac a => a -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a simplex3D pxyz = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = {-# SCC "ijk" #-} (sx, sy, sz) where sa = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255) gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12 gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12 gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12 gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 fo = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1) --Generate harmonic 3D noise between -0.5 and 0.5 harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a harmonicNoise3D p octaves lxyz = harmonic octaves (\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l)) 

For profiling, I used the following code,

 q _ = let p = perm 0 in sum [harmonicNoise3D p 3 lxyz :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]] main = do start <- getCurrentTime print $ q () end <- getCurrentTime print $ diffUTCTime end start 

which gives the following information:

 COST CENTRE MODULE %time %alloc simplex3D Main 18.8 21.0 n Main 18.0 19.6 out Main 10.1 9.2 harmonicNoise3D Main 9.8 4.5 harmonic Main 6.4 5.8 int Main 4.0 2.9 gi3 Main 4.0 3.0 xyz2 Main 3.5 5.9 gi1 Main 3.4 3.4 gi0 Main 3.4 2.7 fastFloor Main 3.2 0.6 xyz1 Main 2.9 5.9 ijk Main 2.7 3.5 gi2 Main 2.7 3.3 xyz3 Main 2.6 4.1 iijjkk Main 1.6 2.5 dot3 Main 1.6 0.7 

To compare, I also ported the algorithm to C #. Performance there is about 3-4 times faster, so I think I should be doing something wrong. But even then it is not as fast as we would like. So my question is this: can someone tell me if there are ways to speed up my implementation and / or the algorithm in general, or does anyone know of another noise algorithm that has better performance characteristics but a similar look?

Update:

After fulfilling some of the suggestions below, the code looks as follows:

 module Noise ( Permutation, perm , noise3D, simplex3D ) where import Data.Bits import qualified Data.Vector.Unboxed as UV import System.Random import System.Random.Shuffle type Permutation = UV.Vector Int g3 :: Double g3 = 1/6 {-# INLINE int #-} int :: Int -> Double int = fromIntegral grad3 :: UV.Vector (Double, Double, Double) grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double dot3 (a,b,c) xyz = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: Double -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 noise3D :: Permutation -> Double -> Double -> Double -> Double noise3D pxyz = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = (sx, sy, sz) where sa = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255) gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12 gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12 gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12 gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = (\a -> if a < 0 then 0 else a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: Int -> (Double -> Double) -> Double harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 fo = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1) --3D simplex noise --syntax: simplex3D permutation number_of_octaves wavelength xyz simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double simplex3D p octaves lxyz = harmonic octaves (\f -> noise3D p (x * f / l) (y * f / l) (z * f / l)) 

Along with reducing the size of my piece to 8x8x128, the generation of new relief fragments now occurs at a speed of about 10-20 frames per second, which means that moving is now not as problematic as before. Of course, any other performance improvements are still welcome.

+35
performance polymorphism floating-point haskell procedural-generation
Apr 18 '11 at 18:32
source share
1 answer

Initially, this means that your code is very polymorphic. You should specialize your floating point type evenly on Double , so GHC (and LLVM) have the ability to apply more aggressive optimizations.

Note. For those trying to reproduce, this code imports:

 import qualified Data.Vector as V import Data.Bits import Data.Time.Clock import System.Random import System.Random.Shuffle type Permutation = V.Vector Int 



Ok There is much to be done to improve this code.

<strong> Improvements

Data presentation

  • Specializes in a specific floating point type, not polymorphic floating point functions
  • Replace tuple (a,a,a) with unboxed triple T !Double !Double !Double
  • Switch from Data.Array to Data.Array.Unboxed to Permutations
  • Replace using short type array with multidimensional unpacked array from repa package

Compiler flags

  • Compile with -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native (or equivalent with -fllvm )
  • Increase threshold spec constr - -fspec-constr-count=16

More efficient library functions

  • Use mersenne-random instead of StdGen to generate random ones.
  • Replace mod with rem
  • Replace indexing V.! to unverified indexing of VU.unsafeIndex (after switching to Data.Vector.Unboxed

Runtime Settings

  • Increase the default distribution area: -A20M or -H



Also, check that your algorithm is identical to C # 1, and you are using the same data structures.

+27
Apr 18 '11 at 18:45
source share



All Articles