Problem
I wrote a Haskell program that goes through a folder and finds the average color of each image in the folder. It uses the hack repa-devil package to load images into repa arrays. I find the average by adding all the red, blue and green values, and then dividing by the number of pixels:
-- compiled with -O2 import qualified Data.Array.Repa as R import Data.Array.Repa.IO.DevIL import Control.Monad.Trans (liftIO) import System.Directory (getDirectoryContents) size :: (R.Source re) => R.Array r R.DIM3 e -> (Int, Int) size img = (w, h) where (RZ R.:. h R.:. w R.:. 3) = R.extent img averageColour :: (R.Source re, Num e, Integral e) => R.Array r R.DIM3 e -> (Int, Int, Int) averageColour img = (r `div` n, g `div` n, b `div` n) where (w, h) = size img n = w * h (r,g,b) = f 0 0 0 0 0 f row col rgb | row >= w = f 0 (col + 1) rgb | col >= h = (r, g, b) | otherwise = f (row + 1) col (addCol 0 r) (addCol 1 g) (addCol 2 b) where addCol xv = v + fromIntegral (img R.! (RZ R.:. col R.:. row R.:. x)) main :: IO () main = do files <- fmap (map ("images/olympics_backup/" ++) . filter (`notElem` ["..", "."])) $ getDirectoryContents "images/olympics_backup" runIL $ do images <- mapM readImage files let average = zip (map (\(RGB img) -> averageColour img) images) files liftIO . print $ average
I also wrote this program in Python using the Python image library. He finds the average value of the images in the same way:
import Image def get_images(folder): images = [] for filename in os.listdir(folder): images.append(folder + filename) return images def get_average(filename): image = Image.open(filename) pixels = image.load() r = g = b = 0 for x in xrange(0, image.size[0]): for y in xrange(0, image.size[1]): colour = pixels[x, y] r += colour[0] g += colour[1] b += colour[2] area = image.size[0] * image.size[1] r /= area g /= area b /= area return [(r, g, b), filename, image] def get_colours(images): colours = [] for image in images: try: colours.append(get_average(image)) except: continue return colours imgs = get_images('images/olympics_backup/') print get_colours(imgs)
When both of them are launched in a folder with 301 images, the Haskell version exceeds 0.2 seconds (0.87 against 0.64). This seems odd because Haskell is a compiled language (which is often faster than interpreted), and I heard that repo arrays had good performance (although this may have been compared to other Haskell data types like list) .
What i tried
The first thing I did was to notice that I was using explicit recursion, and therefore decided to replace it using a fold, which would also mean that I no longer needed to check if I was outside the array:
(r,g,b) = foldl' f (0,0,0) [(x, y) | x <- [0..w-1], y <- [0..h-1]] f (r,g,b) (row,col) = (addCol 0 r, addCol 1 g, addCol 2 b) where addCol xv = v + fromIntegral (img R.! (RZ R.:. col R.:. row R.:. x))
This made it run slower (1.2 seconds), so I decided to profile the code and see where most of the time wasted (in case I created an obvious bottleneck or the repa-devil package was just slow). The profile told me that ~ 58% of the time was spent on the function f, and ~ 35% of the time was spent in addCol.
Unfortunately, I canβt figure out how to make this run faster. The function is just an array index and the addition is the same as python code. Is there a way to improve the performance of this code, or is the Python image library just providing more performance?