Why is this Haskell program so much slower than the equivalent Python?

Question

Why is this Haskell program so much slower than the equivalent Python?

As part of the programming task, I need to read from stdin a sequence of integers separated by spaces (on one line) and print the sum of these integers in stdout. This sequence can contain as many as 10,000,000 integers.

I have two solutions for this: one is written in Haskell ( foo.hs ), and the other is equivalent, written in Python 2 ( foo.py ). Unfortunately, the (compiled) Haskell program is consistently slower than the Python program, and I find it difficult to explain the performance mismatch between the two programs; See the Benchmark section below. Anyway, I expected Haskell to have the strength ...

What am I doing wrong? How can I explain this discrepancy? Is there an easy way to speed up my Haskell code?

(For information, I'm using a mid-2010 MacBook Pro with 8GB RAM, GHC 7.8.4, and Python 2.7.9.)

`foo.hs`

 main = print . sum =<< getIntList getIntList :: IO [Int] getIntList = fmap (map read . words) getLine

(compiled with ghc -O2 foo.hs )

`foo.py`

 ns = map(int, raw_input().split()) print sum(ns)

Benchmark

Further test.txt consists of one line of 10 million integers, separated by spaces.

 # Haskell $ time ./foo < test.txt 1679257 real 0m36.704s user 0m35.932s sys 0m0.632s # Python $ time python foo.py < test.txt 1679257 real 0m7.916s user 0m7.756s sys 0m0.151s

+46

performance python io haskell

Jubobs Mar 21 '15 at 18:38

source share

3 answers

Read slowly

A quick read from this answer will lead you to 5.5 seconds.

 import Numeric fastRead :: String -> Int fastRead s = case readDec s of [(n, "")] -> n

Lines are linked lists.

In Haskell, the String type is a linked list. Using a packaged view ( bytestring if you really want only ascii, but Text also very fast and supports unicode). As shown in this answer , performance should be neck and neck.

+27

Thomas M. DuBuisson Mar 21 '15 at 19:08

source share

I would venture to guess that most of your problem is actually words . When you map read . words map read . words , what you are actually doing is:

Scan the input that searches for space, creating a list of non-spaces during your work. There are many different types of spaces, and checking for any character that is not a common type of space additionally includes an external C function call (slow). I plan to fix this someday, but I haven’t reached it yet, and even then you will still build and write down lists for no good reason and check the spaces when you really want to check the numbers.
Read the list of accumulated characters to try to make a number out of them. Generate a number. The accumulated list now becomes garbage.
Return to step 1.

This is a pretty funny way to continue. I believe that you can even better use something awful, like reads , but it would be wiser to use something like ReadP . You can also try more convenient kinds of things, such as flow-based analysis; I do not know if this will help a lot or not.

+4

dfeuer Mar 21 '15 at 20:47

source share

András Kovács · Accepted Answer · 2015-03-21 19:12

read slow. For bulk parsing, use bytestring or text or attoparsec .

I compared a little. Your original version runs for 23.9 seconds on my computer. The version below ran in 0.35 secs:

 import qualified Data.ByteString.Char8 as B import Control.Applicative import Data.Maybe import Data.List import Data.Char main = print . sum =<< getIntList getIntList :: IO [Int] getIntList = map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt"

test.txt specializing the analyzer in your test.txt file, I could reduce the runtime to 0.26 seconds:

 getIntList :: IO [Int] getIntList = unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt"

Why is this Haskell program so much slower than the equivalent Python?

foo.hs

foo.py

Benchmark

More articles:

`foo.hs`

`foo.py`