Why is this Haskell program so much slower than the equivalent Python?

As part of the programming task, I need to read from stdin a sequence of integers separated by spaces (on one line) and print the sum of these integers in stdout. This sequence can contain as many as 10,000,000 integers.

I have two solutions for this: one is written in Haskell ( foo.hs ), and the other is equivalent, written in Python 2 ( foo.py ). Unfortunately, the (compiled) Haskell program is consistently slower than the Python program, and I find it difficult to explain the performance mismatch between the two programs; See the Benchmark section below. Anyway, I expected Haskell to have the strength ...

What am I doing wrong? How can I explain this discrepancy? Is there an easy way to speed up my Haskell code?

(For information, I'm using a mid-2010 MacBook Pro with 8GB RAM, GHC 7.8.4, and Python 2.7.9.)

foo.hs

 main = print . sum =<< getIntList getIntList :: IO [Int] getIntList = fmap (map read . words) getLine 

(compiled with ghc -O2 foo.hs )

foo.py

 ns = map(int, raw_input().split()) print sum(ns) 

Benchmark

Further test.txt consists of one line of 10 million integers, separated by spaces.

 # Haskell $ time ./foo < test.txt 1679257 real 0m36.704s user 0m35.932s sys 0m0.632s # Python $ time python foo.py < test.txt 1679257 real 0m7.916s user 0m7.756s sys 0m0.151s 
+46
performance python io haskell
Mar 21 '15 at 18:38
source share
3 answers

read slow. For bulk parsing, use bytestring or text or attoparsec .

I compared a little. Your original version runs for 23.9 seconds on my computer. The version below ran in 0.35 secs:

 import qualified Data.ByteString.Char8 as B import Control.Applicative import Data.Maybe import Data.List import Data.Char main = print . sum =<< getIntList getIntList :: IO [Int] getIntList = map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt" 

test.txt specializing the analyzer in your test.txt file, I could reduce the runtime to 0.26 seconds:

 getIntList :: IO [Int] getIntList = unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt" 
+64
Mar 21 '15 at 19:12
source share

Read slowly

A quick read from this answer will lead you to 5.5 seconds.

 import Numeric fastRead :: String -> Int fastRead s = case readDec s of [(n, "")] -> n 

Lines are linked lists.

In Haskell, the String type is a linked list. Using a packaged view ( bytestring if you really want only ascii, but Text also very fast and supports unicode). As shown in this answer , performance should be neck and neck.

+27
Mar 21 '15 at 19:08
source share

I would venture to guess that most of your problem is actually words . When you map read . words map read . words , what you are actually doing is:

  • Scan the input that searches for space, creating a list of non-spaces during your work. There are many different types of spaces, and checking for any character that is not a common type of space additionally includes an external C function call (slow). I plan to fix this someday, but I haven’t reached it yet, and even then you will still build and write down lists for no good reason and check the spaces when you really want to check the numbers.
  • Read the list of accumulated characters to try to make a number out of them. Generate a number. The accumulated list now becomes garbage.
  • Return to step 1.

This is a pretty funny way to continue. I believe that you can even better use something awful, like reads , but it would be wiser to use something like ReadP . You can also try more convenient kinds of things, such as flow-based analysis; I do not know if this will help a lot or not.

+4
Mar 21 '15 at 20:47
source share



All Articles