Ok, thanks to user2407038 I have something (note that I have never played with primitives or unboxed types before):
    import Control.Monad.ST
    import qualified Data.ByteString as BS
    import Data.Word
    import Data.Array.ST
    import Data.Array.Base
    import Data.ByteString.Internal
    import GHC.Prim
    import GHC.Exts
    import GHC.ForeignPtr

    bs2Addr
I am using STUArray here instead of IOUArray now because I could not find the IOUArray constructor.
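On the missing constructor: as far as I can tell, the array package exports it from Data.Array.IO.Internals, where IOUArray is a newtype around STUArray RealWorld. The sketch below assumes that export and unwraps an IOUArray to reach the STUArray inside; the names unwrapIOU, elementCount and demoUnwrap are made up for illustration.

```haskell
import Control.Monad.ST (RealWorld)
import Data.Array.Base (STUArray (..))
import Data.Array.IO.Internals (IOUArray (..))
import Data.Array.MArray (newArray)
import Data.Word (Word8)

-- Hypothetical helper: peel off the newtype to get the underlying STUArray.
unwrapIOU :: IOUArray Int Word8 -> STUArray RealWorld Int Word8
unwrapIOU (IOUArray stu) = stu

-- Hypothetical helper: the third field of the STUArray constructor is the
-- number of elements; the fourth (ignored here) is the MutableByteArray#.
elementCount :: STUArray s Int Word8 -> Int
elementCount (STUArray _ _ n _) = n

-- Build a 10-element IOUArray in IO, unwrap it, and report its size.
demoUnwrap :: IO Int
demoUnwrap = do
  ioArr <- newArray (0, 9) 255 :: IO (IOUArray Int Word8)
  pure (elementCount (unwrapIOU ioArr))
```

With that, code written against STUArray RealWorld could also accept an IOUArray, so switching array types is a matter of convenience.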
Profiling results for this code with an array of 4,000,000 elements:
    Sun Aug 20 20:49 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.05 secs   (47 ticks @ 1000 us, 1 processor)
    total alloc = 204,067,640 bytes  (excludes profiling overheads)

    COST CENTRE  MODULE  SRC                        %time  %alloc

    copy.bs      Lib     src/Lib.hs:32:7-36          66.0    96.0
    copy         Lib     src/Lib.hs:(27,1)-(45,11)   34.0     3.9
This is the code I compared it with:
    arrayToBS :: (STUArray s Int Word8) -> ST s (BS.ByteString)
    arrayToBS = (fmap BS.pack) . getElems

    slowCopy :: Int -> IO BS.ByteString
    slowCopy len = do
      arr <- stToIO (newArray (0, len - 1) 255 :: ST s (STUArray s Int Word8))
      stToIO $ arrayToBS arr
And its profiling report:
    Sun Aug 20 20:48 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =          0.55 secs   (548 ticks @ 1000 us, 1 processor)
    total alloc = 1,604,073,872 bytes  (excludes profiling overheads)

    COST CENTRE  MODULE  SRC                        %time  %alloc

    arrayToBS    Lib     src/Lib.hs:48:1-37          98.2    99.7
    slowCopy     Lib     src/Lib.hs:(51,1)-(53,24)    1.6     0.2
OK, the new version is faster, and both versions produce the same result. However, I would still like to know what the Int# parameters of copyMutableByteArrayToAddr# are for, and why I have to multiply the length of the array by 2 in the fast version. I will play with it a little more and update this answer if I find out.
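On the Int# parameters: going by the documented type, copyMutableByteArrayToAddr# :: MutableByteArray# s -> Int# -> Addr# -> Int# -> State# s -> State# s, the first Int# should be the byte offset into the source array and the second the number of bytes to copy. Here is a minimal sketch under that reading, assuming a Word8 array (one byte per element, so the byte count equals the element count). It is not the code from above, and copyToBS and demoCopy are made-up names.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Control.Monad.ST (RealWorld, stToIO)
import Data.Array.Base (STUArray (..))
import Data.Array.MArray (newListArray)
import qualified Data.ByteString as BS
import Data.ByteString.Internal (create)
import Data.Word (Word8)
import GHC.Exts (Int (I#), Ptr (Ptr), copyMutableByteArrayToAddr#)
import GHC.IO (IO (..))

-- Hypothetical helper: copy the whole array into a fresh ByteString.
-- n is the element count stored in the STUArray constructor; for Word8
-- elements it is also the size in bytes.
copyToBS :: STUArray RealWorld Int Word8 -> IO BS.ByteString
copyToBS (STUArray _ _ n@(I# n#) marr#) =
  create n $ \(Ptr addr#) ->
    IO $ \s ->
      -- Arguments: source array, source offset in bytes (0#),
      -- destination address, number of bytes to copy (n#).
      case copyMutableByteArrayToAddr# marr# 0# addr# n# s of
        s' -> (# s', () #)

-- Small usage example: a four-element array turned into a ByteString.
demoCopy :: IO BS.ByteString
demoCopy = do
  arr <- stToIO (newListArray (0, 3) [1, 2, 3, 4])
  copyToBS arr
```

For element types wider than one byte, both the offset and the count would have to be scaled by the element size, since the primitive works in bytes rather than elements.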
Update: Alec's answer
For the curious, this is the result of profiling Alec's answer:
    Sun Aug 20 21:13 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =      0.01 secs   (7 ticks @ 1000 us, 1 processor)
    total alloc = 8,067,696 bytes  (excludes profiling overheads)

    COST CENTRE    MODULE  SRC                          %time  %alloc

    newBuffer      Other   src/Other.hs:23:1-33          85.7    49.6
    arrayToBS.\.\  Other   src/Other.hs:19:5-69          14.3     0.0
    arrayToBS      Other   src/Other.hs:(16,1)-(20,21)    0.0    49.6
Looks like the one to use.