IOUArray for ByteSring as fast as possible

I need to mutate elements in a fixed-size Word8 array very quickly. For this I use IOUArray . I need to send this array on a network connection. The sendBinaryData function from the websockets package requires a ByteString . I need to convert from one view to another. I am using this function currently:

 arrayToBS :: IOUArray Int Word8 -> IO (BS.ByteString) arrayToBS = (fmap BS.pack) . getElems 

This function converts the array elements to [Word8] before packing this list into a byte string, and from the profiling I see that it is rather slow. I was wondering if there is a way to speed up this function or maybe send an array through a connection to a web server directly?

The array that I am currently using:

 size = 1000; numBytes = size * size * 4 newBuffer :: IO (IOUArray Int Word8) newBuffer = newArray (0, numBytes) 200 :: IO (IOUArray Int Word8) 

and with the exception of the performance report:

 COST CENTRE MODULE SRC %time %alloc arrayToBS Lib src/Lib.hs:28:1-37 88.1 99.0 newBuffer Lib src/Lib.hs:(23,1)-(25,12) 9.9 0.8 

Ideally, arrayToBS will be much faster than creating an array. If I change size to 100:

 COST CENTRE MODULE SRC %time %alloc arrayToBS Lib src/Lib.hs:21:1-37 100.0 86.1 mkEncodeTable.table Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:105:5-75 0.0 8.0 mkEncodeTable.ix Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:104:5-43 0.0 1.1 
+5
source share
2 answers

Disclaimer I am not very familiar with these low-level primitives, so in some cases this can be dangerous.


You will at least need to copy the data at a time, since, as @ user2407038 observes, the underlying data stored in the IOUArray is an unsecured array, so we cannot rely on the GHC to not move the array around, however the reverse direction ( ByteString to IOArray ) is possible without a copy.

 {-# LANGUAGE UnboxedTuples, MagicHash #-} import Data.ByteString.Internal (ByteString(..)) import Data.Array.IO.Internals (IOUArray(..)) import Data.Array.Base (STUArray(..)) import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr) import GHC.IO (IO(..)) import GHC.Exts (copyMutableByteArrayToAddr#, Ptr(..), Int(..)) arrayToBS :: IOUArray Int Word8 -> IO ByteString arrayToBS (IOUArray (STUArray _ _ n@ (I# n') mutByteArr)) = do bytes <- mallocForeignPtrBytes n withForeignPtr bytes $ \(Ptr addr) -> IO $ \state -> (# copyMutableByteArrayToAddr# mutByteArr 0# addr n' state, () #) pure (PS bytes 0 n) 

Here is a test of this work (remember that the ascii code for 'A' is 65 ):

 ghci> iou <- newListArray (-2,9) [65,67..] :: IO (IOUArray Int Word8) ghci> arrayToBS iou "ACEGIKMOQSUW" 
+2
source

Ok, thanks to user2407038 I have something (note that I have never played with primitives or unboxed types before):

 import Control.Monad.ST import qualified Data.ByteString as BS import Data.Word import Data.Array.ST import Data.Array.Base import Data.ByteString.Internal import GHC.Prim import GHC.Exts import GHC.ForeignPtr bs2Addr# :: BS.ByteString -> Addr# bs2Addr# (PS fptr offset len) = case fptr of (ForeignPtr addr _ ) -> addr arrayPrim (STUArray _ _ _ x) = x unbox :: Int -> Int# unbox (I# n#) = n# copy :: Int -> IO BS.ByteString copy len = do -- Get the length as unboxed let len# = unbox len -- Bytestring to copy to, filled with 0s initially let bs = BS.pack (replicate len 0) -- Create a new STUArray. I don't know why it needs to be length * 2. arr <- stToIO (newArray (0, len * 2) 255 :: ST s (STUArray s Int Word8)) -- MutableByteArray# let mArrPrim = arrayPrim arr -- Addr# let addr = bs2Addr# bs -- I don't know what the 2nd and 4th Int# arguments are suppose to be. let _ = copyMutableByteArrayToAddr# mArrPrim len# addr len# realWorld# return bs 

I am using STUArray here instead of IOUArray now because I could not find the IOUArray constructor.

Profiling results for this code with an array of 4,000,000 elements:

  Sun Aug 20 20:49 2017 Time and Allocation Profiling Report (Final) shoot-exe +RTS -N -p -RTS total time = 0.05 secs (47 ticks @ 1000 us, 1 processor) total alloc = 204,067,640 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc copy.bs Lib src/Lib.hs:32:7-36 66.0 96.0 copy Lib src/Lib.hs:(27,1)-(45,11) 34.0 3.9 

This is the code I compared it with:

 arrayToBS :: (STUArray s Int Word8) -> ST s (BS.ByteString) arrayToBS = (fmap BS.pack) . getElems slowCopy :: Int -> IO BS.ByteString slowCopy len = do arr <- stToIO (newArray (0, len - 1) 255 :: ST s (STUArray s Int Word8)) stToIO $ arrayToBS arr 

And his profiling report:

  Sun Aug 20 20:48 2017 Time and Allocation Profiling Report (Final) shoot-exe +RTS -N -p -RTS total time = 0.55 secs (548 ticks @ 1000 us, 1 processor) total alloc = 1,604,073,872 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc arrayToBS Lib src/Lib.hs:48:1-37 98.2 99.7 slowCopy Lib src/Lib.hs:(51,1)-(53,24) 1.6 0.2 

OK, the new version is faster. They both produce the same result. However, I would still like to know what the #Int parameters #Int for copyMutableByteArrayToAddr# and why should I multiply the length of the array in the fast version by 2. I will play a little more and update this answer if I find out.

Update: Alec answer

For the curious, this is the result of profiling Alec's answer:

  Sun Aug 20 21:13 2017 Time and Allocation Profiling Report (Final) shoot-exe +RTS -N -p -RTS total time = 0.01 secs (7 ticks @ 1000 us, 1 processor) total alloc = 8,067,696 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc newBuffer Other src/Other.hs:23:1-33 85.7 49.6 arrayToBS.\.\ Other src/Other.hs:19:5-69 14.3 0.0 arrayToBS Other src/Other.hs:(16,1)-(20,21) 0.0 49.6 

Looks like the one to use.

+1
source

Source: https://habr.com/ru/post/1271036/


All Articles