Efficient conversion of 64-bit Double to ByteString

Question

I wrote a function to convert a 64-bit Double to a ByteString (architecture and type safety are not really an issue here; assume a Double is a 64-bit word). Although the function below works well, I am wondering if there is a faster way to convert a Double to a ByteString. In the code below, the Word64 is unpacked into a list of Word8, the list is reversed (to get little-endian format), and the result is packed into a ByteString. Code below:

    {-# LANGUAGE MagicHash #-}
    import GHC.Prim
    import GHC.Types
    import GHC.Word
    import Data.Bits (shiftR)
    import Data.ByteString (pack, unpack)
    import Data.ByteString.Internal (ByteString)
    import Text.Printf (printf)

    encodeDouble :: Double -> ByteString
    encodeDouble (D# x) = pack $ reverse $ unpack64 $ W64# (unsafeCoerce# x)

    unpack64 :: Word64 -> [Word8]
    unpack64 x = map (fromIntegral.(shiftR x)) [56,48..0]

    -- function to convert list of bytestring into hex digits - for debugging
    bprint :: ByteString -> String
    bprint x = ("0x" ++ ) $ foldl (++) "" $ fmap (printf "%02x") $ unpack x

    main = putStrLn $ bprint $ encodeDouble 7234.4

Sample GHCi output on Mac x86:

    *Main> bprint $ encodeDouble 7234.4
    "0x666666666642bc40"

While the code works well, I plan to use it to encode a batch of Double values into a ByteString before sending it over IPC. So I would appreciate suggestions for making it faster, if there are any.

It seems to me that the Double has to be unpacked into Word8s and then packed into a ByteString, so the overall algorithm as it stands may not be improvable by much. But a more efficient unpack/pack function would probably make a difference, if there is one.

EDIT1: I discovered another complication on Mac (GHC 7.0.3): the code above does not compile with GHC because of the error below. I had only tested it in GHCi so far:

    $ ghc -O --make t.hs
    [1 of 1] Compiling Main             ( t.hs, t.o )

    /var/folders/_q/33htc59519b3xq7y6xv100z40000gp/T/ghc6976_0/ghc6976_0.s:285:0:
        suffix or operands invalid for `movsd'

    /var/folders/_q/33htc59519b3xq7y6xv100z40000gp/T/ghc6976_0/ghc6976_0.s:304:0:
        suffix or operands invalid for `movsd'

So it looks like I will have to fall back on FFI (the cereal / data-binary-ieee754 packages) until this bug is fixed, or until I find a workaround. It looks related to GHC ticket 4092. Please correct me if this is a new bug or a different one. For now, I cannot compile it :(

EDIT2: Updating the code to use unsafeCoerce fixes the compilation problem. Code below, with a Criterion benchmark:

    {-# LANGUAGE MagicHash #-}
    import GHC.Prim
    import GHC.Types
    import GHC.Word
    import Data.Bits (shiftR)
    import Data.ByteString (pack, unpack)
    import Data.ByteString.Internal (ByteString)
    import Text.Printf (printf)
    import Unsafe.Coerce
    import Criterion.Main

    --encodeDouble :: Double -> ByteString
    encodeDouble x = pack $ reverse $ unpack64 $ unsafeCoerce x

    unpack64 :: Word64 -> [Word8]
    unpack64 x = map (fromIntegral.(shiftR x)) [56,48..0]

    main = defaultMain [
            bgroup "encodeDouble" [
              bench "78901.234" $ whnf encodeDouble 78901.234
            , bench "789.01" $ whnf encodeDouble 789.01
            ]
           ]

Criterion output (truncated):

    estimating cost of a clock call...
    mean is 46.09080 ns (36 iterations)

    benchmarking encodeDouble/78901.234
    mean: 218.8732 ns, lb 218.4946 ns, ub 219.3389 ns, ci 0.950
    std dev: 2.134809 ns, lb 1.757455 ns, ub 2.568828 ns, ci 0.950

    benchmarking encodeDouble/789.01
    mean: 219.5382 ns, lb 219.0744 ns, ub 220.1296 ns, ci 0.950
    std dev: 2.675674 ns, lb 2.197591 ns, ub 3.451464 ns, ci 0.950

Upon further analysis, most of the bottleneck seems to be in unpack64. The coercion takes ~6 ns; unpack64 takes ~195 ns. Unpacking the Word64 into a list of Word8 is quite expensive here.

casting haskell bytestring

Sal Dec 02 '11 at 1:50

3 answers

I recently added IEEE-754 float support to cereal, and you can find similar functions for binary in data-binary-ieee754. Here is an example that uses the cereal version to round-trip pi to a ByteString and back:

    Prelude Data.Serialize> runGet getFloat64be $ runPut $ putFloat64be pi
    Right 3.141592653589793

It uses the ST array trick for quick conversion; see this earlier question for more details.
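For readers who have not seen it, the ST array trick can be sketched roughly as follows. This is a minimal illustration, not the actual cereal source: the helper names newDoubleArr, asWords and doubleToWord64 are made up for this example, and it assumes the array package is available (castSTUArray lives in Data.Array.Unsafe in array 0.4 and later).

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray)
import Data.Array.Unsafe (castSTUArray)
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical helper: a one-element unboxed array holding the Double.
newDoubleArr :: Double -> ST s (STUArray s Int Double)
newDoubleArr = newArray (0, 0)

-- Hypothetical helper: view the same memory as an array of Word64.
asWords :: STUArray s Int Double -> ST s (STUArray s Int Word64)
asWords = castSTUArray

-- Reinterpret the bits of a Double as a Word64 without unsafeCoerce.
doubleToWord64 :: Double -> Word64
doubleToWord64 x = runST (newDoubleArr x >>= asWords >>= \arr -> readArray arr 0)

main :: IO ()
main = putStrLn ("0x" ++ showHex (doubleToWord64 7234.4) "")
```

The cast goes through memory rather than through a register-level coercion, which is why it does not run into the movsd problem from the question.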

Update: D'oh, I ought to know how to use the calls I contributed to the library...

Update x2: Regarding the compilation failure, I don't think this qualifies as a bug.

I haven't looked too closely at the generated assembly for this particular code, but the operands to a movsd instruction are getting mangled. From section 11.4.1.1 of the Intel x86 manual:

The MOVSD (move scalar double-precision floating-point) instruction transfers a 64-bit double-precision floating-point operand from memory to the low quadword of an XMM register or vice versa, or between XMM registers.

In unoptimized code, you get perfectly fine instructions like movsd LnTH(%rip),%xmm0, but in the -O code you see things like movsd Ln2cJ(%rip),%rax, where %rax is a general-purpose register rather than an XMM register.

The optimizer is probably making assumptions about the representation of the data it needs to move between registers, based on the type of the data involved. unsafeCoerce and friends invalidate those assumptions, so when the instruction selector thinks it is choosing the right operation for a D#, it is actually emitting code that tries to put that D# where a W64# would happily fit.

Since handling this would require the optimizer to give up many of the assumptions that let it emit better code under normal circumstances, I am inclined to say that this is not a bug, but rather a good story about why unsafe functions carry a caveat emptor warning.

acfoltzer Dec 02 '11 at 3:32

Following the suggestions of acfoltzer (cereal source code) and Daniel Fischer (unsafeCreate), I wrote the code below, which works well for my use case and is also fast:

    {-# LANGUAGE MagicHash #-}
    import Data.ByteString (pack, unpack)
    import Data.ByteString.Internal (unsafeCreate, ByteString)
    import Data.Bits (shiftR)
    import GHC.Int (Int64)
    import GHC.Prim
    import GHC.Types
    import GHC.Word
    import Unsafe.Coerce
    import Criterion.Main
    import Foreign

    -- | Write a Word64 in little endian format
    putWord64le :: Word64 -> Ptr Word8 -> IO ()
    putWord64le w p = do
      poke p               (fromIntegral (w)           :: Word8)
      poke (p `plusPtr` 1) (fromIntegral (shiftR w  8) :: Word8)
      poke (p `plusPtr` 2) (fromIntegral (shiftR w 16) :: Word8)
      poke (p `plusPtr` 3) (fromIntegral (shiftR w 24) :: Word8)
      poke (p `plusPtr` 4) (fromIntegral (shiftR w 32) :: Word8)
      poke (p `plusPtr` 5) (fromIntegral (shiftR w 40) :: Word8)
      poke (p `plusPtr` 6) (fromIntegral (shiftR w 48) :: Word8)
      poke (p `plusPtr` 7) (fromIntegral (shiftR w 56) :: Word8)
    {-# INLINE putWord64le #-}

    -- Allocate 8 bytes and write the Double's bit pattern straight into them
    encodeDouble :: Double -> ByteString
    encodeDouble x = unsafeCreate 8 (putWord64le $ unsafeCoerce x)

    main :: IO ()
    main = defaultMain [
            bgroup "encodeDouble" [
              bench "78901.234" $ whnf encodeDouble 78901.234
            , bench "789.01" $ whnf encodeDouble 789.01
            ]
           ]

Criterion output (truncated):

    estimating cost of a clock call...
    mean is 46.80361 ns (35 iterations)
    found 5 outliers among 35 samples (14.3%)
      3 (8.6%) high mild
      2 (5.7%) high severe

    benchmarking encodeDouble/78901.234
    mean: 18.80689 ns, lb 18.73805 ns, ub 18.97247 ns, ci 0.950
    std dev: 516.7499 ps, lb 244.8588 ps, ub 1.043685 ns, ci 0.950

    benchmarking encodeDouble/789.01
    mean: 18.96963 ns, lb 18.90986 ns, ub 19.06127 ns, ci 0.950
    std dev: 374.2191 ps, lb 275.3313 ps, ub 614.4281 ps, ci 0.950

From ~220 ns to ~19 ns, nice! I did not do anything fancy when compiling; just the -O flag is enough with GHC 7 (Mac, x86_64).

Now I am trying to figure out how to do this quickly for a list of doubles!
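One possible direction for the list case, sketched here with an approach that is not from this thread: newer versions of the bytestring package (0.10 and later) ship Data.ByteString.Builder, whose doubleLE writes a Double as 8 little-endian bytes. The function name encodeDoubles is made up for this example, and I have not benchmarked it against the unsafeCreate version above.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (doubleLE, toLazyByteString)

-- Hypothetical helper: encode each Double as 8 little-endian bytes and
-- concatenate everything into one strict ByteString.
encodeDoubles :: [Double] -> B.ByteString
encodeDoubles = L.toStrict . toLazyByteString . foldMap doubleLE

main :: IO ()
main = print (B.unpack (encodeDoubles [1.0, 7234.4]))
```

The Builder fills its output buffers in chunks, so there is no per-Double ByteString allocation. If the data goes straight to a socket or handle for IPC, the L.toStrict copy can probably be skipped by sending the lazy ByteString directly.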

Sal Dec 03 '11 at 19:28

Daniel Fischer · Accepted Answer · 2011-12-02T02:08:13+0000

Note that using unsafeCoerce# is dangerous here; the docs say it can be used for:

Casting an unboxed type to another unboxed type of the same size (but not coercions between floating-point and integral types)

As for speed, it might be faster to avoid an intermediate list and write directly to memory via unsafeCreate from Data.ByteString.Internal .
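A minimal sketch of that idea, under two assumptions that go beyond the original answer: it uses castDoubleToWord64 from GHC.Float (available in base 4.10 and later, so not in the GHC 7.0 of the question) instead of unsafeCoerce#, and the name encodeDoubleLE is made up for this example.

```haskell
import Data.Bits (shiftR)
import Data.ByteString (ByteString, unpack)
import Data.ByteString.Internal (unsafeCreate)
import Data.Word (Word64, Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import GHC.Float (castDoubleToWord64)

-- Hypothetical helper: write the 8 bytes of a Double, least significant
-- byte first, directly into a freshly allocated ByteString buffer.
encodeDoubleLE :: Double -> ByteString
encodeDoubleLE d = unsafeCreate 8 $ \p -> mapM_ (pokeByte p) [0 .. 7]
  where
    -- Supported bit-level reinterpretation of the Double (assumes base >= 4.10)
    w :: Word64
    w = castDoubleToWord64 d

    -- Byte i of the output is bits [8*i, 8*i + 7] of w.
    pokeByte :: Ptr Word8 -> Int -> IO ()
    pokeByte p i = pokeByteOff p i (fromIntegral (w `shiftR` (8 * i)) :: Word8)

main :: IO ()
main = print (unpack (encodeDoubleLE 7234.4))
```

The loop over [0 .. 7] is only an index range, which GHC usually compiles into a plain loop with -O; unlike the pack/unpack version, no list of bytes is ever built.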
