Convert a rational arbitrary precision number (OCaml, zarith) to an approximate floating point number

Question

Convert a rational arbitrary precision number (OCaml, zarith) to an approximate floating point number

I use the Zarith library to do rational arithmetic of arbitrary precision. Suppose I have a rational number q type Qt , that is, the ratio of two large integers ( q is the module of the rational number of numbers of arbitrary accuracy Zarith). Sometimes I would like to print this number as a floating point number for readability, and sometimes I need to convert this floating point number for subsequent calculations without arbitrary precision. Is there a way to convert q to floating point to a certain level of accuracy?

The method of converting q to a floating point now has no guarantees and can create undefined floating point numbers ( Z is an integer module of arbitrary precision):

 let to_float q = let n, d = num q, den q in (* check if d is zero and raise an error if it is *) let nf, df = Z.to_float n, Z.to_float d in nf /. df

Is there a better way to handle this, where can I get a floating point that most closely approximates any q ?

Edit

I quickly wrote Mark Dickinson's answer in OCaml if anyone is interested. It can (definitely) be improved and refined. I will edit if I do this, or if anyone has any improvement tips. But for now, this solved my problem!

 let to_float q = let n, d = num q, den q in let n_sign = Z.sign n in let d_sign = Z.sign d in (* always >= 0 *) if d_sign = 0 then raise Division_by_zero; let n = Z.abs n in if n_sign = 0 then 0. else let shift = (Z.numbits n) - (Z.numbits d) - 55 in let is_subnormal = shift < -1076 in let shift = if is_subnormal then -1076 else shift in let d = if shift >= 0 then Z.shift_left d shift else d in let n = if shift < 0 then Z.shift_left n (-shift) else n in let quotient, remainder = Z.div_rem nd in let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then Z.add Z.one quotient else quotient in let quotient = if not is_subnormal then quotient else let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero ;Z.minus_one;Z.of_int 2;Z.one|].(round_select) in let unsigned_res = ldexp (Z.to_float quotient) shift in if n_sign = 1 then unsigned_res else -.unsigned_res

Now I will consider creating an interface for the GMP mpq_get_d function, but I'm not quite sure how to do this. The only way I can see is to convert q : Qt to a string and pass to:

 int mpq_set_str (mpq_t rop, const char *str, int base)

Does anyone know how to pass rop to mpq_get_d in OCaml or get a link that describes how to do this? I looked through chapter 19 of the RWO and did not see such a situation.

+5

floating-point ocaml arbitrary-precision rational

Dimitrios Nov 10 '15 at 6:01

source share

2 answers

This is not a complete answer, but looking around, I discovered that Zarit uses GMP internally. There is a GMP function called mpq_get_d that converts the rational to double. If it is not available directly in Zarith, it should be possible (given some time) to add an interface for it.

+2

Jeffrey scofield Nov 10 '15 at 6:44

source share

Mark dickinson · Accepted Answer · 2015-11-10T20:54:33+0000

If you have access to

integer log2 operation and
the ability to left shift an integer by a given number of bits

then it is relatively easy to collapse your correct rounded transform. In a nutshell, the method is as follows:

Reduce to the case n > 0 , d > 0 ; filter out obvious underflow / overflow
Choose the integer shift , so 2^-shift*n/d is between 2^54 and 2^56 .
Use integer arithmetic to compute x = 2^-shift*n/d rounded to the nearest integer using the round-to-odd rounding method.
Convert x to the closest IEEE 754 value with double precision dx , with the usual rounding mode to even rounding.
Return ldexp(dx, shift) .

I'm afraid I don't own OCaml, but the following Python code illustrates the idea of positive inputs. I leave this for you to make obvious changes for negative inputs and for division by zero. You can also make an early return for cases of extreme overflow and underflow: they are easy to spot by looking for larger or smaller shift values below.

 from math import ldexp def to_float(numerator, denominator): """ Convert numerator / denominator to float, correctly rounded. For simplicity, assume both inputs are positive. """ # Shift satisfies 2**54 < (numerator / denominator) / 2**shift < 2**56 shift = numerator.bit_length() - denominator.bit_length() - 55 # Divide the fraction by 2**shift. if shift >= 0: denominator <<= shift else: numerator <<= -shift # Convert to the nearest integer, using round-to-odd. q, r = divmod(numerator, denominator) if r != 0 and q % 2 == 0: q += 1 # Now convert to the nearest float and shift back. return ldexp(float(q), shift)

Some notes:

The bit_length method for a positive integer n gives the number of bits needed to represent n , or, in other words, 1 + floor(log2(n)) .
divmod is a Python function that simultaneously calculates the factor and remainder of integer division.
q value fits (easily) into a 64-bit integer
we round two times: once when converting the shifted numerator / denominator to the nearest integer and again when rounding this integer to a float. In the first round, the round-to-odd method is used; this ensures that the second round (implicit in the conversion from int to float) gives the same result as if we rounded the fraction directly in the float.
the above algorithm does not correctly handle fractions whose converted float value is subnormal: in this case, the ldexp operation represents a potentially third rounding. This can be taken care of with some caution. See below code.

The above is actually a simplified version of the algorithm that Python uses when dividing one (large) integer by another to get a floating point result. Here you can see the source here . The comment at the beginning of the long_true_divide function gives an overview of the method.

For completeness, here is an option that also applies correctly to subnormal results.

 def to_float(numerator, denominator): """ Convert numerator / denominator to float, correctly rounded. For simplicity, assume both inputs are positive. """ # Choose shift so that 2**54 < numerator / denominator / 2**shift < 2**56 shift = numerator.bit_length() - denominator.bit_length() - 55 # The 'treat_as_subnormal' flag catches all cases of subnormal results, # along with some cases where the result is not subnormal but *is* still # smaller than 2**-1021. In all these cases, it sufficient to find the # closest integer multiple of 2**-1074. We first round to the nearest # multiple of 2**-1076 using round-to-odd. treat_as_subnormal = shift < -1076 if treat_as_subnormal: shift = -1076 # Divide the fraction by 2**shift. if shift >= 0: denominator <<= shift else: numerator <<= -shift # Convert to the nearest integer, using round-to-odd. q, r = divmod(numerator, denominator) if r != 0 and q % 2 == 0: q += 1 # Now convert to the nearest float and shift back. if treat_as_subnormal: # Round to the nearest multiple of 4, rounding ties to # the nearest multiple of 8. This avoids double rounding # from the ldexp call below. q += [0, -1, -2, 1, 0, -1, 2, 1][q%8] return ldexp(float(q), shift)

Convert a rational arbitrary precision number (OCaml, zarith) to an approximate floating point number

More articles: