Next higher/lower IEEE double-precision number

I do high-precision scientific calculations. While looking for the best way to represent various effects, I keep coming up with reasons to want the next higher (or lower) double-precision number. Essentially, what I want to do is add one to the least significant bit of a double's internal representation.

The difficulty is that the IEEE format is not completely uniform. If someone used low-level code and actually added one to the least significant bit, the result might not be the next available double. For example, it could be a special-case number such as PositiveInfinity or NaN. There are also subnormal values, which I do not claim to understand, but which appear to have specific bit patterns that differ from the "normal" pattern.
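The concern can be seen concretely in a minimal Java sketch. `addOneToBits` is a hypothetical helper, not a library function. It adds one to the raw 64-bit pattern. In the subnormal range the increment behaves well: the largest subnormal steps straight to the smallest normal double. At the top of the range, however, it runs into the pattern for Infinity and then into a NaN pattern.

```java
final class BitWalk {
    // Hypothetical helper: add one to the raw 64-bit pattern of d, with no special-case handling.
    static double addOneToBits(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d) + 1L);
    }

    public static void main(String[] args) {
        // Largest subnormal: exponent field all zeros, mantissa all ones.
        double largestSubnormal = Double.longBitsToDouble(0x000FFFFFFFFFFFFFL);
        // One more bit carries into the exponent and gives the smallest normal double.
        System.out.println(addOneToBits(largestSubnormal) == Double.MIN_NORMAL); // true

        // Largest finite double: one more bit gives the pattern of +Infinity.
        System.out.println(addOneToBits(Double.MAX_VALUE)); // Infinity

        // +Infinity plus one bit is a NaN pattern.
        System.out.println(Double.isNaN(addOneToBits(Double.POSITIVE_INFINITY))); // true
    }
}
```

So for positive finite values below `Double.MAX_VALUE`, the raw increment does give the next double. The awkward spots are the top of the range, the special values and, as other answers point out, negative numbers.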

An "epsilon" value is available, but I have never understood its definition. In any case, since double values are not evenly spaced, no single value can be added to a double to produce the next higher value.
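The uneven spacing can be checked directly in Java, where `Math.ulp(x)` returns the gap between `x` and the next double of larger magnitude. This sketch treats `Math.ulp(1.0)` (2^-52, the usual machine epsilon) as a candidate fixed step. Reading "epsilon" this way is an assumption; .NET's `Double.Epsilon`, for instance, is the smallest positive subnormal instead.

```java
final class NoSingleEpsilon {
    public static void main(String[] args) {
        // Assumption: "epsilon" taken to mean the gap at 1.0 (machine epsilon, 2^-52).
        double step = Math.ulp(1.0);

        // Too small for large values: the sum rounds straight back to x.
        double big = 1e10;
        System.out.println(big + step == big); // true

        // Far too large for small values: it skips over a huge number of doubles.
        double small = 1e-10;
        System.out.println(small + step == Math.nextUp(small)); // false

        // The gap to the next double depends on the magnitude of x.
        for (double x : new double[] {1e-300, 1e-10, 1.0, 1e10, 1e300}) {
            System.out.println(x + " -> gap " + Math.ulp(x));
        }
    }
}
```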

I really don't understand why IEEE did not specify a function to get the next higher or lower value. I cannot be the only one who needs it.

Is there a way to get the next value (without some kind of loop that tries adding smaller and smaller values)?

+18
double ieee-754 floating-point-precision
Aug 07 '09 at 16:59
6 answers

There are functions available to do just that, but they may depend on which language you use. Two examples:

  • if you have access to a C99 math library, you can use nextafter (and its float and long double variants, nextafterf and nextafterl), or the nexttoward family (which take a long double as the second argument).

  • if you write Fortran, you have the nearest intrinsic.

If you cannot access these directly from your language, you can also look at how they are implemented in freely available libraries, for example this one.
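For comparison, the closest Java counterpart to C99's nextafter is `Math.nextAfter`. The minimal sketch below shows the double and float overloads. Note that the float overload takes its direction as a `double`, which loosely resembles how C's nexttoward family takes a wider type for the direction.

```java
final class NextAfterAnalogue {
    public static void main(String[] args) {
        // Like C99 nextafter(1.0, INFINITY): the next double above 1.0.
        double up = Math.nextAfter(1.0, Double.POSITIVE_INFINITY);
        // Like nextafter(1.0, 0.0): the next double below 1.0.
        double down = Math.nextAfter(1.0, 0.0);
        // Float overload: steps in float precision; the direction argument is a double.
        float upF = Math.nextAfter(1.0f, Double.POSITIVE_INFINITY);

        System.out.println(up);   // 1.0000000000000002
        System.out.println(down); // 0.9999999999999999
        System.out.println(upF);  // 1.0000001
    }
}
```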

+12
Aug 09 '09 at 17:17

As Torsten S. says, this can be done with the BitConverter class. DoubleToInt64Bits returns the raw IEEE 754 bit pattern as a long, and because of how that pattern is laid out (sign, then exponent, then mantissa), the integer has a convenient property: for positive doubles it equals the number of representable doubles between 0 and yours. That is, the smallest positive double is represented by 1, the next larger double by 2, and so on. Negative numbers start at long.MinValue (which is -0.0) and move away from 0d as the integer increases.

So you can do something like this:

    public static double NextDouble(double value)
    {
        // Get the long representation of value:
        var longRep = BitConverter.DoubleToInt64Bits(value);

        long nextLong;
        if (longRep >= 0) // number is positive, so increment to go "up"
            nextLong = longRep + 1L;
        else if (longRep == long.MinValue) // number is -0
            nextLong = 1L;
        else // number is negative, so decrement to go "up"
            nextLong = longRep - 1L;

        return BitConverter.Int64BitsToDouble(nextLong);
    }

This does not handle Infinity and NaN, but you can check for them and deal with them however you like if you are worried about it.
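The same approach carries over to Java, where `Double.doubleToLongBits` plays the role of `BitConverter.DoubleToInt64Bits`. Below is a sketch of a port (`nextDouble` is a hypothetical name) that also adds the NaN and +Infinity checks left to the reader above. In practice Java's `Math.nextUp` does the same job.

```java
final class NextDoubleSketch {
    // Hypothetical Java port of the C# NextDouble above, with NaN and +Infinity handled explicitly.
    static double nextDouble(double value) {
        if (Double.isNaN(value) || value == Double.POSITIVE_INFINITY) {
            return value; // there is nothing above these
        }
        long longRep = Double.doubleToLongBits(value);
        long nextLong;
        if (longRep >= 0) {
            nextLong = longRep + 1L; // positive or +0.0: a larger integer is a larger double
        } else if (longRep == Long.MIN_VALUE) {
            nextLong = 1L;           // -0.0: the next value up is the smallest positive double
        } else {
            nextLong = longRep - 1L; // negative: decrementing moves the value toward zero
        }
        return Double.longBitsToDouble(nextLong);
    }

    public static void main(String[] args) {
        System.out.println(nextDouble(1.0));  // 1.0000000000000002
        System.out.println(nextDouble(-1.0)); // -0.9999999999999999
        System.out.println(nextDouble(0.0));  // 4.9E-324
    }
}
```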

+5
Feb 17 '10 at 19:02

Most languages have built-in or library functions for getting the next or previous single-precision (32-bit) and/or double-precision (64-bit) number.

For users of 32-bit and 64-bit floating-point arithmetic, a sound understanding of the basic constructs is very useful for avoiding some of their hazards. The IEEE standard is applied uniformly, but it still leaves a number of details up to implementers. Therefore, a platform-universal solution based on bit manipulation of machine-word representations can be problematic and may depend on issues such as endianness. Although understanding every detail of how this could or should work at the bit level may demonstrate intellectual skill, it is better to use an intrinsic or library solution that is tailored to each platform and has a universal API across the supported platforms.

I noticed solutions for C# and C++. Here are some for Java:

Math.nextUp:

public static double nextUp(double d)

  • Returns the floating-point value adjacent to d in the direction of positive infinity. This method is semantically equivalent to nextAfter(d, Double.POSITIVE_INFINITY); however, a nextUp implementation may run faster than its equivalent nextAfter call.

Special Cases:

  • If the argument is NaN, the result is NaN.
  • If the argument is positive infinity, the result is positive infinity.
  • If the argument is zero, the result is Double.MIN_VALUE.

Parameters:

  • d - starting floating-point value

Returns:

  • The adjacent floating-point value closer to positive infinity.

public static float nextUp(float f)

  • Returns the floating-point value adjacent to f in the direction of positive infinity. This method is semantically equivalent to nextAfter(f, Float.POSITIVE_INFINITY); however, a nextUp implementation may run faster than its equivalent nextAfter call.

Special Cases:

  • If the argument is NaN, the result is NaN.
  • If the argument is positive infinity, the result is positive infinity.
  • If the argument is zero, the result is Float.MIN_VALUE.

Parameters:

  • f - starting floating-point value

Returns:

  • The adjacent floating-point value closer to positive infinity.
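A short usage sketch of the two nextUp overloads and the special cases listed above. It also shows `Math.nextDown`, the downward counterpart that Java 8 added.

```java
final class NextUpDemo {
    public static void main(String[] args) {
        System.out.println(Math.nextUp(1.0));  // 1.0000000000000002
        System.out.println(Math.nextUp(1.0f)); // 1.0000001

        // Special cases from the documentation above.
        System.out.println(Math.nextUp(Double.NaN));               // NaN
        System.out.println(Math.nextUp(Double.POSITIVE_INFINITY)); // Infinity
        System.out.println(Math.nextUp(0.0) == Double.MIN_VALUE);  // true

        // Java 8 and later: step toward negative infinity instead.
        System.out.println(Math.nextDown(1.0)); // 0.9999999999999999
    }
}
```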

The next two are a bit more complex to use. However, stepping toward zero, or toward either positive or negative infinity, seems the more likely and useful case. Another use is to check whether an intermediate value exists between two values; you can determine how many exist between two values with a loop and a counter. Also, together with the nextUp methods, they look useful for incrementing/decrementing in for loops.
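A minimal sketch of the loop-and-counter idea, with `Math.nextUp` as the for-loop increment. `stepsBetween` is a hypothetical helper and assumes `a <= b`, both finite.

```java
final class CountBetween {
    // Hypothetical helper: count the nextUp steps from a to b (assumes a <= b, both finite).
    static long stepsBetween(double a, double b) {
        long count = 0;
        for (double x = a; x < b; x = Math.nextUp(x)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        double a = 1.0;
        double b = Math.nextUp(Math.nextUp(Math.nextUp(a))); // three doubles above 1.0
        System.out.println(stepsBetween(a, b));                      // 3
        System.out.println(stepsBetween(0.0, 5 * Double.MIN_VALUE)); // 5
    }
}
```

The loop takes one iteration per representable value, so it is only practical for short ranges. For two finite, non-negative values (excluding -0.0), the difference of their `Double.doubleToLongBits` integers gives the same count without a loop, because those integers are ordered like the doubles themselves.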

Math.nextAfter:

public static double nextAfter(double start, double direction)

  • Returns the floating-point number adjacent to the first argument in the direction of the second argument. If both arguments compare as equal, the second argument is returned.

Special Cases:

  • If either argument is NaN, NaN is returned.
  • If both arguments are signed zeros, direction is returned unchanged (as implied by the requirement to return the second argument if the arguments compare as equal).
  • If start is ±Double.MIN_VALUE and direction has a value such that the result should have a smaller magnitude, a zero with the same sign as start is returned.
  • If start is infinite and direction has a value such that the result should have a smaller magnitude, Double.MAX_VALUE with the same sign as start is returned.
  • If start is ±Double.MAX_VALUE and direction has a value such that the result should have a larger magnitude, an infinity with the same sign as start is returned.

Parameters:

  • start - starting floating-point value
  • direction - value indicating which of start's neighbors, or start itself, should be returned

Returns:

  • The floating-point number adjacent to start in the direction of direction.

public static float nextAfter(float start, double direction)

  • Returns the floating-point number adjacent to the first argument in the direction of the second argument. If both arguments compare as equal, a value equivalent to the second argument is returned.

Special Cases:

  • If either argument is NaN, NaN is returned.
  • If both arguments are signed zeros, a value equivalent to direction is returned.
  • If start is ±Float.MIN_VALUE and direction has a value such that the result should have a smaller magnitude, a zero with the same sign as start is returned.
  • If start is infinite and direction has a value such that the result should have a smaller magnitude, Float.MAX_VALUE with the same sign as start is returned.
  • If start is ±Float.MAX_VALUE and direction has a value such that the result should have a larger magnitude, an infinity with the same sign as start is returned.

Parameters:

  • start - starting floating-point value
  • direction - value indicating which of start's neighbors, or start itself, should be returned

Returns:

  • The floating-point number adjacent to start in the direction of direction.
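A small sketch that exercises the special cases listed for the double overload of nextAfter.

```java
final class NextAfterDemo {
    public static void main(String[] args) {
        System.out.println(Math.nextAfter(1.0, 0.0));              // 0.9999999999999999
        // Smallest positive double stepping toward zero: a zero with start's sign.
        System.out.println(Math.nextAfter(Double.MIN_VALUE, 0.0)); // 0.0
        // Infinite start stepping toward zero: the largest finite double.
        System.out.println(Math.nextAfter(Double.POSITIVE_INFINITY, 0.0) == Double.MAX_VALUE); // true
        // Largest finite double stepping up: infinity.
        System.out.println(Math.nextAfter(Double.MAX_VALUE, Double.POSITIVE_INFINITY)); // Infinity
        // Arguments compare equal: the second argument is returned, so the zero's sign comes from it.
        System.out.println(Math.nextAfter(0.0, -0.0));             // -0.0
    }
}
```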
+5
Jul 10 '12 at 21:10

Yes, there is a way. In C #:

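    public static double getInc(double d)
    {
        // Infinities and NaN have no finite increment: return them unchanged.
        if (double.IsInfinity(d) || double.IsNaN(d))
            return d;

        // Translate the double into its binary representation.
        ulong bits = (ulong)BitConverter.DoubleToInt64Bits(d);

        // Mask out the 52 mantissa bits, keeping the sign and the 11-bit exponent.
        bits &= 0xfff0000000000000UL;

        // The spacing at d is 2^-52 times d's power of two, so reduce the exponent
        // field by 52 (0x034), provided the result is still a normal number.
        ulong exponentBits = bits & 0x7ff0000000000000UL;
        if (exponentBits > 0x0340000000000000UL)
            return BitConverter.Int64BitsToDouble((long)(bits - 0x0340000000000000UL));

        // Otherwise the spacing is subnormal: a single mantissa bit. Biased exponents
        // 0 (subnormal d) and 1 both step by double.Epsilon, the lowest bit.
        int biasedExponent = (int)(exponentBits >> 52);
        ulong signBit = bits & 0x8000000000000000UL;
        ulong mantissaBit = biasedExponent <= 1 ? 1UL : 1UL << (biasedExponent - 1);
        return BitConverter.Int64BitsToDouble((long)(signBit | mantissaBit));
    }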

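The increment can be added to or subtracted from d. It has the same sign as d. When stepping toward zero from an exact power of two, the adjacent double is only half an increment away, so subtracting the increment there skips one value.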
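Java's `Math.ulp(d)` returns the magnitude of this increment directly, so it is easy to cross-check a bit-level version against it. Below is a minimal Java sketch of getInc's stated approach: keep the sign and exponent, then lower the exponent by 52, falling back to a single subnormal bit. `increment` is a hypothetical name, not a library method.

```java
final class UlpSketch {
    // Hypothetical helper: spacing of doubles at d, carrying d's sign (compare Math.ulp).
    static double increment(double d) {
        if (Double.isInfinite(d) || Double.isNaN(d)) {
            return d; // no finite increment exists
        }
        long bits = Double.doubleToRawLongBits(d);
        long sign = bits & 0x8000000000000000L;
        int biasedExponent = (int) ((bits >>> 52) & 0x7FFL);
        if (biasedExponent > 52) {
            // Normal result 2^(biasedExponent - 1075): exponent field lowered by 52, mantissa zero.
            return Double.longBitsToDouble(sign | ((long) (biasedExponent - 52) << 52));
        }
        // Subnormal result: one mantissa bit; exponent fields 0 and 1 both give 2^-1074.
        long mantissaBit = biasedExponent <= 1 ? 1L : 1L << (biasedExponent - 1);
        return Double.longBitsToDouble(sign | mantissaBit);
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -3.5, 1e300, Math.scalb(1.0, -1000), Double.MIN_NORMAL, 0.0};
        for (double d : samples) {
            System.out.println(d + " -> " + increment(d) + " (Math.ulp: " + Math.ulp(d) + ")");
        }
    }
}
```

Adding the increment to 1.0 gives `Math.nextUp(1.0)`. Subtracting it from 1.0 lands below `Math.nextDown(1.0)`, which illustrates the power-of-two caveat above.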

+2
Dec 08 '09 at 0:04

I am not sure that I am following your problem. Surely the IEEE standard is completely uniform? For example, look at this excerpt from the Wikipedia article on double-precision numbers:

    3ff0 0000 0000 0000   = 1
    3ff0 0000 0000 0001   = 1.0000000000000002, the next higher number > 1
    3ff0 0000 0000 0002   = 1.0000000000000004

What is wrong with just incrementing the least significant bit of the binary or hexadecimal representation?
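The excerpt can be reproduced in Java with `Double.doubleToLongBits` and `Long.toHexString`, as in the minimal sketch below, and it holds for positive values. The last line shows one case where the plain increment is not "next higher": for a negative double, adding one to the bits moves the value away from zero, that is, downward.

```java
final class HexTable {
    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(1.0); // 0x3ff0000000000000
        for (int i = 0; i < 3; i++) {
            double d = Double.longBitsToDouble(bits + i);
            System.out.println(Long.toHexString(bits + i) + " = " + d);
        }
        // The same increment on a negative number moves away from zero.
        long negBits = Double.doubleToLongBits(-1.0);
        System.out.println(Double.longBitsToDouble(negBits + 1)); // -1.0000000000000002
    }
}
```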

As for the special numbers (infinity, NaN, etc.), they are well defined and there are very few of them. The limits are defined as well.

Since you have obviously studied this, I expect that I have got the wrong end of the stick. If this is not enough for your problem, could you clarify what you want to achieve? What is your goal here?

+1
Aug 07 '09 at 17:37

As for the epsilon function, it is an estimate of how far the binary double can be from a decimal value. This matters because many very large or very small decimal numbers, positive or negative, map to the same binary representation as a double. Try creating doubles from some very, very large or very, very small decimal numbers and then converting them back to decimal. You will find that you do not get the same decimal number back, but the decimal value of the double nearest to it.

For values near 1 or -1, epsilon is very small: about 2.2e-16 at 1.0. As values move toward + or - infinity, epsilon grows in proportion to the value, reaching about 2e292 near the largest double. Toward zero the absolute gap keeps shrinking until the subnormal range, where it stays fixed at about 4.9e-324. Relative to the values themselves, the representable doubles in that range become very, very sparse.
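One way to see this behaviour is to print both the absolute gap, `Math.ulp(x)`, and the relative gap, `Math.ulp(x) / x`, across the range, as in the sketch below. The relative gap stays between about 1.1e-16 and 2.2e-16 throughout the normal range. It grows only once `x` is subnormal, reaching 1.0 at `Double.MIN_VALUE`.

```java
final class RelativeGap {
    public static void main(String[] args) {
        double[] xs = {Double.MIN_VALUE, 1e-320, Double.MIN_NORMAL, 1.0, 1e300, Double.MAX_VALUE};
        for (double x : xs) {
            // Absolute gap to the next double, and that gap relative to x itself.
            System.out.println(x + ": gap " + Math.ulp(x) + ", relative " + Math.ulp(x) / x);
        }
    }
}
```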

+1
Jul 11 '12 at 9:00
