Java double overflow

So basically, I'm trying to calculate the likelihood ratio of two things occurring together. The equations are fairly straightforward, but the problem is that my data is quite large, and sometimes the intermediate operations overflow.

I am currently using double for my variables, so switching to a wider primitive type is not possible.
The equation also uses logarithm and exponential operations. However, I did not find any built-in mathematical functions for BigDecimal or similar types.

In addition, I have already tried to simplify the equations as much as possible.

I wonder what my options are. Here is my code:

    c1 = unigramsInfo.get(w1)[0];
    c2 = unigramsInfo.get(w2)[0];
    c12 = entry.getValue()[0];
    N = additionalInfo.get("tail")[1];
    p = c2 / N;
    p1 = c12 / c1;
    p2 = (c2 - c12) / (N - c1);
    likelihood = - 2 * ( c2 * Math.log(p) + (N - c2) * Math.log(1 - p)
                       - c12 * Math.log(p1) - (c1 - c12) * Math.log(1 - p1)
                       - (c2 - c12) * Math.log(p2) - (N - c1 - c2 - c12) * Math.log(1 - p2) );

Here N can reach ten million, and the probabilities can be as small as 1.0E-7.

1 answer

I tried your expression (since I do not know where c1, c2, c12 and N come from, I hard-coded their values). The hard-coded values are as follows:

    double c1 = 0.1;
    double c2 = 0.2;
    double c12 = 0.3;
    double N = 0.4;

And I got likelihood = NaN.

Pay attention to the input. The first problematic expressions are the divisions (you may get an overflow here when dividing very small or very large numbers):

    double p = c2 / N;
    double p1 = c12 / c1;
    double p2 = (c2 - c12) / (N - c1);
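To make the division problem concrete, here is a minimal sketch of how double division behaves at the edges. Java does not throw on floating-point division; it silently returns Infinity or NaN. The class name and the safeRatio helper are hypothetical, introduced only for illustration.

```java
final class DivisionEdgeCases {

    /** Returns a / b, but throws if the result is NaN or infinite (hypothetical helper). */
    static double safeRatio(double a, double b) {
        double r = a / b;
        if (Double.isNaN(r) || Double.isInfinite(r)) {
            throw new ArithmeticException("non-finite ratio: " + a + " / " + b + " = " + r);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(1.0 / 0.0);           // Infinity
        System.out.println(0.0 / 0.0);           // NaN
        System.out.println(1.0e300 / 1.0e-300);  // Infinity: the quotient exceeds Double.MAX_VALUE
        System.out.println(safeRatio(2.0, 8.0)); // 0.25
    }
}
```

In the question's code, the denominator N - c1 becomes zero when c1 equals N, so p2 is one place where such a check could pay off.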

Then you calculate the logarithms. In my case (with the hard-coded values above) I got NaN from the expression Math.log(1 - p1): it tries to take the natural logarithm of a negative number, because 1 - p1 < 0 whenever p1 > 1, that is, whenever c12 > c1, which is easy to hit with unchecked input.
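The NaN can be reproduced in isolation. The sketch below uses the same hard-coded c1 and c12; the class name and the logOneMinus helper are hypothetical. It also shows a second way to get NaN that does not need invalid input: a zero count multiplied by log(0).

```java
final class NanFromLog {

    /** Returns Math.log(1 - p) with no validation, as in the original expression. */
    static double logOneMinus(double p) {
        return Math.log(1 - p);
    }

    public static void main(String[] args) {
        double c1 = 0.1;
        double c12 = 0.3;
        double p1 = c12 / c1;                   // about 3, so greater than 1

        System.out.println(logOneMinus(p1));    // NaN: logarithm of a negative number
        System.out.println(Math.log(0.0));      // -Infinity
        System.out.println(0 * Math.log(0.0));  // NaN: 0 * -Infinity
    }
}
```

The last line matters for real counts as well: if c12 equals c1, then p1 is 1 and the term (c1 - c12) * Math.log(1 - p1) evaluates to 0 * -Infinity.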

Generally speaking, you can get not only overflow (in extreme cases), but also NaN (even for "reasonable" input).

The suggestion is to split the long expression into small Java expressions, check every value that could lead to NaN or overflow before using it, and throw exceptions manually. This will help locate the cause of the problem when you receive invalid input.
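One way this could look is sketched below. It is not a definitive implementation: the class and the names requireProbability, term and likelihood are hypothetical, and two things are assumptions rather than part of the posted code. First, 0 * log(0) is treated as 0. Second, the last count is taken as (N - c1) - (c2 - c12), which gives + c12 where the posted expression has - c12.

```java
final class LikelihoodRatio {

    /** Returns p if it lies in [0, 1]; otherwise throws, naming the offending variable. */
    static double requireProbability(String name, double p) {
        if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException(name + " is not a probability: " + p);
        }
        return p;
    }

    /** Computes count * log(x), treating 0 * log(0) as 0 (assumed convention). */
    static double term(double count, double x) {
        return count == 0.0 ? 0.0 : count * Math.log(x);
    }

    static double likelihood(double c1, double c2, double c12, double n) {
        // Each ratio is validated as soon as it is computed.
        double p  = requireProbability("p", c2 / n);
        double p1 = requireProbability("p1", c12 / c1);
        double p2 = requireProbability("p2", (c2 - c12) / (n - c1));

        // Assumption: remaining count is (n - c1) - (c2 - c12); the posted code has "- c12".
        double rest = n - c1 - c2 + c12;

        // The long expression, split into the two groups of terms.
        double h0 = term(c2, p) + term(n - c2, 1 - p);
        double h1 = term(c12, p1) + term(c1 - c12, 1 - p1)
                  + term(c2 - c12, p2) + term(rest, 1 - p2);

        double result = -2 * (h0 - h1);
        if (Double.isNaN(result) || Double.isInfinite(result)) {
            throw new ArithmeticException("non-finite likelihood: " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(likelihood(10, 20, 5, 100));
        try {
            likelihood(0.1, 0.2, 0.3, 0.4);      // the hard-coded values from the answer
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // reports that p1 is out of range
        }
    }
}
```

With the checks in place, the hard-coded values fail immediately at p1 instead of producing a NaN at the end of the calculation.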


Source: https://habr.com/ru/post/1235576/

