Resolving zeros in the product of items in a list

We can easily convert the product of the elements of a list into the sum of the logarithms of those elements, provided the list does not contain 0. For example:

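    >>> import math
    >>> from functools import reduce  # built in on Python 2, must be imported on Python 3
    >>> from operator import mul
    >>> pn = [0.4, 0.3, 0.2, 0.1]
    >>> math.pow(reduce(mul, pn, 1), 1./len(pn))
    0.22133638394006433
    >>> math.exp(sum(0.25 * math.log(p) for p in pn))
    0.22133638394006436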

How should we handle lists that contain 0s in Python, in a way that is both programmatically and mathematically correct?

In particular, how should we handle cases such as:

    >>> pn = [0.4, 0.3, 0, 0]
    >>> math.pow(reduce(mul, pn, 1), 1./len(pn))
    0.0
    >>> math.exp(sum(1./len(pn) * math.log(p) for p in pn))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <genexpr>
    ValueError: math domain error

Is returning 0 really the right way to handle this? Is there an elegant solution that takes the 0s in the list into account but does not end up returning 0?

This is a kind of geometric mean (it is based on the product of the list), and it is not very useful if we return 0 only because the list contains a single 0.

Ported from Math Stack Exchange: https://math.stackexchange.com/questions/1727497/resolving-zeros-in-product-of-items-in-list . It got no math answer there, so maybe the Python / code Jedis have better ideas for solving this problem.

+5
4 answers

TL;DR: Yes, returning 0 is the only right way. (But see the Conclusion.)

Math background

In real analysis (i.e., not for complex numbers), the domain of the logarithm is traditionally taken to be the positive real numbers. We have the identity:

 x = exp(log(x)), for x>0. 

It can be naturally extended to x=0 , since the limit of the expression on the right-hand side is well defined as x->0+ and equals 0. Moreover, it is legitimate to set log(0)=-inf and exp(-inf)=0 (again: only for real, not complex, numbers). Formally, we extend the set of real numbers by adding two elements, -inf and +inf , and defining a consistent arithmetic on them. (For our purposes, we need inf + x = inf for real x, x * inf = inf for positive real x, inf + inf = inf , etc.)

The other identity, x = log(exp(x)) , is less problematic and holds for all real numbers (and even for x=-inf or +inf ).
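As a quick sanity check, Python floats already behave like this extended arithmetic in most respects. The snippet below is only a sketch using the standard `math` module; the name `inf` is a local shorthand I introduce here, not something from the question.

```python
import math

inf = math.inf  # on Python older than 3.5, use float('inf') instead

# The rules the extension relies on hold for floats:
print(inf + 1.5 == inf)       # True
print(2.0 * inf == inf)       # True (x must be positive)
print(inf + inf == inf)       # True
print(math.exp(-inf))         # 0.0
print(math.log(inf) == inf)   # True

# Combinations that the extended arithmetic leaves undefined give nan:
print(math.isnan(inf - inf))  # True
print(math.isnan(0.0 * inf))  # True

# math.log does not follow the log(0) = -inf convention; it raises instead:
try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raised:", err)
```

So the only piece that has to be handled by hand with the `math` module is log(0) itself.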

Geometric mean

The geometric mean can be defined for non-negative numbers (possibly equal to zero). For two numbers a , b (it generalizes naturally to more numbers, so I will use only two from here on), it is

 gm(a,b) = sqrt(a*b), for a,b >= 0. 

Of course, gm(0,b)=0 . Taking the logarithm, we get:

 log(gm(a,b)) = (log(a) + log(b))/2 

and this is well defined even if a or b is zero. (We can plug in log(0) = -inf , and the identity still holds thanks to the extended arithmetic we defined earlier.)
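The log identity above can be turned into code fairly directly. This is a minimal sketch, not a definitive implementation: `safe_log` and `geometric_mean` are hypothetical helper names, and the choice to raise on negative inputs is my assumption.

```python
import math

def safe_log(x):
    """log(x) extended with log(0) = -inf (hypothetical helper)."""
    if x < 0:
        raise ValueError("geometric mean needs non-negative numbers")
    return math.log(x) if x > 0 else -math.inf

def geometric_mean(values):
    """Geometric mean computed in the log domain (hypothetical helper)."""
    if not values:
        raise ValueError("need at least one value")
    # Any -inf term makes the whole sum -inf, and -inf / n is still -inf.
    mean_log = sum(safe_log(v) for v in values) / len(values)
    return math.exp(mean_log)  # math.exp(-inf) is 0.0

print(geometric_mean([0.4, 0.3, 0.2, 0.1]))  # about 0.2213
print(geometric_mean([0.4, 0.3, 0, 0]))      # 0.0
```

With this, the log-based version agrees with the product-based one for lists containing zeros instead of raising a math domain error.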

Interpretation

It is not surprising that the concept of geometric mean comes from geometry and was originally (in ancient Greece) used for strictly positive numbers.

Suppose we have a rectangle with sides of lengths a and b . Find a square whose area equals the area of the rectangle. It is easy to see that the side of the square is the geometric mean of a and b .

Now, if we take a = 0 , then we do not really have a rectangle, and this geometric interpretation breaks down. Similar problems may arise with other interpretations. We can mitigate this by considering, for example, degenerate rectangles and squares, but this may not always be a plausible approach.

Conclusion

It is up to the user (mathematician, engineer, programmer) how she interprets a geometric mean that equals zero. If a zero causes serious problems with the interpretation of the results or breaks a computer program, then perhaps the geometric mean was not a justified choice of mathematical model in the first place.


Python

As mentioned in the other answers, Python has infinity. Evaluating np.exp(np.log(0)) with NumPy raises a runtime warning (divide by zero), but the result of the operation is correct.

+6

Whether 0 is the correct result depends on what you are trying to accomplish. ptrj did a great job with their answer, so I will add only one thing to consider.

You might want to use an epsilon-adjusted geometric mean. While the standard geometric mean is (a_1*a_2*...*a_n)^(1/n) , the epsilon-adjusted geometric mean is ( (a_1+e)*(a_2+e)*...*(a_n+e) )^(1/n) - e . The appropriate value for epsilon ( e ) again depends on your task.

Epsilon-adjusted geometric means are sometimes used in data mining, where a 0 in the set should not make a record's score vanish completely, although it should still penalize that score. See, for example, Methods for aggregating scores in search experiments .

For example, with your data and epsilon set to 0.01:

    >>> from functools import reduce  # needed on Python 3
    >>> from operator import mul
    >>> pn = [0.4, 0.3, 0, 0]
    >>> e = 0.01
    >>> pow(reduce(mul, [x+e for x in pn], 1), 1./len(pn)) - e
    0.04970853116594962
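The same calculation can be wrapped in a small function. This is only a sketch: the name `epsilon_geometric_mean` and the default `eps=0.01` are assumptions for illustration. It works in the log domain, which is safe here because every shifted value is strictly positive.

```python
import math

def epsilon_geometric_mean(values, eps=0.01):
    """Epsilon-adjusted geometric mean (hypothetical helper).

    Shifts every value by eps, takes the ordinary geometric mean,
    then shifts the result back by eps.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not values or any(v < 0 for v in values):
        raise ValueError("need a non-empty list of non-negative numbers")
    mean_log = sum(math.log(v + eps) for v in values) / len(values)
    return math.exp(mean_log) - eps

pn = [0.4, 0.3, 0, 0]
print(epsilon_geometric_mean(pn))  # about 0.0497, same as the session above

# A smaller eps penalizes the zeros more heavily:
print(epsilon_geometric_mean(pn, eps=0.0001))
```

Note that the result depends strongly on the chosen epsilon, so it has to be picked to suit the task.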
+2

You should return -math.inf in Python 3.5 and later, or -float('inf') in older versions. This is because the logarithm of a number very close to 0 tends to negative infinity. It is a float value, and it preserves the correct inequality between the log sums of different lists. For example, one would expect

 sumlog([5, 4, 1, 0, 2]) < sumlog([5, 1, 4, 0.0001, 1]) 

This inequality holds if you return negative infinity.
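A minimal sketch of such a function, assuming the name `sumlog` from the inequality above and non-negative inputs:

```python
import math

def sumlog(values):
    """Sum of logs, returning -inf as soon as a zero is seen."""
    total = 0.0
    for v in values:
        if v == 0:
            return -math.inf  # use -float('inf') before Python 3.5
        total += math.log(v)
    return total

print(sumlog([5, 4, 1, 0, 2]) < sumlog([5, 1, 4, 0.0001, 1]))  # True
print(sumlog([5, 4, 1, 0, 2]) == sumlog([9, 9, 9, 0, 9]))      # True
```

The second comparison shows the limitation: any two lists that both contain a zero compare equal, so the ordering among them is lost.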

0

You can try using list comprehensions in Python. They can be very powerful for customizing how you process your data. This example uses a list comprehension with a sentinel error value of -999 .

    >>> [math.log(i) if i > 0 else -999 for i in pn]
    [-0.916290731874155, -1.2039728043259361, -999, -999]

If you use only an if , with no else , the if goes after the for i in pn .
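For illustration, here are the two forms side by side. This is a sketch only; the variable names `with_sentinel` and `filtered` are mine.

```python
import math

pn = [0.4, 0.3, 0, 0]

# Conditional expression: if/else goes before the "for" and keeps the length.
with_sentinel = [math.log(i) if i > 0 else -999 for i in pn]

# Filter: a bare "if" goes after the "for" and drops the zeros entirely.
filtered = [math.log(i) for i in pn if i > 0]

print(with_sentinel)  # [-0.916290731874155, -1.2039728043259361, -999, -999]
print(filtered)       # [-0.916290731874155, -1.2039728043259361]
```

Keep in mind that filtering changes the length of the list, so any mean computed from `filtered` covers only the non-zero items.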

0

Source: https://habr.com/ru/post/1246441/

