How to calculate the logarithm of 1 minus the exponential of a given small number in Python

I am doing a probability calculation. I have many very small probabilities whose sum I want to subtract from 1, and I want to do it accurately. I can accurately calculate the logarithms of these small numbers. My strategy so far has been this (using numpy):

Given the array x of these small (log) numbers, calculate:

 y = numpy.logaddexp.reduce(x) 

Now I want to compute something like 1 - exp(y), or better yet log(1 - exp(y)), but I'm not sure how to do this without losing accuracy.

In fact, even the logaddexp step runs into precision problems. The values in the vector x can range from -2 down to -800 or even more negative. The running log-sum y ends up at values around 1e-16, which is on the order of the data type's eps. So, for example, accurately calculated data may look like this:

 In [358]: x Out[358]: [-5.2194676211172837, -3.9050377656308362, -3.1619783292449615, -2.71289594096134, -2.4488395891021639, -2.3129210706827568, -2.2709987626652346, -2.3007776073511259, -2.3868404149802434, -2.5180718876609163, -2.68619816583087, -2.8849022632856958, -3.1092603032627686, -3.3553673369747834, -3.6200806272462351, -3.9008385919463073, -4.1955300857178379, -4.5023981074719899, -4.8199676154248081, -5.1469905756384904, -5.4824035553480428, -5.8252945959126876, -6.174877049340779, -6.5304687083067563, -6.8914750074202473, -7.25737538919104, -7.6277121540338797, -8.0020812775389558, -8.3801247986220773, -8.7615244716292437, -9.1459964426584435, -9.5332867613176404, -9.9231675781398394, -10.315433907978701, -10.709900863130784, -11.106401278287066, -11.50478366390567, -11.904910436107656, -12.30665638039909, -12.709907313918777, -13.114558916892051, -13.52051570882999, -13.927690148982549, -14.336001843810081, -14.745376846921289, -15.155747039147968, -15.567049578271309, -15.979226409456359, -16.39222382873956, -16.805992092998878, -17.22048507074976, -17.63565992888303, -18.051476851117201, -18.467898784496384, -18.884891210740903, -19.302421939667397, -19.720460922243518, -20.138980081145718, -20.557953156947775, -20.977355568292495, -21.397164284594595, -21.817357709992422, -22.237915577412224, -22.658818851739369, -23.080049641202237, -23.501591116172762, -23.923427434676114, -24.345543673975158, -24.767925767665417, -25.190560447772668, -25.61343519140047, -26.036538171518259, -26.459858211524278, -26.883384743252066, -27.307107768123842, -27.731017821180984, -28.155105937748402, -28.579363622513654, -29.003782820820732, -29.428355891997484, -29.853075584553352, -30.27793501309668, -30.702927636836705, -31.128047239545907, -31.553287910869187, -31.978644028878307, -32.404110243774596, -32.82968146265631, -33.255352835270173, -33.681119740674262, -34.106977774747804, -34.532922738484046, -34.958950627012712, -35.385057619298891, 
-35.811240068471022, -36.237494492735493, -36.663817566835519, -37.090206114019054, -37.516657098479527, -37.943167618239784, -38.369734898447348, -38.796356285056333, -39.223029238868548, -39.64975132991276, -40.076520232137909, -40.5033337184027, -40.930189655741344, -41.357086000888444, -41.784020796047173, -42.210992164885965, -42.637998308748706, -43.065037503066776, -43.492108093959985, -43.919208495015312, -44.346337184233221, -44.773492701130749, -45.200673643993753, -45.627878667267964, -46.055106479082156, -46.482355838895614, -46.909625555262096, -47.336914483704675, -47.764221524695017, -48.191545621730768, -48.618885759506213, -49.04624096217151, -49.473610291673936, -49.900992846179292, -50.328387758566748, -50.755794194994508, -51.183211353532613, -51.610638462858901, -52.0380747810147, -52.46551959421754, -52.892972215728378, -53.320431984769073, -53.747898265489198, -54.175370445978274, -54.602847937323247, -55.030330172705362, -55.457816606538813, -55.885306713645889, -56.312799988467418, -56.740295944308855, -57.167794112617116, -57.59529404228897, -58.02279529900909, -58.450297464615232, -58.877800136490578, -59.305302926981085, -59.732805462838542, -60.160307384683506, -60.587808346493375, -61.015308015110463, -61.442806069768608, -61.87030220164138, -62.297796113406662, -62.725287518829532, -63.15277614236129, -63.580261718755196, -64.007743992695964, -64.435222718445743, -64.862697659501919, -65.290168588270035, -65.717635285748088, -66.14509754122389, -66.572555151982783, -67.000007923029216, -67.427455666815376, -67.854898202982099, -68.282335358110231, -68.709766965479957, -69.137192864839108, -69.564612902180784, -69.992026929530198, -70.419434804735829, -70.8468363912732, -71.274231558051156, -71.701620179229167, -72.129002134037705, -72.556377306608397, -72.983745585807242, -73.411106865077045, -73.838461042282461, -74.265808019561746, -74.693147703185559, -75.120480003416901, -75.547804834380145, -75.97512211393132, 
-76.402431763534764, -76.829733708143749, -77.257027876085431, -77.684314198948414, -78.111592611476681, -78.538863051464546, -78.966125459656723, -79.393379779652037, -79.820625957809625, -80.24786394315754, -80.675093687306912, -81.102315144366912] 

Then I try to accumulate the log of the sum of the exponentials:

 In [359]: np.logaddexp.accumulate(x) Out[359]: array([ -5.21946762e+00, -3.66710221e+00, -2.68983273e+00, -2.00815067e+00, -1.51126604e+00, -1.14067818e+00, -8.60829425e-01, -6.48188808e-01, -4.86276416e-01, -3.63085873e-01, -2.69624488e-01, -1.99028599e-01, -1.45996863e-01, -1.06408884e-01, -7.70565672e-02, -5.54467248e-02, -3.96506186e-02, -2.81859503e-02, -1.99225261e-02, -1.40061296e-02, -9.79701394e-03, -6.82045164e-03, -4.72733966e-03, -3.26317960e-03, -2.24396350e-03, -1.53767347e-03, -1.05026994e-03, -7.15209142e-04, -4.85690052e-04, -3.28980607e-04, -2.22305294e-04, -1.49890553e-04, -1.00858788e-04, -6.77380054e-05, -4.54139175e-05, -3.03974537e-05, -2.03154477e-05, -1.35581905e-05, -9.03659252e-06, -6.01552344e-06, -3.99984336e-06, -2.65671945e-06, -1.76283376e-06, -1.16860435e-06, -7.73997496e-07, -5.12213574e-07, -3.38706792e-07, -2.23809375e-07, -1.47785898e-07, -9.75226648e-08, -6.43149957e-08, -4.23904687e-08, -2.79246430e-08, -1.83858489e-08, -1.20995365e-08, -7.95892319e-09, -5.23300609e-09, -3.43929670e-09, -2.25953475e-09, -1.48391255e-09, -9.74194956e-10, -6.39351406e-10, -4.19466218e-10, -2.75121795e-10, -1.80397409e-10, -1.18254918e-10, -7.74993004e-11, -5.07775611e-11, -3.32619009e-11, -2.17835737e-11, -1.42634249e-11, -9.33764336e-12, -6.11190167e-12, -3.99989955e-12, -2.61737204e-12, -1.71253165e-12, -1.12043465e-12, -7.33052079e-13, -4.79645919e-13, -3.13905885e-13, -2.05519681e-13, -1.34650094e-13, -8.83173582e-14, -5.80300378e-14, -3.82338678e-14, -2.52963381e-14, -1.68421145e-14, -1.13181549e-14, -7.70918073e-15, -5.35155125e-15, -3.81152630e-15, -2.80565548e-15, -2.14872312e-15, -1.71971577e-15, -1.43957518e-15, -1.25665732e-15, -1.13722927e-15, -1.05925916e-15, -1.00835857e-15, -9.75131524e-16, -9.53442707e-16, -9.39286186e-16, -9.30046550e-16, -9.24016349e-16, -9.20080954e-16, -9.17512772e-16, -9.15836886e-16, -9.14743318e-16, -9.14029759e-16, -9.13564174e-16, -9.13260398e-16, -9.13062204e-16, -9.12932898e-16, -9.12848539e-16, 
-9.12793505e-16, -9.12757603e-16, -9.12734183e-16, -9.12718905e-16, -9.12708939e-16, -9.12702438e-16, -9.12698198e-16, -9.12695432e-16, -9.12693627e-16, -9.12692451e-16, -9.12691683e-16, -9.12691183e-16, -9.12690856e-16, -9.12690643e-16, -9.12690504e-16, -9.12690414e-16, -9.12690355e-16, -9.12690316e-16, -9.12690291e-16, -9.12690275e-16, -9.12690264e-16, -9.12690257e-16, -9.12690252e-16, -9.12690249e-16, -9.12690248e-16, -9.12690246e-16, -9.12690245e-16, -9.12690245e-16, -9.12690245e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, -9.12690244e-16]) 

This ultimately leads to:

 In [360]: np.logaddexp.reduce(x) Out[360]: -9.1269024387687033e-16 

Therefore my accuracy has already been destroyed. Any ideas on how to get around this?

+6
6 answers

In Python 2.7, we added math.expm1() for this use case:

 >>> from math import exp, expm1
 >>> exp(1e-5) - 1    # gives result accurate to 11 places
 1.0000050000069649e-05
 >>> expm1(1e-5)      # result accurate to full precision
 1.0000050000166668e-05

In addition, there is math.fsum() for doing the summation step without loss of precision:

 >>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
 0.9999999999999999
 >>> from math import fsum
 >>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
 1.0

Finally, if all else fails, you can use the decimal module, which supports arbitrarily high precision arithmetic:

 >>> from decimal import *
 >>> getcontext().prec = 200
 >>> (1 - 1 / Decimal(7000000)).ln()
 Decimal('-1.4285715306122546161332083855139723669559469615692284955124609122046580004888309867906750714454869716398919778588515625689415322789136206397998627088895481989036005482451668027002380442299229191323673E-7')
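Putting these pieces together for the final step in the question, log(1 - exp(y)), a minimal sketch (the value of y is taken from the question's output above):

```python
from math import expm1, log

y = -9.1269024387687033e-16  # the log-sum-exp result from the question

# expm1(y) computes exp(y) - 1 accurately even for tiny y,
# so 1 - exp(y) is just its negation, with no cancellation:
result = log(-expm1(y))
print(result)  # ≈ -34.63, i.e. the log of a probability near 9.13e-16
```

The naive form log(1 - exp(y)) would instead see 1 - exp(y) round to a multiple of machine eps near 1 and lose most of its significant digits.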
+7

I suggest replacing exp() and log() with their Taylor series in neighborhoods of 0 and 1, respectively. That way you won't lose accuracy to the large number (by which I just mean 1 :^). Use the Lagrange remainder formula, or simply count terms with some margin, to determine where to truncate so that the truncation error stays below your precision.

Update:

Python 2.7's math.expm1 (computing exp(x) - 1) and math.log1p (computing log(1+x)) do this for you, if the precision of the C library (usually double) is enough. If not, you will have to resort to specialized math software (the x86 FPU can calculate with extended precision).
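A small illustration of the difference (a sketch with a made-up value; the exact digits depend on the platform's libm):

```python
from math import exp, expm1, log

y = -1e-12  # a log-probability extremely close to 0

naive = log(1 - exp(y))    # 1 - exp(y) suffers catastrophic cancellation
better = log(-expm1(y))    # expm1 preserves the tiny difference from 1

print(naive, better)  # both near -27.631, but naive loses several digits
```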

+3

I don't know much about Python, as I do most of my work in Java, but it seems to me that you would do better to apply the log-sum-exp trick over all the values at once, rather than accumulating them pairwise with numpy.logaddexp.accumulate.

A quick google search reveals a candidate among python libraries: scipy.misc.logsumexp.

In any case, it is not difficult to program yourself:

 logsumexp(probs) := max(probs) + log(sum[i](exp(probs[i] - max(probs))))

so something like:

 maxValue = -Inf
 for x in probs:
     if x > maxValue:
         maxValue = x
 expSum = 0
 for x in probs:
     expSum += exp(x - maxValue)
 return log(expSum)
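In Python with numpy, that pseudocode might be transcribed like this (a sketch, not from the original answer):

```python
import numpy as np

def logsumexp(probs):
    """log(sum(exp(probs))), stabilized by shifting everything by the max."""
    probs = np.asarray(probs, dtype=float)
    max_value = probs.max()
    # exp() of the shifted values cannot overflow: the largest argument is 0
    return max_value + np.log(np.exp(probs - max_value).sum())

# naive np.log(np.exp(-1000) + np.exp(-1000)) underflows to log(0) = -inf,
# but the shifted version recovers the answer:
print(logsumexp([-1000.0, -1000.0]))  # ≈ -1000 + log(2) ≈ -999.3069
```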

The single return value, say p, is just the logarithm of the sum of all the probabilities in probs. Note that if there is a large difference in scale between the largest and smallest input log-probabilities, the small ones will effectively be ignored, but only when their contribution is negligible compared to the large ones, which in most applications is accurate enough.

You can use more complicated strategies if there are so many small values that they must not be lost, for example probs = 0.5 + 1E-7 + 1E-7 + ... a million times over, so that the small terms together add up to 0.1. What you can do is either split the sum into several partial sums, where values of roughly the same scale are added together before merging, or use some intermediate pivot value instead of the max. But then you have to make sure the largest value is not too far above the pivot, because exp(probs[i] - pivot) would overflow in that case.

Once this is done, you still need to do the calculation log(1 - exp(p)).

To do this, I found this paper, which describes a way to avoid the worst possible loss of accuracy using functions that can be found in most common math libraries:

Mächler, M. (2012). Accurately Computing log(1 − exp(−|a|)), Assessed by the Rmpfr package.

The key is to use one of two possible formulas depending on the input value a.

Definitions:

 log1mexp(a) := log(1 - exp(a))   ### the function we seek to implement
 log1p(a)    := log(1 + a)        # provided by common math libraries
 expm1(a)    := exp(a) - 1        # provided by common math libraries

There is an obvious way to implement log1mexp using log1p:

 log1mexp(a) := log1p(-exp(a)) 

With expm1 you can do:

 log1mexp(a) := log(-expm1(a)) 

Then you should use the log1p approach when a < log(.5) and the expm1 approach when a >= log(.5):

 log1mexp(a) := (a < log(.5)) ? log1p(-exp(a)) : log(-expm1(a))
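A direct Python transcription of that rule might look like this (a sketch; the function name log1mexp follows the paper's notation):

```python
from math import exp, expm1, log, log1p

def log1mexp(a):
    """log(1 - exp(a)) for a < 0, using Maechler's two-branch rule."""
    if a >= 0:
        raise ValueError("log1mexp requires a < 0")
    if a < log(0.5):
        return log1p(-exp(a))  # exp(a) is well below 1: log1p is safe
    return log(-expm1(a))      # exp(a) is close to 1: expm1 keeps precision

print(log1mexp(-1e-15))  # ≈ log(1e-15) ≈ -34.54
```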

See the external link for more information.

+3

I'm not sure if this is what you want:

 numpy.expm1(x[, out])
     Calculate exp(x) - 1 for all elements in the array.

 >>> import numpy as np
 >>> np.expm1(x).sum()
 -200.0
 >>> (-np.expm1(x)).sum()
 200.0
 >>> from scipy import special
 >>> (-special.expm1(x)).sum()
 200.0
 >>> np.log((-special.expm1(x)).sum())
 5.2983173665480363

Edit:

Sorry, I didn't realize that this is just a numpy version of Raymond Hettinger's answer.

(Not an answer to the numerical problem.)

I'm still not sure what the question is, but instead of throwing Decimal or mpmath at it, perhaps reformulating the problem will help. For example, if you are summing Poisson probabilities, the cumulative sum will always end up close to 1. For some questions, the problem can be avoided by working with the survival function rather than the cdf.

+2

I have used mpmath for similar problems. It is a very good and well-documented, 100% Python library. If speed is a problem for you, consider using mpmath together with gmpy; see the mpmath documentation for details.

+1

Just like Raymond Hettinger said, but of course you will need to multiply by -1, because you want 1 - exp instead of exp - 1.
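Concretely (a sketch; y stands for the log-sum from the question):

```python
from math import expm1

y = -9.1269024387687033e-16  # the log-sum-exp value from the question

# -(exp(y) - 1) == 1 - exp(y), computed without cancellation near 1:
one_minus_exp = -expm1(y)
print(one_minus_exp)  # ≈ 9.1269e-16
```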

0

Source: https://habr.com/ru/post/902606/

