How to handle log(0) in R when using image.plot?

I have a matrix whose entries are all probabilities. Most entries are very small, and some are exactly zero. I need to take the log of the matrix, but because of the zeros R produces -Inf for those entries. My goal is to pass the logged matrix to image.plot(). When I do, I keep getting this error:

Error in seq.default(minz + binwidth/2, maxz - binwidth/2, by = binwidth) : invalid (to - from)/by in seq(.) 

Is there any solution that can help me get around this?

Here's what the matrix looks like:

                      0          1          2          3         4         5         6
     [1,] -0.0007854138 -8.9132811 -10.011893 -10.705041 -9.606428 -9.318746      -Inf
     [2,] -0.3402118357 -1.6137090  -2.742625  -4.215836 -5.721434 -7.121522 -9.606428
     [3,] -0.2912175507 -2.0296478  -3.521929  -4.275321 -4.426519 -4.187369 -3.715705
     [4,] -1.5244380532 -0.7048802  -2.001368  -3.405243 -3.713864 -3.143919 -3.781412
     [5,] -0.7572491288 -0.7487709  -3.981208  -5.110329 -5.228577 -5.095569 -5.293395
     [6,] -0.0007629648       -Inf  -8.759130  -7.613998 -9.606428      -Inf      -Inf
     [7,] -0.0020658381 -7.4861648  -7.526987  -7.094123 -9.318746      -Inf      -Inf
     [8,] -0.0295715883 -6.7160566  -7.208533  -6.610696 -6.485533 -6.813220 -6.387552
     [9,] -0.0032128722 -6.7160566  -7.613998  -7.871827 -7.760602 -8.759130 -8.759130
    [10,] -0.4869248130 -1.3225132  -2.518576  -3.768698 -5.140520 -6.183252 -7.208533
                   7          8          9
     [1,]       -Inf -10.705041 -10.011893
     [2,]       -Inf       -Inf  -7.149693
     [3,]  -4.965248  -5.968842  -6.428374
     [4,]  -4.696227  -5.091913  -4.669559
     [5,]  -5.163777  -5.468599  -6.577906
     [6,]       -Inf       -Inf       -Inf
     [7,]       -Inf       -Inf       -Inf
     [8,]  -6.627503  -6.456545  -6.400976
     [9,] -10.011893 -10.011893       -Inf
    [10,]  -8.402456  -7.814669  -6.546158

Here is the structure:

 structure(c(0.999214894571557, 0.71161956034096, 0.747353073126963, 0.217743382682817, 0.468954688200987, 0.999237326155227, 0.997936294302378, 0.970861372812921, 0.996792283535218, 0.614513234634365, 0.000134589502018843, 0.199147599820547, 0.13138178555406, 0.49416778824585, 0.472947510094213, 0, 0.000560789591745177, 0.00121130551816958, 0.00121130551816958, 0.266464782413638, 4.48631673396142e-05, 0.0644010767160162, 0.0295423956931359, 0.135150291610588, 0.0186630776132795, 0.00015702108568865, 0.00053835800807537, 0.000740242261103634, 0.000493494840735756, 0.0805742485419471, 2.24315836698071e-05, 0.0147599820547331, 0.0139075818752804, 0.0331987438313145, 0.00603409600717811, 0.000493494840735756, 0.000829968595782862, 0.00134589502018843, 0.000381336922386721, 0.0230820995962315, 6.72947510094213e-05, 0.00327501121579183, 0.0119560340960072, 0.0243831314490803, 0.00536114849708389, 6.72947510094213e-05, 8.97263346792284e-05, 0.00152534768954688, 0.000426200089726335, 0.00585464333781965, 8.97263346792284e-05, 0.000807537012113055, 0.0151861821444594, 0.0431135038133692, 0.00612382234185734, 0, 0, 0.00109914759982055, 0.00015702108568865, 0.00206370569762225, 0, 6.72947510094213e-05, 0.0243382682817407, 0.022790489008524, 0.00502467474203679, 0, 0, 0.00168236877523553, 0.00015702108568865, 0.000740242261103634, 0, 0, 0.00697622252131, 0.00912965455361149, 0.00572005383580081, 0, 0, 0.00132346343651862, 4.48631673396142e-05, 0.000224315836698071, 2.24315836698071e-05, 0, 0.00255720053835801, 0.00614625392552714, 0.00421713772992373, 0, 0, 0.0015702108568865, 4.48631673396142e-05, 0.000403768506056528, 4.48631673396142e-05, 0.000785105428443248, 0.00161507402422611, 0.00937640197397936, 0.00139075818752804, 0, 0, 0.00165993719156572, 0, 0.00143562135486765), .Dim = c(10L, 10L), .Dimnames = list(NULL, c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9"))) 
+6
5 answers

True, a log scale can make "the difference between the elements more noticeable." However, if your data contain zeros, a log scale is the wrong tool. The point of a logarithmic scale is to display data that grow exponentially. Zeros mean that either:

  • the observed values were not generated by an exponential-growth process, or
  • you need to handle your missing values differently.

Either way, a much better option in your case is to take the square root of the values, or the n-th root (n > 2) if you want to emphasize the differences between values even more: the higher the value of n, the greater the difference.

Following @flodel's suggestion below, the code to do this is image.plot(sqrt(x)) or, more generally, image.plot(x^(1/n)) for some n > 1.
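As a rough sketch of the root transform described above: the matrix below is made-up illustrative data (not the asker's matrix), and root_transform is a hypothetical helper name. The plotting step assumes image.plot comes from the fields package and falls back to base image() when that package is not installed.

```r
# Illustrative probabilities (not the asker's matrix), including exact zeros.
x <- matrix(c(0,    1e-4, 1e-2, 0.25,
              1e-3, 0,    0.5,  1), nrow = 2, byrow = TRUE)

# Hypothetical helper: n-th root transform; n = 2 is the square root.
root_transform <- function(p, n = 2) p^(1 / n)

sqrt_x  <- root_transform(x, 2)
root4_x <- root_transform(x, 4)

# Unlike log(x), zeros stay at zero and every value remains finite.
all(is.finite(sqrt_x))
all(root4_x[x == 0] == 0)

# Plot with fields::image.plot if available, otherwise with base image().
if (requireNamespace("fields", quietly = TRUE)) {
  fields::image.plot(root4_x)
} else {
  image(root4_x)
}
```

Because the transformed matrix contains no -Inf, the color-scale breaks can be computed without triggering the seq() error from the question.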

Hope this helps.

+2

If these zeros come from a physical measurement that should give strictly positive results but cannot do so for technical reasons, it may be sensible to replace the zeros with half of the lower detection limit.

    M2 <- M
    print(min(M[M != 0]), digits = 16)
    #[1] 2.24315836698071e-05
    M2[M2 == 0] <- 0.5 * min(M[M != 0])
    image(M2)
    image(log(M2))


+4

A simple trick is to add 1 before taking the log: because log(1) = 0, a cell containing 0 is still 0 after the log transformation.

    > k <- matrix(c(1:8, 0, 0), nrow = 2, ncol = 5)
    > k
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    3    5    7    0
    [2,]    2    4    6    8    0
    > log(k)
              [,1]     [,2]     [,3]     [,4] [,5]
    [1,] 0.0000000 1.098612 1.609438 1.945910 -Inf
    [2,] 0.6931472 1.386294 1.791759 2.079442 -Inf
    > log(k + 1)
              [,1]     [,2]     [,3]     [,4] [,5]
    [1,] 0.6931472 1.386294 1.791759 2.079442    0
    [2,] 1.0986123 1.609438 1.945910 2.197225    0
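Base R also provides log1p(), which computes log(1 + x) directly. A small sketch reusing the same k as above; the variable name shifted is only illustrative.

```r
k <- matrix(c(1:8, 0, 0), nrow = 2, ncol = 5)

# log1p(x) computes log(1 + x); zeros map to 0 instead of -Inf.
shifted <- log1p(k)

# Agrees with the explicit log(k + 1) up to floating-point tolerance.
all.equal(shifted, log(k + 1))

# The last column held the zeros; after the shift both entries are 0.
shifted[, 5]
```

Note that for values much smaller than 1, log1p(p) is approximately p itself, so with small probabilities this shift removes most of the spreading that the log was meant to provide.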
+2

The error is raised by seq(), which cannot accept -Inf (or Inf) as any of its arguments. You can get exactly the same type of error with the following code:

    > seq(-log(0), 0, 50)
    Error in seq.default(-log(0), 0, 50) : invalid (to - from)/by in seq(.)
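To check whether a logged matrix would hit this problem before plotting it, one can inspect its range and count the non-finite cells. This is only a diagnostic sketch on made-up data (not the asker's matrix); the names lp, n_bad, bad_cells and finite_range are illustrative.

```r
# Illustrative log-probabilities; log(0) gives -Inf.
lp <- log(matrix(c(0.9, 0, 0.05, 0.05), nrow = 2))

# The lower bound of the range is -Inf, which seq() cannot step from.
range(lp)

# Count and locate the non-finite cells.
n_bad     <- sum(!is.finite(lp))
bad_cells <- which(!is.finite(lp), arr.ind = TRUE)

# Range over the finite cells only, for comparison.
finite_range <- range(lp[is.finite(lp)])
```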

To avoid this, use @Metrics' trick. However, since your matrix is a probability matrix, I suggest adding a very small value, for example 1e-22, instead of 1.0.
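A minimal sketch comparing the two offsets on made-up probabilities (not the asker's matrix); the names p, eps, log_eps and log_one are illustrative.

```r
# Illustrative probabilities, including one exact zero.
p <- matrix(c(0, 2e-5, 1e-3, 0.7), nrow = 2)

eps <- 1e-22               # the small offset suggested above
log_eps <- log(p + eps)    # zeros become log(1e-22), about -50.7
log_one <- log(p + 1)      # zeros become 0, but small values are squashed

# Both results are finite, so seq() no longer receives -Inf.
range(log_eps)
range(log_one)
```

With an offset this small, the zero cells land far below every other cell on the color scale (the smallest nonzero entry in the question's data has a log of about -10.7), so an offset closer to the smallest nonzero probability may give a more readable plot. Which offset is appropriate depends on the data.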

+2

Multi-line code cannot be posted in a comment, so here is an example showing what I meant:

    > m <- cbind(c(0, 0.88, 0.99), c(1, 2, 1), c(3, 4, 5))
    > m <- as.matrix(m)
    > log(m)
                [,1]      [,2]     [,3]
    [1,]        -Inf 0.0000000 1.098612
    [2,] -0.12783337 0.6931472 1.386294
    [3,] -0.01005034 0.0000000 1.609438
    > m
         [,1] [,2] [,3]
    [1,] 0.00    1    3
    [2,] 0.88    2    4
    [3,] 0.99    1    5
0

Source: https://habr.com/ru/post/952166/

