What is integer overflow in R and how can this happen?

Question


I have some calculations that produce the following warning (i.e. not an error):

    Warning messages:
    1: In sum(myvar, na.rm = T) :
      Integer overflow - use sum(as.numeric(.))

In this thread, people argue that integer overflows simply don't happen. Either R is not very modern, or they are wrong. So what should I do here? If I use as.numeric as the warning suggests, I might miss the fact that information was already lost earlier. myvar is read from a .csv file, so shouldn't R figure out that a larger field is needed? Has it already cut something off?

What is the maximum length of an integer or numeric? Would you suggest any other field type/mode?

EDIT: I am running:

R version 2.13.2 (2011-09-30), platform x86_64-apple-darwin9.8.0/x86_64 (64-bit), in RStudio

+24

integer r numeric overflow

Matt Bannert, Jan 10


3 answers

In short, integer is an exact type with a limited range, and numeric is a floating-point type that can represent a much wider range of values but is inexact. See the help pages (?integer and ?numeric) for more information.
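A small sketch of that trade-off in base R: the integer limit is tiny compared with the largest finite double, but doubles are approximations, so exact equality tests on them can fail. The variable names are only illustrative.

```r
# integer: exact whole numbers, bounded by .Machine$integer.max (2147483647)
int_max <- .Machine$integer.max

# numeric (double): a far wider range, up to .Machine$double.xmax (about 1.8e308)
dbl_max <- .Machine$double.xmax
print(dbl_max > int_max)                   # TRUE

# ...but doubles are inexact, so exact comparisons can fail
print(0.1 + 0.2 == 0.3)                    # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))   # TRUE: compare with a tolerance instead
```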

As for the overflow, here is an explanation by Brian D. Ripley:

It means that you are taking the mean [in your case, the sum (@aix)] of some very large integers, and the calculation is overflowing. It is just a warning.
This will not happen in the next release of R.

You can specify that a number is an integer by giving it the suffix L; for example, 1L is an integer, as opposed to 1, which is a floating-point number with class "numeric".
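A quick way to see the difference the L suffix makes: the two literals compare equal as values, but they are stored as different types.

```r
a <- 1L   # integer literal (suffix L)
b <- 1    # double literal, class "numeric"

print(class(a))         # "integer"
print(class(b))         # "numeric"
print(typeof(b))        # "double"
print(a == b)           # TRUE: the values compare equal
print(identical(a, b))  # FALSE: the storage types differ
```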

The largest integer that you can create on your computer is given by .Machine$integer.max.

    > .Machine$integer.max
    [1] 2147483647
    > class(.Machine$integer.max)
    [1] "integer"

Adding a positive integer to this overflows, returning NA.

    > .Machine$integer.max + 1L
    [1] NA
    Warning message:
    In .Machine$integer.max + 1L : NAs produced by integer overflow
    > class(.Machine$integer.max + 1L)
    [1] "integer"

You can get around this limitation by adding floating point values instead.

    > .Machine$integer.max + 1
    [1] 2147483648
    > class(.Machine$integer.max + 1)
    [1] "numeric"
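Because the overflow only raises a warning, it is easy to miss in a longer script. A minimal sketch of making it loud, assuming you want integer arithmetic to fail instead of silently producing NA; add_checked is a hypothetical helper, not part of base R. Setting options(warn = 2) would achieve something similar globally, by turning every warning into an error.

```r
# Hypothetical helper: integer addition that turns any warning raised by a + b
# (such as "NAs produced by integer overflow") into an error
add_checked <- function(a, b) {
  withCallingHandlers(
    a + b,
    warning = function(w) {
      stop("integer overflow: ", conditionMessage(w), call. = FALSE)
    }
  )
}

ok <- add_checked(1L, 2L)
print(ok)  # 3

# Catch the error here so the script keeps running
failed <- tryCatch(
  add_checked(.Machine$integer.max, 1L),
  error = function(e) conditionMessage(e)
)
print(failed)
```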

Since in your case the warning is issued by sum, the overflow happens when the numbers are added together. The suggested workaround, sum(as.numeric(.)), should do the trick.
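A sketch of the workaround applied to the call from the question. The contents of myvar here are made up: two integers whose total exceeds the integer range, plus an NA so that na.rm = TRUE has something to do.

```r
# Hypothetical stand-in for the asker's data
myvar <- c(.Machine$integer.max, 1L, NA)

# Summing as integers can overflow: R warns and returns NA
# (exact behaviour may vary between R versions)
int_total <- suppressWarnings(sum(myvar, na.rm = TRUE))

# Converting to double first avoids the overflow
dbl_total <- sum(as.numeric(myvar), na.rm = TRUE)
print(dbl_total)  # 2147483648
```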

+16

NPE, Jan 10


What is the maximum length of an integer or numeric?

Vectors are currently indexed with an integer, so the maximum length is given by .Machine$integer.max. As DWin noted, all versions of R currently use 32-bit integers, so this is 2^31 - 1, or just over 2 billion.
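A quick check of that arithmetic in base R, plus a confirmation that length() returns an integer for an ordinary vector:

```r
n_max <- .Machine$integer.max
print(n_max == 2^31 - 1)               # TRUE
print(format(n_max, big.mark = ","))   # "2,147,483,647"

x <- 1:10
print(typeof(length(x)))               # "integer"
```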

Unless you are packing some serious hardware (or you are reading this in the future; hello from 2012), you won't have enough memory to allocate vectors that long.

I remember a discussion in which R-core (Brian Ripley, I think) suggested that a next step could be to index vectors with the mantissa of a double, or something clever like that, effectively giving a 48-bit index. Unfortunately, I can't find that discussion.

Besides the Rmpfr package, if you are running into integer overflow you could try the int64 package.

+3

Richie Cotton, Jan 10


+42 · Accepted answer · 2012-01-10 14:37

You can answer many of your questions by reading the ?integer help page. It says:

R uses 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9.
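To see the exact bounds behind "about +/-2*10^9": the range is symmetric, from -2147483647 to 2147483647, because the bit pattern for -2^31 is reserved for NA_integer_. The sketch below also shows that converting an out-of-range double to integer yields NA (with a warning) rather than a truncated number, which is relevant when data are read from files.

```r
int_max <- .Machine$integer.max
int_min <- -int_max                  # -2147483647, still an integer
print(int_min)

# One below int_min would need -2^31, which R uses for NA_integer_
below <- suppressWarnings(int_min - 1L)
print(is.na(below))                  # TRUE

# Out-of-range doubles become NA when coerced to integer
too_big <- suppressWarnings(as.integer(2^31))
print(is.na(too_big))                # TRUE
```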

Expanding to larger integers is under consideration by R Core, but it is not going to happen in the near future.

If you want "bignum" capacity, install Martin Maechler's Rmpfr package [PDF]. I recommend Rmpfr because of its author's reputation: Martin Maechler is also heavily involved in developing the Matrix package, and is a member of R Core. There are alternatives, including arithmetic packages such as gmp, Brobdingnag and Ryacas (the latter also offers a symbolic math interface).

Then, regarding the criticisms in the thread you linked and how to weigh their relevance to your work, consider this: if the same statistical functionality that exists in R were available in one of those "more modern" languages, you would probably see users migrating in that direction. But I would say that the migration, and certainly the growth, is now toward R. R was designed by statisticians for statistics.

There was at one time a Lisp variant with a statistics package, Xlisp-Stat, but its main developer and proponent is now a member of R-Core. On the other hand, one of the earliest R developers, Ross Ihaka, suggests working toward development in a Lisp-like language [PDF]. There is a compiled language called Clojure (pronounced as English speakers would say "closure") with an experimental interface, Rincanter.

Update:

Newer versions of R (3.0.0 and later) do have 53-bit integers of a sort (using the mantissa of a double). When an element of an "integer" vector is assigned a value greater than .Machine$integer.max, the whole vector is coerced to "numeric", a.k.a. "double". The maximum value for integers remains as it was; however, integer vectors may be coerced to double to preserve accuracy in cases that would formerly have overflowed. Unfortunately, matrix and array dimensions are still limited to integer.max, and although R 3.0.0 added "long vectors" on 64-bit platforms, not all functions support them.
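A short illustration of that coercion: assigning a value beyond the integer range into an integer vector turns the whole vector into a double. The last line shows where the "53-bit" limit bites, since beyond 2^53 consecutive whole numbers can no longer be told apart as doubles.

```r
x <- 1:3                            # integer vector
print(class(x))                     # "integer"

x[2] <- .Machine$integer.max + 1    # the right-hand side is a double (2147483648)
print(class(x))                     # "numeric": the whole vector was coerced
print(x[2])                         # 2147483648

# Doubles represent whole numbers exactly only up to 2^53
print(2^53 == 2^53 + 1)             # TRUE
```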

When reading large values from files, it is probably safer to read them into a character-class target and then manipulate them. If any values are coerced to NA, there will be a warning.
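A minimal sketch of that approach, assuming a small made-up CSV in place of the asker's file: every column is read as character, then converted explicitly, with coercion warnings reported instead of being lost. Note that the second value exceeds 2^53, so as.numeric() cannot hold it exactly, while the character column keeps the original digits.

```r
# Hypothetical CSV contents standing in for the file from the question
csv_text <- "id,amount
1,2147483647
2,9007199254740993
3,not_a_number"

# Read every column as character so nothing is converted silently
raw <- read.csv(text = csv_text, colClasses = "character")

# Convert explicitly and report (rather than hide) coercion warnings
amount <- withCallingHandlers(
  as.numeric(raw$amount),
  warning = function(w) {
    message("coercion warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(raw$amount[2])  # "9007199254740993": exact digits preserved as text
print(amount)
```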
