Why are some memory addresses reported as constant while others change?

I tried to track various objects in memory using data.table::address or .Internal(address()), but noticed that some objects return the same address every time, while others almost always return a different one. What's going on here?

I noticed that the addresses of list-like objects (data.tables, data.frames, etc.) remain constant (as reported by these functions), whereas if I take the address of a [ subset of the list, i.e. address(lst[1]), I get a different result almost every time. On the other hand, lst[[1]] returns the same value every time. Likewise, the address of a named constant such as address(pi) stays the same, while address(1) changes. Why is this happening?

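```r
## Create some data.tables of different sizes and plot the addresses
library(data.table)

## as.hexmode() parses via strtoi() and fails for addresses larger than
## .Machine$integer.max (typical on 64-bit systems); doubles hold them exactly.
## Handles both "0x55d4..." (Linux/macOS) and "000000000CF3..." (Windows) formats.
hex2num <- function(a) as.numeric(paste0("0x", sub("^0[xX]", "", a)))

par(mfrow = c(2, 2))
for (i in 2:5) {
  dat <- data.table(a = 1:10^i)
  ## Constants
  addr1 <- address(dat)
  addr2 <- address(dat[[1]])
  addr3 <- address(dat$a)  # same as addr2
  ## Vary
  addrs <- replicate(5000, address(dat[1]))
  plot(density(hex2num(addrs)), main = sprintf("N: %g", nrow(dat)))
  abline(v = hex2num(c(addr1, addr2, addr3)), col = 1:3, lwd = 2, lty = 1:3)
  legend("topleft", c("dat", "dat[[1]]", "dat$a"), col = 1:3, lwd = 2, lty = 1:3)
}
```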

Here are some examples of what I mean, using data.tables of various sizes. Each panel shows the density of the results from address(dat[1]) (converted to a number), and the vertical lines mark the constant addresses of the data.table and its column.


1 answer

First, I can reproduce your results, so I did a little investigation and dug through some of the source code.

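When you access the first element of dat using dat[1], you actually create a slice of the list, as opposed to dat[[1]] or dat$a. To take a slice, R allocates a new list and copies the requested part into it. (For a data.frame, dat[1] is the first column; for a data.table, dat[1] selects the first row. Either way, the result is a new object.)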

So, in short, you see what you see because [ indexing returns a slice containing the first element of dat: a newly allocated object that ends up wherever free memory happens to be at that moment, which is why its address varies from call to call.

The [[ syntax returns a reference to the actual object, i.e. the column stored in your data.table or data.frame, so its address is invariant (at least until you modify that column).
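A minimal sketch of this difference, assuming data.table is installed (it is loaded only for its address() function) and using a small plain data.frame, where df[1] is the first column. The names df, col1, col2, s1 and s2 are illustrative.

```r
library(data.table)  # used here only for address()

df <- data.frame(a = c(5L, 3L, 8L))

# [[ and $ both return the column vector stored inside df,
# so they report the same address.
col1 <- address(df[[1]])
col2 <- address(df$a)
print(col1 == col2)

# [ builds a new one-column data.frame on every call. Both slices are
# kept alive here, so they must occupy different memory locations.
s1 <- df[1]
s2 <- df[1]
print(address(s1) != address(s2))
```

Both comparisons print TRUE. Keeping s1 and s2 alive at the same time is deliberate: if each slice were discarded immediately, the allocator could in principle reuse a freed slot, which is also why the addresses in the question cluster rather than being fully random.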

This can be confusing, because dat[1] = 6 or the like will of course change the value in your data structure. However, if you compare address(dat[[1]]) before and after such a change, you will notice that the reference now points to a different object (a copy). For example:

```
> dat <- data.table(a=1:10000)
> dat
           a
    1:     1
    2:     2
    3:     3
    4:     4
    5:     5
   ---
 9996:  9996
 9997:  9997
 9998:  9998
 9999:  9999
10000: 10000
> address(dat[[1]])
[1] "000000000CF389D8"
> address(dat[[1]])
[1] "000000000CF389D8"
> dat[1] = 100
> address(dat[[1]])
[1] "000000000D035B38"
> dat
           a
    1:   100
    2:     2
    3:     3
    4:     4
    5:     5
   ---
 9996:  9996
 9997:  9997
 9998:  9998
 9999:  9999
10000: 10000
```
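The same copy-on-modify effect can be sketched with a plain data.frame, assuming data.table is installed only for address(). An explicit row/column assignment is used here to keep the example unambiguous; the variable names are illustrative.

```r
library(data.table)  # used here only for address()

df <- data.frame(a = c(5L, 3L, 8L))
before <- address(df[[1]])

# Assigning through `[<-` follows R's copy-on-modify semantics: the
# column is still referenced by the old data.frame during the call, so
# it is duplicated, and df is rebound to a data.frame holding the copy.
df[1, "a"] <- 100L

after <- address(df[[1]])
print(df$a)
print(before != after)
```

This prints `[1] 100   3   8` followed by `TRUE`: the value changed, and so did the address of the column.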

Looking at the source code for data.frame (rather than data.table), the code that handles slice indexing ([) is here, while direct indexing ([[) is here. You can see that the latter is much simpler; to make a long story short, the former returns a copy. If you modify the slice directly (for example, dat[1] = 5), there is logic here that ensures the data frame now refers to the updated copy.
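The data.frame methods this paragraph refers to are part of base R, so one way to look at them yourself is to print them from an R session. This is a small sketch; the deparsed line count is only a rough proxy for how much work each method does.

```r
# `[.data.frame` handles slice indexing; `[[.data.frame` handles direct indexing.
print(`[.data.frame`)
print(`[[.data.frame`)

# Rough size comparison: number of deparsed source lines in each method.
n_slice  <- length(deparse(`[.data.frame`))
n_direct <- length(deparse(`[[.data.frame`))
print(c(slice = n_slice, direct = n_direct))

# The assignment method used by dat[1] = 5 can be printed the same way.
print(`[<-.data.frame`)
```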


Source: https://habr.com/ru/post/1235762/

