Why does the access time for the first data.frame element depend on its size?

I'm having trouble with the time it takes to access the first element of a data.frame. The access time seems to depend on the size of the data.frame. Does anyone know how to eliminate this dependence?

Here is an example of the code I ran. It fills "tme" with the time needed to set the first element of a vector of length i*1000, where i = j*1000 and j runs from 1 to 500. In other words, I create longer and longer vectors and set the first element to zero. For short vectors the access time is far below what can be measured, but for long vectors it grows to several seconds.

    tme <- (1:500)
    for (j in 1:500) {
      i <- j * 1000
      vec <- (1:(i * 1000))
      print(i)
      now <- Sys.time()
      vec[1] <- 0
      tme[j] <- Sys.time() - now
    }
    tme_vec_first <- tme
1 answer

I do not think the increase in time is due to access time; rather, it is due to copying. Each of these assignments creates a copy of the vector. You can check this with tracemem.

    # initialize vector (10 zeros)
    tracemem({vec <- integer(10)})

[1] "<0000000011D48720>"

    # assign value to 7th position
    tracemem({vec[7] <- 6L})

tracemem[0x0000000011d48720 -> 0x00000000111a02b0]:
[1] "<0000000012E25468>"

As the vector increases, the time spent on the copy process increases.
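To see this growth without running the full loop from the question, a minimal sketch along the following lines may help. The helper name `time_first_assign` and the chosen sizes are illustrative assumptions, not part of the original code, and the measured values will differ between machines and R versions.

```r
# Hypothetical helper (not from the question): time one assignment to the
# first element of a freshly created integer vector of length n.
time_first_assign <- function(n) {
  vec <- seq_len(n)                        # integer vector, like 1:n in the question
  system.time(vec[1] <- 0)[["elapsed"]]    # assigning a double forces a full copy
}

sizes <- c(1e5, 1e6, 1e7)                  # illustrative lengths only
timings <- vapply(sizes, time_first_assign, numeric(1))

# One row per length; timings are in seconds and machine dependent.
print(data.frame(n = sizes, seconds = timings))
```

If copying is the cause, the elapsed time should tend to rise with n, although for the smaller sizes it may be too short to register.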


Also, note that vec <- (1:(i*1000)) is an integer vector, and vec[1] <- 0 turns vec into a double vector, which roughly doubles the size of the vector in memory.

First we create an integer vector and check its size and type.

    # start over with similar syntax to question
    tracemem({vec <- 1:10})

[1] "<0000000011E55508>"

    # check size
    object.size(vec)

88 bytes

    # check type
    typeof(vec)

[1] "integer"

Now assign 0 to the 7th position and check the size and type again. The 0 looks like the same kind of value as the original elements, but it is actually a double, not an integer.

    # assign value
    tracemem({vec[7] <- 0})

tracemem[0x0000000011e55508 -> 0x0000000012399390]:
tracemem[0x0000000012399390 -> 0x0000000013394740]:
[1] "<00000000130EBA60>"

    # check size
    object.size(vec)

168 bytes

    # check type
    typeof(vec)

[1] "double"

Note that two separate copies are reported. I assume the first is the copy that converts the vector from integer to double, and the second is for the assignment itself.

To keep the vector as an integer vector, use vec[1] <- 0L instead, since the "L" suffix tells R that an integer is required.
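The difference between the two literals can be checked directly. The following is a small sketch; the names `vec_int`, `vec_dbl` and `size_ratio` and the length `n` are chosen here for illustration and do not appear in the question.

```r
n <- 1e6                       # illustrative length only

vec_int <- seq_len(n)
vec_int[1] <- 0L               # integer literal: the vector stays integer

vec_dbl <- seq_len(n)
vec_dbl[1] <- 0                # double literal: the whole vector is coerced

typeof(vec_int)                # "integer"
typeof(vec_dbl)                # "double"

# Compare memory footprints; the double vector should be roughly twice as large.
size_ratio <- as.numeric(object.size(vec_dbl)) / as.numeric(object.size(vec_int))
print(size_ratio)
```

Using 0L avoids the type conversion and the extra memory, but it does not by itself guarantee that no copy is made.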


Note that this copying, as reported by tracemem, is observed in both RStudio and Rgui when using Microsoft R Open 3.2.5 on Windows 7.


Source: https://habr.com/ru/post/1015412/