Is there a way to estimate the size of a data.frame based on rows, columns and variable types?

I expect to generate a lot of data and then process it in R. How can I estimate the size of a data.frame (and therefore the required memory) from the number of rows, the number of columns and the types of the variables?

Example.

If I have 10,000 rows and 150 columns, of which 120 are numeric, 20 are character and 10 are factor, what size of data.frame can I expect? Will the result change depending on the data stored in the columns (e.g. max(nchar(column)))?

> m <- matrix(1,nrow=1e5,ncol=150)
> m <- as.data.frame(m)
> object.size(m)
120009920 bytes
> a=object.size(m)/(nrow(m)*ncol(m))
> a
8.00066133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.character)
> b=object.size(m)/(nrow(m)*ncol(m))
> b
4.00098133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.factor)
> c=object.size(m)/(nrow(m)*ncol(m))
> c
4.00098133333333 bytes
> m <- matrix("ajayajay",nrow=1e5,ncol=150)
> 
> m <- as.data.frame(m)
> object.size(m)
60047120 bytes
> d=object.size(m)/(nrow(m)*ncol(m))
> d
4.00314133333333 bytes
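
For the example in the question (10,000 rows; 120 numeric, 20 character and 10 factor columns), a back-of-the-envelope count along these lines may help. This is only a sketch under stated assumptions: a 64-bit R build (8-byte pointers), with per-column headers, attributes and the stored strings themselves ignored.

rows <- 1e4
bytes_numeric   <- 120 * rows * 8   # doubles: 8 bytes per cell
bytes_character <-  20 * rows * 8   # 8-byte pointer per cell; each distinct
                                    # string is stored once in R's global string cache
bytes_factor    <-  10 * rows * 4   # integer codes: 4 bytes per cell
                                    # (the levels are stored once per column)
total <- bytes_numeric + bytes_character + bytes_factor
round(total / 2^20, 1)              # about 11.1 Mb before counting the strings

Because character data is interned in the global string cache, max(nchar(column)) matters much less than the number of distinct strings in the column.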
3 answers

You can simulate such an object and estimate the memory needed to store it as an R object with object.size():

m <- matrix(1,nrow=1e5,ncol=150)
m <- as.data.frame(m)
m[,1:20] <- sapply(m[,1:20],as.character)
m[,29:30] <- sapply(m[,29:30],as.factor)
object.size(m)
120017224 bytes
print(object.size(m),units="Gb")
0.1 Gb
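
A variation on the same idea (a sketch, not part of the original answer): build a smaller prototype with the same mix of column types, measure it, and scale the per-row cost to the target number of rows. The linear scaling ignores fixed per-column overhead and the string cache, so treat the result as a rough figure.

proto_rows  <- 1e3
target_rows <- 1e4

num_cols <- as.data.frame(matrix(rnorm(proto_rows * 120), nrow = proto_rows))
chr_cols <- as.data.frame(matrix(sample(letters, proto_rows * 20, replace = TRUE),
                                 nrow = proto_rows), stringsAsFactors = FALSE)
fac_cols <- as.data.frame(matrix(sample(c("a", "b", "c"), proto_rows * 10, replace = TRUE),
                                 nrow = proto_rows), stringsAsFactors = FALSE)
fac_cols[] <- lapply(fac_cols, factor)   # make the last 10 columns factors

proto <- cbind(num_cols, chr_cols, fac_cols)

bytes_per_row <- as.numeric(object.size(proto)) / proto_rows
round(bytes_per_row * target_rows / 2^20, 1)   # estimated size of the full data.frame in Mb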

Have a look at the pryr package and its object_size() function as an alternative to base R's object.size().

Unlike object.size(), it accounts for elements shared within an object and for the size of any environments it references.

The attributes of the object also take up memory, e.g.

object.size(attributes(m))
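
A minimal comparison, assuming the pryr package is installed (the code is a sketch, not taken from the answer):

library(pryr)

m <- as.data.frame(matrix(1, nrow = 1e5, ncol = 150))
object.size(m)    # base R estimate from utils
object_size(m)    # pryr estimate; shared elements and environments are counted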

As already mentioned, the most reliable approach is to build an object with the same structure as the expected data and measure it with object.size().


Source: https://habr.com/ru/post/1599341/