Rdata file size compared to csv

My .Rdata file is 92 MB in size.

However, the source csv file is about 3 GB. I read it in with the usual read.csv().

How can that be?

1 answer

The comments have already hinted at what is happening. But it is simple enough that we can just work through an example:

    R> X <- 1:1e5    # data, no repeats
    R> save(X, file="/tmp/foo.RData")
    R> write.csv(X, file="/tmp/foo.csv")
    R> system("ls -l /tmp/foo*")
    -rw-r--r-- 1 xy 1377797 Jun 4 09:11 /tmp/foo.csv
    -rw-r--r-- 1 xy 212397 Jun 4 09:11 /tmp/foo.RData

Now with data that repeats:

    R> X <- rep(1,1e5)    # data, lots of repeats
    R> write.csv(X, file="/tmp/bar.csv")
    R> save(X, file="/tmp/bar.RData")
    R> system("ls -lh /tmp/bar*")
    -rw-r--r-- 1 xy 966K Jun 4 09:12 /tmp/bar.csv
    -rw-r--r-- 1 xy 1.3K Jun 4 09:12 /tmp/bar.RData

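Thus the csv is anywhere from about 6.5 times (1377797 / 212397) to about 743 times (966K / 1.3K) larger than the .RData file, depending on how well the data compresses: save() writes a compressed binary file by default, while the csv is plain text. And this is before we make the csv more "expensive" by forcing it to print several decimal places ...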
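To separate the effect of compression from the effect of binary versus text storage, one can write the same vector three ways and compare the sizes. This is only a minimal sketch: compare_sizes is a hypothetical helper that does not appear in the original answer, and it assumes R >= 3.2 for file.size().

```r
# Hypothetical helper (not from the original answer): write one object as
# csv, as uncompressed .RData and as default (gzip-compressed) .RData,
# then return the three file sizes in bytes.
compare_sizes <- function(X) {
  csv_file <- tempfile(fileext = ".csv")
  raw_file <- tempfile(fileext = ".RData")
  gz_file  <- tempfile(fileext = ".RData")
  on.exit(unlink(c(csv_file, raw_file, gz_file)))  # clean up temp files

  write.csv(X, file = csv_file)
  save(X, file = raw_file, compress = FALSE)  # binary, no compression
  save(X, file = gz_file)                     # binary, default compress = TRUE

  c(csv        = file.size(csv_file),
    rdata_raw  = file.size(raw_file),
    rdata_gzip = file.size(gz_file))
}

repeats <- compare_sizes(rep(1, 1e5))  # lots of repeats, as in the answer

set.seed(1)                            # reproducible random data
noise <- compare_sizes(runif(1e5))     # many decimal places, no repeats

print(rbind(repeats, noise))
```

For the all-ones vector the uncompressed .RData should land near 800 KB (1e5 doubles at 8 bytes each), so almost all of the saving comes from compression. Random values should compress far less, while their csv grows because write.csv() prints each number with up to 15 significant digits.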

Source: https://habr.com/ru/post/1484419/

