Rdata file size compared to csv

Question

My .Rdata file is 92 MB in size.

However, the source CSV file is about 3 GB; I read it in with the usual read.csv().

How is that possible?

r csv rdata

Rico Jun 04 '13 at 14:00

1 answer

Dirk Eddelbuettel · Accepted Answer · 2013-06-04T14:16:12+0000

The comments have already hinted at what is happening, but it is simple enough to show with an example:

    R> X <- 1:1e5        # data, no repeats
    R> save(X, file="/tmp/foo.RData")
    R> write.csv(X, file="/tmp/foo.csv")
    R> system("ls -l /tmp/foo*")
    -rw-r--r-- 1 xy 1377797 Jun  4 09:11 /tmp/foo.csv
    -rw-r--r-- 1 xy  212397 Jun  4 09:11 /tmp/foo.RData
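For readers not on a Unix-like system, the same comparison can be sketched without shelling out to `ls`, using base R's `tempfile()` and `file.size()`. The temporary paths and the variable names (`rdata_path`, `csv_path`, `sizes`) are illustrative choices, not part of the original answer. Exact byte counts depend on the R version and platform (Windows line endings, for example, make the CSV larger), so they may not match the 2013 listing above.

```r
# Sketch: repeat the size comparison portably (no system("ls ...") call).
# Paths and variable names are illustrative assumptions.
X <- 1:1e5                                   # data, no repeats
rdata_path <- tempfile(fileext = ".RData")
csv_path   <- tempfile(fileext = ".csv")

save(X, file = rdata_path)                   # binary, gzip-compressed by default
write.csv(X, file = csv_path)                # plain text, one row per value

sizes <- c(csv = file.size(csv_path), RData = file.size(rdata_path))
print(sizes)
print(sizes[["csv"]] / sizes[["RData"]])     # how many times larger the CSV is
```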

Now with data that repeats:

    R> X <- rep(1,1e5)   # data, lots of repeats
    R> write.csv(X, file="/tmp/bar.csv")
    R> save(X, file="/tmp/bar.RData")
    R> system("ls -lh /tmp/bar*")
    -rw-r--r-- 1 xy 966K Jun  4 09:12 /tmp/bar.csv
    -rw-r--r-- 1 xy 1.3K Jun  4 09:12 /tmp/bar.RData
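To see that compression accounts for most of this gap, one can compare four sizes for the repeated data: the CSV as written, the same CSV bytes gzip-compressed in memory with base R's `memCompress()`, and the `.RData` file saved with and without compression (`save(..., compress = FALSE)`). This is a minimal sketch; the temporary paths and the names in `sizes` are hypothetical labels chosen for illustration, and the exact numbers will vary by R version and platform.

```r
# Sketch: separate the effect of compression from the effect of binary storage.
# Paths and the names in `sizes` are illustrative assumptions.
X <- rep(1, 1e5)                              # data, lots of repeats

csv_path <- tempfile(fileext = ".csv")
write.csv(X, file = csv_path)
csv_bytes <- readBin(csv_path, what = "raw", n = file.size(csv_path))

plain_path <- tempfile(fileext = ".RData")
gz_path    <- tempfile(fileext = ".RData")
save(X, file = plain_path, compress = FALSE)  # binary, no compression
save(X, file = gz_path)                       # binary, gzip (the default)

sizes <- c(
  csv         = length(csv_bytes),
  csv_gzipped = length(memCompress(csv_bytes, type = "gzip")),
  RData_plain = file.size(plain_path),
  RData_gzip  = file.size(gz_path)
)
print(sizes)
```

Gzipping the CSV text alone already shrinks it dramatically, which suggests the format's compression, rather than anything special about R's binary layout, does most of the work for highly repetitive data.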

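Thus the CSV file is anywhere from about 6.5 to 743 times larger, depending on how well the data compress: save() writes a binary file that is gzip-compressed by default, whereas write.csv() writes plain, uncompressed text. And that is before we make the CSV even more "expensive" by forcing it to print a few decimal places...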
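The remark about decimal places can be illustrated with doubles that carry many significant digits. `write.csv()` writes numbers with up to 15 significant digits each, while an uncompressed `.RData` file stores every double in 8 bytes. The seed, the use of `runif()`, and the variable names below are hypothetical choices for this sketch only.

```r
# Sketch: CSV cost of printing many decimal places vs. 8-byte binary doubles.
# Seed, data generator, and variable names are illustrative assumptions.
set.seed(1)
X <- runif(1e5)                               # doubles with many significant digits

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")
write.csv(X, file = csv_path)                 # text, up to 15 significant digits per value
save(X, file = rdata_path, compress = FALSE)  # 8 bytes per double plus a small header

sizes <- c(csv = file.size(csv_path), RData_plain = file.size(rdata_path))
print(sizes)
print(sizes[["csv"]] / length(X))             # average bytes per row in the CSV
```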
