The comments have already hinted at what is happening, but it is simple enough to show with an example:
    R> X <- 1:1e5                          # data, no repeats
    R> save(X, file="/tmp/foo.RData")
    R> write.csv(X, file="/tmp/foo.csv")
    R> system("ls -l /tmp/foo*")
    -rw-r--r-- 1 xy 1377797 Jun 4 09:11 /tmp/foo.csv
    -rw-r--r-- 1 xy  212397 Jun 4 09:11 /tmp/foo.RData
Now with data that repeats:
    R> X <- rep(1,1e5)                     # data, lots of repeats
    R> write.csv(X, file="/tmp/bar.csv")
    R> save(X, file="/tmp/bar.RData")
    R> system("ls -lh /tmp/bar*")
    -rw-r--r-- 1 xy 966K Jun 4 09:12 /tmp/bar.csv
    -rw-r--r-- 1 xy 1.3K Jun 4 09:12 /tmp/bar.RData
Thus the CSV file is roughly 6.5 to 743 times larger than the RData file, depending on how well the data compresses (save() compresses its output by default). And this is before we make the CSV more "expensive" by forcing it to print a few decimal places ...
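To reproduce the comparison without depending on /tmp or ls, here is a minimal sketch. The helper size_ratio() is a hypothetical name introduced only for illustration. It writes the same vector once with write.csv() and once with save(), then returns the CSV size divided by the RData size. The exact numbers will vary with the R version and platform, so none are shown.

```r
# Hypothetical helper (not from the original answer): write x both as CSV
# and as RData into temporary files and return size(CSV) / size(RData).
size_ratio <- function(x, compress = TRUE) {
  csv_file   <- tempfile(fileext = ".csv")
  rdata_file <- tempfile(fileext = ".RData")
  on.exit(unlink(c(csv_file, rdata_file)))          # clean up temp files

  write.csv(x, file = csv_file)                     # plain text, never compressed
  save(x, file = rdata_file, compress = compress)   # binary, compressed by default

  file.size(csv_file) / file.size(rdata_file)
}

ratio_no_repeats <- size_ratio(1:1e5)           # data, no repeats
ratio_repeats    <- size_ratio(rep(1, 1e5))     # data, lots of repeats
ratio_decimals   <- size_ratio(rep(1.2345, 1e5))  # a few decimal places per value

# Same repeated data, but with compression switched off in save()
ratio_repeats_raw <- size_ratio(rep(1, 1e5), compress = FALSE)

print(c(no_repeats  = ratio_no_repeats,
        repeats     = ratio_repeats,
        decimals    = ratio_decimals,
        repeats_raw = ratio_repeats_raw))
```

Comparing ratio_repeats with ratio_repeats_raw isolates the effect of compression, since the only thing that changes between the two calls is the compress argument to save().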