I am creating a very large data file in Python that consists almost entirely of 0 (false) values, with only a few 1 (true) values. It has about 700,000 columns and 15,000 rows, and is therefore about 10.5 GB in size. The first line is the header.
The file should then be read and visualized in R.
I am looking for a suitable data format for exporting the data from Python.
As indicated here:
HDF5 is row based. You get MUCH efficiency by having tables that are not too wide but quite long.
Since I have a very wide table, I suppose HDF5 is inappropriate in my case?
So which data format is suitable for this purpose?
Would it also be wise to zip it?
An example of my file:
id,col1,col2,col3,col4,col5,...
1,0,0,0,1,0,...
2,1,0,0,0,1,...
3,0,1,0,0,1,...
4,...
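
To make the layout concrete, here is a minimal sketch of how a file in this shape could be written from Python at a tiny scale. The function name write_binary_csv, the p_one probability and the random fill are assumptions for illustration only, not my actual export code; the last lines just redo the cell-count arithmetic for the full table.

```python
import csv
import io
import random


def write_binary_csv(out, n_rows, n_cols, p_one=0.01, seed=0):
    """Write a header line plus n_rows lines of 0/1 values.

    Hypothetical helper: the values are random stand-ins for the real data.
    """
    rng = random.Random(seed)
    writer = csv.writer(out, lineterminator="\n")
    # Header: id,col1,col2,...
    writer.writerow(["id"] + [f"col{j}" for j in range(1, n_cols + 1)])
    for i in range(1, n_rows + 1):
        # Mostly zeros, with a one appearing with probability p_one
        row = [1 if rng.random() < p_one else 0 for _ in range(n_cols)]
        writer.writerow([i] + row)


# Tiny in-memory sample with the same layout as the example above
buf = io.StringIO()
write_binary_csv(buf, n_rows=4, n_cols=5, p_one=0.2)
sample = buf.getvalue()
print(sample)

# Cell count of the full table: 15,000 rows x 700,000 columns
cells = 15_000 * 700_000
print(cells)  # 10500000000 cells, i.e. 10.5 GB at one byte per cell
```

At one byte per cell this matches the 10.5 GB figure; with a separator after every value, a plain-text CSV would come out roughly twice as large.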