Most of the answers, with the exception of those by users slayton, rauchen and Paul Amstrong, are incorrect when it comes to plain one-to-one storage without any compression.
A human genome of 3 Gb (gigabases, roughly 3 billion nucleotides) stored at one byte per nucleotide takes about 3 GB, not ~750 MB. The assembled haploid genome according to the NCBI currently has a size of 3436687 KB, or 3.436687 GB. Check here for yourself.
Haploid = a single copy of each chromosome. Diploid = two copies of the haploid set. Humans have 22 distinct autosomes x 2 = 44. In males the 23rd pair is X and Y, which gives 46 in total. In females the 23rd pair is X and X, which also gives 46 in total.
When the data is stored on a hard drive, a male genome therefore consists of 23 + 1 chromosomes (22 autosomes plus X and Y) and a female genome of 23 chromosomes (22 autosomes plus X), which explains the small size differences mentioned from time to time in the answers. The X chromosome of men is the same as the X chromosome of women.
Thus, the genome (23 + 1 chromosomes) is loaded into memory in parts, via BLAST, using databases built from FASTA files. Whether the files are zipped or not, the nucleotides themselves are unlikely to be stored in a compressed encoding. In the early days, one of the tricks was to replace tandem repeats with a shorter encoding (for example, GACGACGAC became “3GAC”, going from 9 bytes to 4 bytes). The reason was to save hard disk space (drives of 500 MB to 2 GB, 7,200 rpm, with SCSI connectors). To search for a sequence, the query was encoded in the same way.
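To make the tandem-repeat trick concrete, here is a minimal Python sketch. The exact scheme used by the old tools is not documented here, so the function name `encode_tandem_repeats` and the fixed unit length of 3 are assumptions made purely for illustration.

```python
def encode_tandem_repeats(seq: str, unit_len: int = 3, min_copies: int = 2) -> str:
    """Collapse back-to-back copies of a fixed-length unit, e.g. GACGACGAC -> 3GAC.

    Hypothetical helper: unit_len and min_copies are illustrative assumptions,
    not parameters of any real tool.
    """
    out = []
    i = 0
    while i < len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # Count how many times the unit repeats directly after itself.
        while (len(unit) == unit_len
               and seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit):
            copies += 1
        if copies >= min_copies:
            # Emit "<count><unit>" and skip over the whole repeat run.
            out.append(f"{copies}{unit}")
            i += copies * unit_len
        else:
            # No repeat starts here: copy the single base unchanged.
            out.append(seq[i])
            i += 1
    return "".join(out)


encoded = encode_tandem_repeats("GACGACGAC")
print(encoded, len("GACGACGAC"), "->", len(encoded))
```

With the input from the example above, 9 characters shrink to 4. Sequences without repeats gain nothing, which is one reason this kind of trick only pays off on repeat-rich regions.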
If each “encoded nucleotide” is stored as a 2-bit code instead of a letter, four nucleotides fit into one byte:
A = 00
C = 01
G = 10
T = 11
This is the only way you can fully profit from all eight bit positions (1 through 8) of a byte. For example, the combination 00.01.11.10 corresponds to "ACTG". This is what reduces the file size by a factor of 4, as we see in the other answers. Thus, the size of 3.436687 GB will be reduced to 0.85917175 GB, or roughly 860 MB, including the required conversion program (23 KB to 4 MB).
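The 2-bit table can be sketched in Python as follows. The names `pack` and `unpack` are hypothetical, the first base is placed in the two highest bits by choice, and the sketch assumes the sequence contains only A, C, G and T (anything else, such as N, would need separate handling).

```python
# 2-bit codes from the table: A = 00, C = 01, G = 10, T = 11.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = "ACGT"


def pack(seq: str) -> bytes:
    """Pack four nucleotides per byte, first base in the two highest bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk; the unused low bits stay zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Reverse pack(); length is needed to drop the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(LETTER[(byte >> shift) & 0b11])
    return "".join(bases[:length])


packed = pack("ACTG")
print(format(packed[0], "08b"))   # bit pattern of the single packed byte
print(unpack(packed, 4))
```

Because the packed form cannot tell padding from a trailing run of A, the original length has to be stored alongside the bytes. The size estimate follows the same factor of four: 3436687 KB / 4 = 859171.75 KB.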
But in biology you want to be able to read the data, so gzip compression is more than enough. Once unzipped, the file can still be read directly. If this byte packing were used, the data would become much harder to read. This is why FASTA files are actually text files.
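As a small illustration of that point, the following sketch writes a FASTA record through Python's standard `gzip` module and reads it back as plain text. The record name, the sequence and the file name `example.fa.gz` are made up for the example.

```python
import gzip
import os
import tempfile

# Hypothetical FASTA record: a header line starting with ">" and one sequence line.
record = ">example_sequence hypothetical record\nGACGACGACTTAGGC\n"

path = os.path.join(tempfile.mkdtemp(), "example.fa.gz")

# Write the text through gzip; on disk the file is compressed.
with gzip.open(path, "wt", encoding="ascii") as handle:
    handle.write(record)

# Read it back in text mode; the content is ordinary readable FASTA again.
with gzip.open(path, "rt", encoding="ascii") as handle:
    restored = handle.read()

print(restored)
```

The round trip needs no conversion program and no knowledge of a custom bit layout, which is the practical advantage over the packed form.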