I work in biology, specifically with DNA, and the sheer size of genome sequence data is often a problem.
For those of you with no background in biology, here is a brief overview. A DNA sequence is written with four letters: A, T, G, and C, and their specific order determines what happens in the cell.
The main problem with DNA sequencing technology is the size of the data it produces (for a whole genome, often many gigabytes).
I know that the size of an int in C varies from machine to machine, but it can always represent far more than the four values I need. Is there a way to define a type for a "base" that takes up only 2 or 3 bits? I looked into defining a struct, but I am afraid that is not what I am looking for. Thank you.
Also, would this work better in another language (perhaps a higher-level one like Java)?
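To make the idea concrete, here is a rough sketch of the kind of storage I have in mind. It assumes the sequence contains only the four letters, so 2 bits per base and four bases per byte. The code assignment (A=0, C=1, G=2, T=3) and all function names are my own invention for illustration, not taken from any library. An extra symbol such as N for an unknown base would not fit in 2 bits and would need the 3-bit variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-bit codes: A=0, C=1, G=2, T=3. Returns -1 for any other letter. */
static int base_to_code(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

/* Inverse of base_to_code for codes 0..3. */
static char code_to_base(unsigned code) {
    return "ACGT"[code & 3u];
}

/* Number of bytes needed to hold n bases at four bases per byte. */
static size_t packed_size(size_t n) {
    return (n + 3) / 4;
}

/* Packs n bases from seq into out, which must have room for packed_size(n) bytes.
   Base i goes into byte i/4 at bit offset (i%4)*2.
   Returns 0 on success, -1 if a letter other than A, C, G, T is found. */
static int pack_bases(const char *seq, size_t n, uint8_t *out) {
    assert(seq != NULL && out != NULL);
    memset(out, 0, packed_size(n));
    for (size_t i = 0; i < n; i++) {
        int code = base_to_code(seq[i]);
        if (code < 0) {
            return -1;
        }
        out[i / 4] |= (uint8_t)(code << ((i % 4) * 2));
    }
    return 0;
}

/* Reads base i back out of a packed buffer. */
static char get_base(const uint8_t *packed, size_t i) {
    unsigned code = (packed[i / 4] >> ((i % 4) * 2)) & 3u;
    return code_to_base(code);
}
```

With this layout, n bases take about n/4 bytes instead of n bytes as plain char text. What I am unsure about is whether C has a more direct way to declare such a type, rather than shifting and masking by hand.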