Well, if you need a result (genotype) for each probe for each subject, then the standard many-to-many intermediary table (Genotype) will be really huge. With 1000 subjects and 500k probes, you will have 500 million records.
If you could store the genotype values encoded/serialized in one or more columns, this would drastically reduce the number of records. Storing 500k encoded results in a single column would itself be a problem, but if you can split them into groups, it should be workable. This reduces the number of records to nr. Subjects. Another possibility is to relate each Probe to a ProbeGroup and have nr. SubjectProbeResults = nr. Subjects * nr. ProbeGroups. The first option would be something like this:
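    from django.db import models

    class SubjectProbeResults(models.Model):
        # on_delete is required on ForeignKey since Django 2.0
        subject = models.ForeignKey(Subject, on_delete=models.CASCADE, related_name='probe_results')
        pg_a_genotypes = models.TextField()
        # .. one TextField per probe group ..
        pg_n_genotypes = models.TextField()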
This, of course, makes searching/filtering the results harder, but it should not be too hard if the stored format is simple. You could use the following format in the genotype columns (no spaces around the delimiters, so that the lookups match): "probe1_id|genotype1,probe2_id|genotype2,probe3_id|genotype3,..."
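A minimal sketch of how such a column could be written and read back, assuming the "probe_id|genotype" entries are joined by commas with no surrounding spaces. The helper names `encode_genotypes` and `decode_genotypes` are hypothetical and not part of Django; the only real requirement is that neither delimiter occurs inside a probe id or genotype.

```python
def encode_genotypes(results):
    """Serialize (probe_id, genotype) pairs as 'id|genotype,id|genotype,...'."""
    parts = []
    for probe_id, genotype in results:
        entry = "%s|%s" % (probe_id, genotype)
        # The delimiters must not occur inside the values themselves,
        # otherwise the column cannot be parsed or searched reliably.
        if "," in entry or entry.count("|") != 1:
            raise ValueError("delimiter inside value: %r" % entry)
        parts.append(entry)
    return ",".join(parts)


def decode_genotypes(text):
    """Parse the serialized column back into a {probe_id: genotype} dict.

    Probe ids come back as strings, because the column is plain text.
    """
    if not text:
        return {}
    return dict(entry.split("|", 1) for entry in text.split(","))
```

For example, `encode_genotypes([(1, "AA"), (2, "AG")])` gives `"1|AA,2|AG"`, and decoding that string returns `{"1": "AA", "2": "AG"}`.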
To get the subjects that have a specific probe + genotype:
a. Determine which group the probe belongs to, e.g. Group C -> pg_c_genotypes
b. Query the corresponding column for the probe_id + genotype combination.
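    from django.db.models import Q

    qstring = "%s|%s" % (probe_id, genotype)
    # Match the entry in the middle, at the start, or at the end of the list;
    # the exact match covers a column that holds only this one entry.
    subjects = Subject.objects.filter(
        Q(probe_results__pg_c_genotypes__contains=',%s,' % qstring) |
        Q(probe_results__pg_c_genotypes__startswith='%s,' % qstring) |
        Q(probe_results__pg_c_genotypes__endswith=',%s' % qstring) |
        Q(probe_results__pg_c_genotypes=qstring)
    )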
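The reason the lookup anchors on the comma delimiters, rather than doing a single plain substring search, can be seen in a pure-Python mirror of the same conditions. This is only an illustrative sketch: `has_result` and `naive_has_result` are hypothetical helpers, and the sample value assumes integer probe ids.

```python
def has_result(column_value, probe_id, genotype):
    """Pure-Python mirror of the delimiter-anchored lookup."""
    qstring = "%s|%s" % (probe_id, genotype)
    return (
        ",%s," % qstring in column_value             # entry in the middle
        or column_value.startswith("%s," % qstring)  # first entry
        or column_value.endswith(",%s" % qstring)    # last entry
        or column_value == qstring                   # only entry
    )


def naive_has_result(column_value, probe_id, genotype):
    """Unanchored substring test, shown only to illustrate false positives."""
    return "%s|%s" % (probe_id, genotype) in column_value


stored = "11|AA,2|AG,3|GG"

# Probe 1 is not stored, but "1|AA" is a substring of "11|AA".
false_positive = naive_has_result(stored, 1, "AA")
correct_answer = has_result(stored, 1, "AA")
```

Here `false_positive` is `True` while `correct_answer` is `False`, which is why the query needs the delimiter-anchored variants.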
The other option I mentioned is to have a ProbeGroup model, with each Probe having a ForeignKey to ProbeGroup. And then:
    class SubjectProbeResults(models.Model):
        subject = models.ForeignKey(Subject, on_delete=models.CASCADE, related_name='probe_results')
        probe_group = models.ForeignKey(ProbeGroup, on_delete=models.CASCADE, related_name='probe_results')
        genotypes = models.TextField()
You can query the genotypes field the same way, except that now you can filter on the group directly instead of working out which column to search. So if you have e.g. 1000 probes per group -> 500 groups, then for 1000 subjects you will have 500K SubjectProbeResults. That is still a lot, but certainly more manageable than 500M. You could also use fewer groups; you will need to test what works best.
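To compare the layouts for other sizes, the row counts can be computed with a small helper. This is just the arithmetic from the answer, assuming 500k probes in total; `row_counts` and its dictionary keys are hypothetical names.

```python
def row_counts(n_subjects, n_probes, probes_per_group):
    """Number of rows needed by each storage layout."""
    # Ceiling division: a partially filled last group still needs a row.
    n_groups = (n_probes + probes_per_group - 1) // probes_per_group
    return {
        # plain many-to-many table: one row per subject per probe
        "one_row_per_result": n_subjects * n_probes,
        # first option: one row per subject, one column per group
        "one_row_per_subject": n_subjects,
        # second option: one row per subject per probe group
        "one_row_per_subject_and_group": n_subjects * n_groups,
    }


counts = row_counts(n_subjects=1000, n_probes=500_000, probes_per_group=1000)
```

With these numbers, `counts` holds 500,000,000 rows for the plain table, 1,000 for the one-row-per-subject layout, and 500,000 for the ProbeGroup layout. Larger groups mean fewer rows but longer text fields to search, so the group size is worth benchmarking.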