I am using Django 1.9. I have a Django model that represents the value of a specific measure, by month, with the original value and a percentile:
    class MeasureValue(models.Model):
        org = models.ForeignKey(Org, null=True, blank=True)
        month = models.DateField()
        calc_value = models.FloatField(null=True, blank=True)
        percentile = models.FloatField(null=True, blank=True)
There are typically around 10,000 records a month. My question is whether I can speed up the process of setting the values on the models.
I currently calculate the percentiles by getting all the measure values for a month with a Django filter query, converting them to a pandas dataframe, and then using scipy's rankdata to set ranks and percentiles. I do this because pandas and rankdata are efficient, able to ignore null values, and able to handle duplicate values in the way I want, so I'm happy with this method:
    import pandas as pd

    records = MeasureValue.objects.filter(month=month).values()
    df = pd.DataFrame.from_records(records)
    # use calc_value to set percentile on each row, using scipy rankdata
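For reference, the ranking step is roughly along these lines (a simplified sketch: the `add_percentiles` helper name, the 0-100 scaling, and the choice of 'average' tie-handling are just illustrative):

    from scipy.stats import rankdata

    def add_percentiles(df):
        # Illustrative helper: rank only the non-null calc_values, so rows
        # with a null calc_value end up with a null (NaN) percentile.
        mask = df.calc_value.notnull()
        if mask.any():
            values = df.loc[mask, 'calc_value']
            # 'average' tie-handling gives duplicate values the same mean rank
            ranks = rankdata(values, method='average')
            df.loc[mask, 'percentile'] = ranks / float(len(values)) * 100
        return df

    df = add_percentiles(df)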
However, I then need to get each percentile value out of the dataframe and set it back on the corresponding model instance. At the moment I do this by iterating over the dataframe rows and updating each instance in turn:
    import numpy as np

    for i, row in df.iterrows():
        mv = MeasureValue.objects.get(org=row.org, month=month)
        if (row.percentile is None) or np.isnan(row.percentile):
            row.percentile = None
        mv.percentile = row.percentile
        mv.save()
Unsurprisingly, this is rather slow. Is there an efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.