I am trying to estimate the total number of results for App Engine datastore queries that will return large result sets.
To do this, I assigned each object a random floating-point number between 0 and 1. Then I ran the query whose total result count I wanted to estimate, with the following three settings:
* I ordered by the assigned random numbers, ascending
* I set the offset to 1000
* I fetched only one entity
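Here is a simplified sketch of that setup, assuming the App Engine ndb client library (the model name `Thing` and property name `rand` are placeholders, not my real ones):

```python
import random

from google.appengine.ext import ndb

OFFSET = 1000

class Thing(ndb.Model):
    # Each entity gets a uniform random float in [0, 1) at creation time.
    rand = ndb.FloatProperty()

def make_thing():
    # The random sort key is assigned exactly once, when the entity is created.
    return Thing(rand=random.random())

def sample_at_offset(offset=OFFSET):
    # Order by the random property ascending, skip `offset` entities,
    # and fetch only the single entity sitting at that position.
    results = Thing.query().order(Thing.rand).fetch(1, offset=offset)
    return results[0] if results else None
```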
Then I took the random value assigned to the fetched entity and plugged it into the following equation to estimate the total result count (since I used 1000 as the offset above, OFFSET here is 1000):
estimated total = OFFSET / RANDOM
The idea is that since each object is assigned a uniform random number and I sort by that number, the random value found at a given offset should be proportional to that offset's position within the full result set: with N total results, the value at offset 1000 should be close to 1000 / N, so dividing the offset by the sampled value should recover N.
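To make the arithmetic concrete (the sampled value below is made up for illustration): if the entity at offset 1000 carries a random value of about 0.00195, the estimate works out to roughly 513,000 results:

```python
OFFSET = 1000

def estimate_total(sampled_random, offset=OFFSET):
    # With uniform random values, the value at rank `offset` is expected
    # to be about offset / N, so N is recovered as offset / value.
    return offset / sampled_random

print(estimate_total(0.00195))  # -> about 512820
```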
The problem I am facing is that the estimates I get are consistently low, and the smaller the offset, the lower they come in. I expected smaller offsets to give less accurate estimates, but I thought the error would land both above and below the actual number of results.
Below is a chart demonstrating what I'm talking about. As you can see, the predictions become more consistent (more accurate) as the offset increases from 1000 to 5000, but beyond that they track a fitted fourth-degree polynomial (y = -5E-15x^4 + 7E-10x^3 - 3E-05x^2 + 0.3781x + 51608).
Am I making a mistake here, or is Python's standard random number generator not distributing the numbers evenly enough for this purpose?
Thanks!

Edit:
It turns out this problem was my own mistake. In another part of the program, I was grabbing objects from the beginning of the sorted series, performing an operation on them, and then reassigning them a new random number. This depleted the low end of the range and bunched the distribution of random numbers up toward the high end.
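Here is a quick standalone simulation (a sketch, not my original code; the counts N and RESHUFFLES are made up) showing why that bug drives the estimates low: repeatedly replacing the smallest random value with a fresh uniform draw thins out the low end, so the value found at any offset is larger than it should be, and OFFSET / RANDOM comes in under the true count:

```python
import heapq
import random

N = 100000          # true number of entities
OFFSET = 1000
RESHUFFLES = 20000  # how often the other code path touched the front

# Every entity starts with a uniform random value in [0, 1).
values = [random.random() for _ in range(N)]
heapq.heapify(values)

# Simulate the bug: repeatedly take the entity with the smallest random
# value ("the beginning of the series") and reassign it a fresh one.
for _ in range(RESHUFFLES):
    heapq.heapreplace(values, random.random())

# Run the estimator against the now-skewed distribution.
sampled = sorted(values)[OFFSET]
print("estimate: %d, actual: %d" % (OFFSET / sampled, N))
# The estimate comes in well below N, and lower still for smaller offsets.
```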
I went a little deeper into this concept, fixed the problem, and tried it again on a different query (so the number of results differs from the previous one). I found that this idea can indeed be used to estimate the total result count of a query. One thing worth noting is that the "error" is very similar at offsets close to each other. When I made the scatter chart in Excel, I expected the prediction accuracy at each offset to be cloudy, meaning offsets at the very beginning would produce a large, sparse cloud that converges to a very small, dense cloud around the actual value as the offsets grow. That is not what happened, as you can see below in the chart of how far off the predictions were at each offset. Where I thought there would be a point cloud, there is a line instead, presumably because every offset is sampled from the same single assignment of random numbers, so neighboring offsets share almost all of their data and their errors move together.

This is a graph of the maximum error after each offset. For example, the maximum error for any offset past 10,000 was less than 1%:
