Reliable integration test for code using HyperLogLog?

We use the Twitter (Algebird) implementation of HyperLogLog. Our system uses HyperLogLog to estimate the current size of a gradually growing collection and checks whether that size is greater or less than some number N. How can we write an integration or system test of this check that is almost guaranteed to pass whenever our code calling HyperLogLog is correct? System testing is non-deterministic because, for one thing, it is multithreaded.

My first thought was that the way to make an integration test reliable for this use case is to "swamp the error margins". So, how many elements (M) is it enough to send to the endpoint to make sure that HyperLogLog will estimate the total number of elements as greater than N, with a probability of, say, ≥ 0.999999?

Or is there a better approach?

The standard error bounds are configurable, but that does not directly tell us the maximum error we might occasionally see, which is what I care about if we want to avoid spurious CI build failures and the attendant loss of time and hair!

I am also concerned that the way the tests generate random data may not produce data that is uniformly distributed in the relevant respects, which could significantly throw off the probability calculations.
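For what it's worth, here is my back-of-the-envelope attempt at sizing M. It assumes the usual Gaussian model for HyperLogLog's relative error (standard error ≈ 1.04 / √(2^bits)), which may understate the far tails, so the target probability is only approximate; the `requiredM` helper and the bit count are mine for illustration, not Algebird API:

```scala
object OverfeedSizing {
  // How many distinct elements M to send so that the HLL estimate exceeds N
  // with probability ~p, assuming the relative error is roughly Gaussian
  // with sigma = 1.04 / sqrt(2^bits).
  def requiredM(n: Long, bits: Int, z: Double = 4.75): Long = {
    // z = 4.75 is roughly the one-sided Gaussian quantile for p = 0.999999.
    val sigma = 1.04 / math.sqrt(math.pow(2.0, bits)) // relative standard error
    // Require that even the z-sigma low tail of the estimate of M clears N:
    // M * (1 - z * sigma) >= N.
    math.ceil(n / (1.0 - z * sigma)).toLong
  }

  def main(args: Array[String]): Unit = {
    // With 12 bits (4096 registers), sigma is ~1.6%, so M is ~1.08 * N.
    println(requiredM(n = 1000000L, bits = 12))
  }
}
```

Feeding M distinct integers (0 until M) rather than random draws would also sidestep the data-distribution worry: the true cardinality is then exact, and HyperLogLog hashes its inputs anyway.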

1 answer

Let me break it down a bit. There are two main types of behavior you want to test:

  • Twitter's HyperLogLog implementation works correctly, i.e. it gives a good estimate of the number of distinct elements.

  • Your code that consumes the HyperLogLog structures (for example, your counters) updates them when it should.

Note that behavior #2 is easy to verify at build time with unit tests rather than integration tests. That is preferable, and it will catch most of the problems.
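For example, a build-time test of behavior #2 could look like the sketch below. `DistinctCounter` is a hypothetical stand-in for your code that consumes the HLL; the Algebird calls (`HyperLogLogMonoid`, `create`, `plus`, `estimatedSize`) exist, but check them against the version you use:

```scala
import java.nio.ByteBuffer
import com.twitter.algebird.{HLL, HyperLogLogMonoid}

// Hypothetical stand-in for "your code that updates an HLL counter".
final class DistinctCounter(bits: Int = 12) {
  private val monoid = new HyperLogLogMonoid(bits)
  private var state: HLL = monoid.zero

  def observe(id: Long): Unit = {
    val bytes = ByteBuffer.allocate(8).putLong(id).array()
    state = monoid.plus(state, monoid.create(bytes))
  }

  def estimate: Double = state.estimatedSize
}

object DistinctCounterSpec {
  def main(args: Array[String]): Unit = {
    val counter = new DistinctCounter()
    (1L to 100L).foreach(counter.observe)
    // The point is the plumbing: the counter must actually have been updated,
    // and the estimate must land in the right neighborhood.
    assert(counter.estimate > 0.0, "counter was never updated")
    assert(math.abs(counter.estimate - 100.0) <= 10.0,
      s"estimate ${counter.estimate} too far from 100")
  }
}
```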

Behavior #1 can itself be divided into three cases:

A. when the number of elements is 0;

B. when the number of elements is small (say 5, 100, or 1000);

C. when the number of elements is large (millions or billions).

Again, cases A and B can and should be tested at build time with unit tests. You have to decide on acceptable error margins for your application and assert that the estimates fall within them; it doesn't really matter that you happen to have chosen HyperLogLog as the underlying estimation method, because the tests should treat the estimator as a black box. I would say a 10% error is reasonable for most purposes, but it really depends on your specific application. These margins should represent the worst accuracy your application can live with. For example, a critical error counter may not tolerate ANY estimation error at all, in which case using HyperLogLog for it should break the unit test. A distinct-user counter may be able to live with a 50% estimation error; that's up to you.
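A minimal black-box sketch of cases A and B, under the assumed 10% margin (the margin, the bit count, and the helper are illustrative choices, not recommendations):

```scala
import java.nio.ByteBuffer
import com.twitter.algebird.HyperLogLogMonoid

object SmallCardinalitySpec {
  private val monoid = new HyperLogLogMonoid(bits = 12)

  // Feed n distinct integers in and read an estimate out; the test never
  // looks inside, so the estimator stays a black box.
  private def estimateOf(n: Int): Double = {
    val hlls = (0 until n).map { i =>
      monoid.create(ByteBuffer.allocate(4).putInt(i).array())
    }
    monoid.sum(hlls).estimatedSize
  }

  def main(args: Array[String]): Unit = {
    // Case A: an empty structure should estimate (essentially) zero.
    assert(monoid.zero.estimatedSize < 1e-6, "empty HLL should estimate ~0")
    // Case B: small counts must land within the chosen 10% margin.
    for (n <- Seq(5, 100, 1000)) {
      val est = estimateOf(n)
      assert(math.abs(est - n) <= 0.1 * n, s"estimate $est outside 10% of $n")
    }
  }
}
```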

So this leaves us with the last case: testing that the HyperLogLog implementation gives a good estimate for a large number of elements. This can't be verified at build time, and here an integration test really is the way to go. However, depending on how much you trust Twitter's HyperLogLog implementation, you might consider NOT testing it at all; Twitter should already have done so. This may seem like a breach of best practice, but given the overhead an integration test can carry, skipping it might be worth it in your case.

If you do decide to write the integration test, you will need to simulate the traffic you expect in production and generate it from several sources, since you will be producing millions or billions of requests. You can either capture a sample of real production traffic and replay it in the test (probably the most accurate option), or characterize what your traffic looks like and generate similar synthetic traffic. Again, the error margin should be chosen per application, and you should be able to swap the estimation method for a better one without breaking the test.
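Here is a sketch of what the case C check could look like with synthetic traffic. The shard-per-source scheme, the duplication factor, and the margin are assumptions to be replaced by your real traffic model, and in the real test the elements would of course flow through your service's endpoint rather than straight into the monoid:

```scala
import java.nio.ByteBuffer
import com.twitter.algebird.{HLL, HyperLogLogMonoid}

object LargeCardinalityCheck {
  def main(args: Array[String]): Unit = {
    val monoid = new HyperLogLogMonoid(bits = 14) // standard error ~0.8%
    val distinctIds = 1000000L // known true cardinality of the synthetic traffic
    val sources = 8

    // Each "source" covers a disjoint shard of the id space and sends every id
    // three times, so the union holds exactly `distinctIds` distinct values
    // while still containing duplicates, as merged production traffic would.
    val partials: Seq[HLL] = (0 until sources).map { s =>
      val ids = (s.toLong until distinctIds by sources.toLong).iterator
      monoid.sum(ids.flatMap(id => Iterator.fill(3)(id)).map { id =>
        monoid.create(ByteBuffer.allocate(8).putLong(id).array())
      })
    }

    // Merge the per-source sketches, as you would across hosts in production.
    val estimate = monoid.sum(partials).estimatedSize
    // The margin belongs to the application, not to HyperLogLog.
    assert(math.abs(estimate - distinctIds) <= 0.1 * distinctIds,
      s"estimate $estimate outside 10% of $distinctIds")
  }
}
```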


Source: https://habr.com/ru/post/1259194/

