Searching for bugs by bisecting (binary searching) the history of revisions, with untestable commits

Most modern version control systems have a command to find the change that introduced a bug by binary searching (bisecting) the history. Such a command may be built in, or it may be provided as an extension or plugin. Examples include git bisect in Git, hg bisect in Mercurial (formerly available as the hbisect extension), and the bzr-bisect plugin for Bazaar.

The challenge is to do this automatically or semi-automatically, even in the presence of a non-linear history (branch points and merges). The goal is to find the "bad" revision in as few steps as possible; or, in more detail, to find at each step a commit to test that splits the commit graph (the DAG of commits) as close to in half as possible. This problem is, I think, solved very well.
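Just to illustrate what I mean by halving the DAG, here is a rough sketch of such a selection rule in Python (the graph representation and all names are invented for illustration; this is not what any of the tools above actually implement):

    # Sketch: pick the next revision to test so that it splits the remaining
    # suspect commits as evenly as possible. The graph maps each commit id
    # to the list of its parents.

    def ancestors(graph, commit):
        """Return the set containing `commit` and all of its ancestors."""
        seen, stack = set(), [commit]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(graph.get(c, []))
        return seen

    def next_to_test(graph, suspects):
        """Choose the suspect whose ancestor set comes closest to half of `suspects`."""
        half = len(suspects) / 2.0
        def balance(commit):
            inside = len(ancestors(graph, commit) & suspects)
            return abs(inside - half)
        return min(sorted(suspects), key=balance)

    # Tiny example history with one merge: a - b - {c, d} - e - f
    graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"], "f": ["e"]}
    print(next_to_test(graph, set(graph)))   # -> "c" (tie with "d" broken by name)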

But there is a problem with untestable commits, e.g. when the code at some revision does not even compile, or, if it compiles, does not start (or hits a bug unrelated to the one you are looking for). This means that instead of simply marking a commit as "good" or "bad", you now have three possible states:

  • good - no bug
  • bad - buggy behavior
  • untestable - it cannot be determined whether the bug is present

Some version control systems (SCMs) allow you to skip such commits, usually by moving to a neighboring revision, e.g. a parent, as the next one to check.
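For example, with git bisect you can mark the current revision as untestable with `git bisect skip`, and a test script driven by `git bisect run` can report the same thing by exiting with code 125. A minimal driver along these lines (the build and test commands are just placeholders):

    #!/usr/bin/env python
    # Placeholder driver for `git bisect run`: exit 0 = good, 125 = untestable
    # (git skips the revision), any other code from 1 to 127 = bad.
    import subprocess
    import sys

    if subprocess.call(["make"]) != 0:
        sys.exit(125)      # does not even build: neither good nor bad, skip it

    sys.exit(0 if subprocess.call(["./run-tests"]) == 0 else 1)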


Questions:

  • If you have been in this situation, i.e. you were bisecting and ran into untestable revisions, what in your experience is the distribution of such untestable commits? Do they occur in isolation (a single untestable commit), or do they appear in ranges (revisions a..b are untestable)? Have you found yourself having to skip commit after commit?

  • Is there some mathematical model (for simple bisection of a list / linear history, or even for bisecting an arbitrary DAG of revisions) or an algorithm (possibly a heuristic) that optimizes the skipping of untestable commits? The goal is, again, to minimize (on average) the number of revisions that have to be tested when there are untestable commits (or an unrelated bug).

  • Do you use a version control system, or some add-on / extension / plugin for one, or some third-party tool, that implements such an algorithm, beyond simply allowing you to skip untestable commits by moving to a neighbor? Which VCS or tool is it? What algorithm does it use (if you know)?

I hope this leads to even easier (semi-)automated bug hunting...


Added 06-06-2009:
When using advanced Git features there is one situation where you can get a whole branch of untestable commits (or at least commits that are hard to test), namely when a "subtree" merge is used to join the histories of two separate projects (for example, the full Linux kernel with some driver developed separately and then combined via a "subtree" merge). This should be taken into account when designing an algorithm for handling untestable commits: with a non-linear history there may be a whole branch of untestable commits, and the algorithm should take the topology into account (at least to some extent).

+4
3 answers

What is already checked in obviously can't be helped. The large codebases I have worked on required that all check-ins actually build. This was done by having developers submit their changes to a check-in server, which kept a queue of changes waiting to go in. Each change was built for all target platforms in the order it was submitted. If the build failed, the check-in was rejected. If it succeeded, a suite of automated regression / unit tests was run. If any test failed, the check-in was rejected. If it succeeded, the check-in was committed to the repository.

If you have such a system in place, it greatly reduces the number of unbuildable / untestable revisions. Bad builds are then limited to repository administrators doing crazy things that bypass the check-in server.
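A minimal sketch of such a gate, assuming a queue of pending changes and stand-in build / test commands (this is only an illustration, not the actual system described above):

    # Sketch of a check-in gate: changes are built and tested in submission
    # order and only land in the repository if both steps succeed.
    # Fetching/applying each change is omitted; "make all" and "make test"
    # stand in for the real build and test suite.
    import subprocess

    def build(change):
        return subprocess.call(["make", "all"]) == 0

    def run_tests(change):
        return subprocess.call(["make", "test"]) == 0

    def reject(change, reason):
        print("rejected %s: %s" % (change, reason))

    def accept(change):
        print("committed %s to the repository" % change)

    def process_queue(pending_changes):
        for change in pending_changes:      # strict submission order
            if not build(change):
                reject(change, "build failed")
            elif not run_tests(change):
                reject(change, "tests failed")
            else:
                accept(change)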

In environments without such a gate, I don't have a rigorous statistical analysis, but anecdotally I have found that unbuildable revisions occur in pockets. One check-in breaks a bunch of things, and then there is a series of small check-ins trying to clean up the mess. Then things are fine again.

+3

You could redefine the individual elements of your bisection / binary search to be ranges of adjacent "unknown"-state revisions, rather than single revisions.

In other words, if you discover an unknown state during your binary search, you start sub-searches backwards and forwards to find the boundaries of the range of revisions that do give you definitive answers. This would probably be a linear search in both directions, so it would be somewhat slower, but you would have to assume that most revisions are not untestable.

Ultimately this could end up giving you a range of revisions in which the bug appeared, e.g. somewhere between revisions 58 and 63. You would then have to search that range by hand.
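A rough sketch of that idea on a linear history, where test(rev) returns "good", "bad" or None for untestable (all names are invented for illustration):

    # Bisect a linear history where test() may answer None (untestable).
    # When the midpoint is untestable, scan outwards until a testable revision
    # is found; if the whole stretch is untestable, the result degrades to a
    # range of candidate revisions instead of a single one.

    def probe_neighbours(lo, hi, mid, test):
        """Scan outwards from mid for the nearest testable revision in (lo, hi)."""
        for offset in range(1, hi - lo):
            for candidate in (mid - offset, mid + offset):
                if lo < candidate < hi:
                    verdict = test(candidate)
                    if verdict is not None:
                        return verdict, candidate
        return None, mid

    def bisect_with_skips(last_good, first_bad, test):
        lo, hi = last_good, first_bad        # invariant: lo is good, hi is bad
        while hi - lo > 1:
            mid = (lo + hi) // 2
            verdict = test(mid)
            if verdict is None:
                verdict, mid = probe_neighbours(lo, hi, mid, test)
            if verdict is None:
                return lo, hi                # everything in between is untestable
            if verdict == "good":
                lo = mid
            else:
                hi = mid
        return lo, hi

    # Example: the bug appears in revision 60, revisions 58..62 do not build.
    def test(rev):
        if 58 <= rev <= 62:
            return None
        return "good" if rev < 60 else "bad"

    print(bisect_with_skips(0, 100, test))   # -> (57, 63): inspect this range by hand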

0

Just an idea: I know that one of the other problems you are thinking about is test noise. One way to think about skips is to treat them as random good / bad answers and use a bisection algorithm that is robust against such errors, e.g.: http://www.disp.uniroma2.it/users/grandoni/FGItcs.pdf - "Optimal resilient sorting and searching in the presence of memory faults", I. Finocchi, F. Grandoni, and G. F. Italiano.

The effect of applying that algorithm to git bisect would be to steer the search away from the skipped points and to restart it when it detects that it has gone down the wrong branch. Unlike in the paper, here you know which points were unreliable, so you could simply backtrack.
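One very loose way to read that suggestion, sketched on a linear history: at an untestable point take one branch of the search on trust, and backtrack to try the other branch if it never yields a boundary confirmed by real tests (everything here is invented for illustration; it is not the paper's algorithm):

    # test() returns "good", "bad" or None (untestable / unreliable).
    def bisect_with_backtracking(last_good, first_bad, test):
        """Return a (good, bad) pair of adjacent, really-tested revisions,
        or None if untestable revisions hide the transition completely."""
        def search(lo, hi, lo_confirmed, hi_confirmed):
            if hi - lo <= 1:
                return (lo, hi) if lo_confirmed and hi_confirmed else None
            mid = (lo + hi) // 2
            verdict = test(mid)
            if verdict == "good":
                return search(mid, hi, True, hi_confirmed)
            if verdict == "bad":
                return search(lo, mid, lo_confirmed, True)
            # untestable: take one branch on trust, backtrack if it fails
            return (search(mid, hi, False, hi_confirmed)
                    or search(lo, mid, lo_confirmed, False))
        return search(last_good, first_bad, True, True)

    # Example: revision 5 is untestable, the bug appears in revision 7.
    def test(rev):
        if rev == 5:
            return None
        return "good" if rev < 7 else "bad"

    print(bisect_with_backtracking(0, 10, test))   # -> (6, 7)

If the transition itself is buried inside a block of untestable revisions, this returns None and you are back to the range-based approach from the previous answer.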

0

Source: https://habr.com/ru/post/1285842/

