Potential for a lost version in Subversion (research setting)

This is my first question here, and I have not seen it asked elsewhere. I work at a research institute, so we need to be able to say which version of the code produced a given set of results. My question is whether my analysis, shown in the figure below, is correct. (Note that the version ids on the nodes are for illustration only and do not match the actual version identifiers in SVN, Git or Hg. Version numbers with letters represent uncommitted states of the code in SVN, integer version identifiers represent committed states in SVN, and all version identifiers in the Git/Hg panel represent committed states of the code.)

[Figure: Disadvantage of using SVN for research projects?]

Example scenario:

  • Suppose there are two working copies, "A" and "B", both starting at version 1.

  • "A" changes the default values in the foo() function, generates results, and checks in the new version (repo version 2).

  • "B" does not modify foo() but changes some other part of the code, generates results using the old defaults, and tries to check in the state that was used, version 1b. The check-in is rejected because an update is required first, and when versions 2 and 1b are merged during that update, SVN loses the fact that version 1b used different default values in foo(). This is not flagged as a conflict, since "A" and "B" did not change the same part of the code. The resulting version 3 is not identical to version 1b, so reproducibility is not guaranteed.
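The scenario above can be sketched as a toy three-way merge in Python. This is only an illustrative model, not how SVN is implemented: a version is a dict from file name to content, the merge works on whole files rather than on hunks, and the file names and contents (`foo.py`, `analysis.py`) are invented for the example.

```python
def three_way_merge(base, mine, theirs):
    """Merge two sets of changes to `base`, file by file (toy model)."""
    merged = {}
    for path in base:
        b, m, t = base[path], mine[path], theirs[path]
        if m == t or t == b:
            merged[path] = m      # only I changed it (or nobody did)
        elif m == b:
            merged[path] = t      # only they changed it
        else:
            raise ValueError(f"conflict in {path}")
    return merged

# Hypothetical file contents, chosen only for illustration.
v1 = {"foo.py": "default = 1", "analysis.py": "method = 'old'"}
v2 = {**v1, "foo.py": "default = 2"}            # A's committed change
v1b = {**v1, "analysis.py": "method = 'new'"}   # B's uncommitted state

# B updates: version 2 is merged into working copy 1b without any conflict.
v3 = three_way_merge(v1, v1b, v2)
print(v3 == v1b)  # False: the code B actually ran is not what gets committed
```

The merge succeeds silently, yet the committed state contains A's new default, which B's results never used.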

I could not simulate this scenario on my local drive using TortoiseSVN (I cannot create working copies because SVN Checkout fails with "Unable to open ra_local session to URL"). I do know that both Git and Hg handle the situation correctly and show version 1b in the history, provided it was committed and rebase was not used. (I believe a rebase is essentially the normal behavior in SVN when branches are not involved.)

Is this analysis correct?

+4
2 answers

Your analysis is correct in principle, although I would object to the name "version 1b". Version 1b never exists in the SVN realm, because 1b is only the state of the working directory before the commit.

Your workflow has a fundamental problem: if you need to identify results reliably, you must first obtain an identifier and only then produce the results. Commit, then generate. If committing first causes reliability problems, commit to branches, generate the results, and then merge. The branch-and-merge approach is similar to how distributed VCSs such as git or hg work, where local repositories are implicit branches and a push is an implicit merge.
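One way to enforce "commit, then generate" is to refuse to run an experiment unless the working copy corresponds to exactly one committed revision. The sketch below assumes the standard `svnversion` command is on the PATH; it relies on `svnversion` printing a bare number for a clean single-revision working copy, and a range (`4123:4168`) or a trailing flag such as `M` otherwise. The function names are invented for this example.

```python
import re
import subprocess

def is_clean_single_revision(svnversion_output: str) -> bool:
    """True only for output like '4168': one revision, no range, no flags."""
    return re.fullmatch(r"\d+", svnversion_output.strip()) is not None

def revision_for_results(working_copy: str = ".") -> str:
    """Return the revision to record with the results, or raise.

    Hypothetical helper: call it before generating any output.
    """
    out = subprocess.run(
        ["svnversion", working_copy],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not is_clean_single_revision(out):
        raise RuntimeError(
            f"working copy state {out!r} is not a single committed revision; "
            "commit and run 'svn update' first"
        )
    return out
```

A check like this does not cover unversioned files that influence the results, so it is a guard rather than a guarantee.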

+2

Yes, you are correct that Subversion has this problem. In fact, it is even worse than you think. Subversion works on a per-file basis when it determines whether your working copy is out of date. So you can get the following:

  • A changes the default values in foo() and reruns the experiment. Let us say that the change only affects results/output-0001.dat.

  • A commits this as SVN revision 2.

  • B modifies another piece of code and generates new results. Since B does not have A's change, only results/output-1000.dat is changed by the rerun.

  • B commits this as SVN revision 3.

B can commit without updating because the changes he made did not overlap with the changes made by A. Worse, SVN revision 3 matches neither the working copy on A's machine nor the one on B's! If Professor C comes in and checks out SVN revision 3, he sees:

  • results/output-0001.dat with results from A and
  • results/output-1000.dat with the results from B.

This is very inconsistent.

The key concept that allows this is mixed-revision working copies. Subversion allows the files in your working copy to be at different revisions. When you create revision 2 with the change to foo.c, that file is marked as being at revision 2. The other files in the working copy remain at revision 1. This lets you selectively update part of your working copy to an older revision for debugging purposes, and lets you commit files without updating if no one else has touched them.
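The per-file check can be illustrated with a small Python model. It is a simplification under stated assumptions, not Subversion's actual implementation: the `Repo` class, its methods, and the file `other.c` are invented for the example, and a commit is rejected only if one of the files being committed was changed in the repository after the working copy's base revision for that file.

```python
class Repo:
    """Toy repository with a per-file out-of-date check (illustrative only)."""

    def __init__(self, files):
        self.head = 1
        self.files = dict(files)                    # path -> content at HEAD
        self.last_changed = {p: 1 for p in files}   # path -> revision of last change

    def checkout(self):
        # A working copy maps path -> (content, base revision of that file).
        return {p: (c, self.head) for p, c in self.files.items()}

    def commit(self, wc, changed_paths):
        for p in changed_paths:
            if wc[p][1] < self.last_changed[p]:
                raise RuntimeError(f"{p} is out of date")
        self.head += 1
        for p in changed_paths:
            self.files[p] = wc[p][0]
            self.last_changed[p] = self.head
            wc[p] = (wc[p][0], self.head)   # only committed files move forward
        return self.head

def contents(wc):
    return {p: c for p, (c, _) in wc.items()}

repo = Repo({"foo.c": "default=1", "other.c": "v1",
             "results/output-0001.dat": "old", "results/output-1000.dat": "old"})
wc_a, wc_b = repo.checkout(), repo.checkout()

wc_a["foo.c"] = ("default=2", 1)
wc_a["results/output-0001.dat"] = ("from A", 1)
r2 = repo.commit(wc_a, ["foo.c", "results/output-0001.dat"])

wc_b["other.c"] = ("v2", 1)
wc_b["results/output-1000.dat"] = ("from B", 1)
r3 = repo.commit(wc_b, ["other.c", "results/output-1000.dat"])  # no update needed

print(repo.files == contents(wc_a), repo.files == contents(wc_b))  # False False
```

After the second commit, B's working copy holds files at revisions 1 and 3 at the same time, which is exactly the mixed-revision state described above.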

Tools such as Mercurial and Git will stop you from doing this because they model history as a DAG (directed acyclic graph). Each commit becomes a new node in the graph, and you must create an explicit merge commit to combine two sets of changes. In the scenario above, B will try to push his change, and Mercurial will abort. He then does:

 $ hg pull
 $ hg merge
 # he now has both his own and A's changes to the code
 $ run-experiment
 $ hg commit -m "Merge with new results"

All three versions of the results are now stored in history.
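The DAG behavior described above can be modeled in a few lines of Python. This is a rough sketch under assumptions, not Mercurial's real logic: history is a dict from node name to a tuple of parent names, the node names (`r1`, `a2`, `b2`, `m3`) are made up, and a push is rejected whenever it would increase the number of heads on the remote.

```python
def heads(parents):
    """Nodes that no other node lists as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return set(parents) - referenced

def push(remote, local):
    """Add local nodes to remote unless that would create an extra head."""
    combined = {**remote, **local}
    if len(heads(combined)) > len(heads(remote)):
        raise RuntimeError("push creates new remote head")
    remote.update(local)

remote = {"r1": (), "a2": ("r1",)}     # A's commit is already on the server
local_b = {"r1": (), "b2": ("r1",)}    # B committed on top of r1

try:
    push(remote, local_b)
    aborted = False
except RuntimeError:
    aborted = True                     # B has to pull and merge first

local_b.update(remote)                 # pull
local_b["m3"] = ("b2", "a2")           # explicit merge commit
push(remote, local_b)

print(aborted, sorted(remote))  # True ['a2', 'b2', 'm3', 'r1']
```

Both `a2` and `b2` remain in the graph after the merge, so the exact code behind each set of results can still be identified.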

+2

Source: https://habr.com/ru/post/1390693/

