Large, old 66 GiB CVS repo: is it a good idea to migrate to Git or Mercurial?

We have a very large legacy CVS repo (66 GiB) that has been growing for a decade. Now we have some subcontracting companies that need to work on some of the modules and branches.

We need to create several branches for them and send those branches to them. We also need to integrate their changes into our main trunk from time to time.

Our concern:

  • we absolutely cannot give them the complete repo, mainly for security reasons.

  • we need to send them some history, not just the "HEAD" version of the code.

  • we are still doing some development, so we need to send them a set of changes from time to time.

Are Git or Mercurial a good choice for migrating away from CVS? Can Git / Mercurial satisfy our needs?

EDIT: I think what we really need is centralized version control with a multi-site feature: the ability to create an off-site repository based on part of the central repo, with easy merging between sites.

+4
5 answers

With Git, you can use git subtree to "cut out" the subdirectories you want to provide to your subcontractors, and then easily reintegrate their changes into your mainline. You can also send them updates periodically if necessary. git subtree was originally a separate add-on, but it has since been moved into the contrib directory of the official Git distribution.
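As a rough sketch of that workflow, the snippet below only builds the two git subtree command lines involved (split a directory out, pull changes back in) without running them. The directory, branch, and remote names are hypothetical placeholders, and it assumes git subtree is installed from contrib.

```python
from typing import List


def subtree_split_cmd(prefix: str, branch: str) -> List[str]:
    """Command that extracts the history of `prefix` onto a new branch."""
    return ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]


def subtree_pull_cmd(prefix: str, remote: str, ref: str) -> List[str]:
    """Command that merges the subcontractor's work back under `prefix`."""
    return ["git", "subtree", "pull", f"--prefix={prefix}", remote, ref]


if __name__ == "__main__":
    # Hypothetical names: a module directory, an export branch, a partner remote.
    commands = [
        subtree_split_cmd("modules/billing", "export/billing"),
        subtree_pull_cmd("modules/billing", "partner", "main"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

The export branch produced by the split can then be pushed to a separate repository that the subcontractor is allowed to clone.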

You can limit the amount of history that you include in the repository that you provide to an external user.

I expect your biggest concern will be switching to a DVCS with such a large initial repo. Git compresses your repo, so when you are done it is unlikely to be 66 GB, but it will still be rather cumbersome (probably around 10 GB, depending on what you store in it). If you do not think that is a problem, then go for it.

I limited my answers to Git because I am more familiar with Git than Mercurial.

+5

66 GB sounds like a lot. However, CVS is not known for storing data very efficiently. Git will certainly work for you, but you will have to split the project into several smaller git repositories. For most projects, it is not very difficult to divide the functionality into several self-contained subprojects (often these are subdirectories). As a rule, you want to keep the size of any given git repository below 1-2 GB on average, and it certainly should not exceed 5-10 GB. Keep in mind, however, that git is exceptionally good at compressing its data (as long as you run git gc once in a while).

Now, once you have divided the project into several subprojects ("several" is a relative term, Android has 300+), you need to figure out a way to "glue" them back together into a coherent directory structure.

There are two general approaches for this:

  • Using the repo tool developed by the Android project. This involves creating a small git repository containing a single XML file (called the manifest), which tracks where your subprojects are checked out and how they are glued together. This works very well on Linux and Mac, but unfortunately does not support Windows (repo requires OS symlink support).
  • Using git submodule. Create one git repository without any real files and add all the original subprojects to this repository as submodules. In a way, this super git repo plays essentially the same role as the repo manifest in the Android approach. The advantage of this approach is that it is supported on any OS, including Windows.
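To illustrate the submodule approach, here is a minimal sketch that turns a mapping of checkout paths to repository URLs into the `git submodule add <url> <path>` commands you would run inside the otherwise empty super repo. The paths and URLs are made up for the example, and the commands are only built, not executed.

```python
from typing import Dict, List


def submodule_add_cmds(subprojects: Dict[str, str]) -> List[List[str]]:
    """Return one `git submodule add <url> <path>` command per subproject."""
    return [
        ["git", "submodule", "add", url, path]
        for path, url in sorted(subprojects.items())
    ]


if __name__ == "__main__":
    # Hypothetical layout: checkout path inside the super repo -> repository URL.
    layout = {
        "modules/billing": "ssh://git.example.com/billing.git",
        "modules/reports": "ssh://git.example.com/reports.git",
    }
    for cmd in submodule_add_cmds(layout):
        print(" ".join(cmd))
```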

Now, if you want to share only small portions of your giant project, you can do this by providing any submodule / subproject directly to your partners as a standard git repository.

In fact, to make this more convenient, I highly recommend installing Gerrit - a server-side git implementation in Java, which is also an extremely powerful code review engine (also used by the Android project). Gerrit's code review feature is completely optional (you do not need to use it if you do not want to), but you will certainly appreciate the unified user authentication, ssh key management and the ability to control access permissions per git repository. This is very convenient for third parties - you just give them access to the small parts they need through Gerrit, and you're done.

+3

Choose git. Prefer submodules over subtrees if you can, as they give you better control over the dependencies between projects and their respective subprojects.

0

We have a very large legacy CVS repo (66 GiB) that has been growing for a decade. Now we have some subcontracting companies that need to work on some of the modules and branches.

We need to create several branches for them and send those branches to them. We also need to integrate their changes into our main trunk from time to time.

It sounds like you want to switch only the subcontractors and not everyone else. I strongly recommend that you do not do this. Either convert everyone, or convert no one. Running a mixed system is a huge pain, especially while people are also switching to a DVCS.

Our concern:

  • we absolutely cannot give them the complete repo, mainly for security reasons.

Do you have several modules in your CVS repository and can't give them all of the modules, or do you want to restrict how much of the history they can access?

A DVCS works much better when modules are stored as separate repositories, rather than as multiple modules in the same repository *. There are many reasons for this, but mainly it is so that changes in different modules do not cause unnecessary merges.

(* So does a CVCS, but it is usually such a pain to create a new module that people only do it once. I suspect you would not have 66 GB if it had been split up.)

So, if you are converting, you want to split up the modules. This will allow you to share some modules and not others. I know that Mercurial can create a repo from a set of paths in a multi-module repo during conversion. I expect Git has similar capabilities.
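One way this can look with Mercurial's convert extension is a filemap that keeps only one module and moves it to the repository root. The sketch below generates such a filemap and the matching `hg convert --filemap` command line; the module path, file names, and repository locations are hypothetical, and the command is only built, not run.

```python
from typing import List


def build_filemap(module_path: str) -> str:
    """Filemap that keeps only `module_path` and moves it to the repo root."""
    lines = [
        f"include {module_path}",   # keep only this module's files
        f"rename {module_path} .",  # "." means the root of the new repository
    ]
    return "\n".join(lines) + "\n"


def convert_cmd(filemap_file: str, source: str, dest: str) -> List[str]:
    """`hg convert` invocation that applies the filemap during conversion."""
    return ["hg", "convert", "--filemap", filemap_file, source, dest]


if __name__ == "__main__":
    # Hypothetical module path and repository locations.
    print(build_filemap("modules/billing"), end="")
    print(" ".join(convert_cmd("billing.filemap", "cvs-checkout", "billing-hg")))
```

Running the conversion once per shareable module yields one repository per module, each containing only that module's history.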

  • we need to send them some history, not just the "HEAD" version of the code.

That pretty much dictates a DVCS. It is one of its defining attributes.

  • we are still doing some development, so we need to send them a set of changes from time to time.

... and therefore you must use the same VC tool as they do. Otherwise, you will spend all your time converting sets of changes between systems.

Are Git or Mercurial a good choice for migrating away from CVS? Can Git / Mercurial satisfy our needs?

Yes and yes, but it is not a push-button switch. It requires planning, commitment and education.

EDIT: I think what we really need is centralized version control with a multi-site feature: the ability to create an off-site repository based on part of the central repo, with easy merging between sites.

A centralized but distributed version control system. Gotcha!

Final point: do not confuse centralized / distributed development practices with centralized / distributed tools. It is perfectly sensible to work in a centralized development model with a distributed VCS.

0

I will let other posters answer the subtree and history-related questions, because I am not that familiar with them. However, I can tell you a few things about the size of the repo. Firstly, your git repository is likely to be much smaller than your CVS one (I would guess that it will be between a tenth and a half of the current 66 GiB).

Secondly, yes, if you put all of the CVS modules into one git repository, your internal developers will each have a copy of the entire repo on their personal computers. The git repo I work with daily is 12 GB and this does not cause any real problems. Assuming your repo is large because your working copy is large, it actually saves a significant amount of time when you want to move between branches, because you do not fetch as many files across the network. For us, the 12 GB git repo is not a big deal, because my current working copy (with build objects for most targets) is an extra 37 GB on top of the git repository itself. In a repository of this size, git commands are much faster than the equivalent Subversion operations.

So definitely read what everyone else is saying about subtrees, submodules, etc., but rest assured that you can just import it all if you need to.
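For a sense of scale, the guess above works out as follows. This is only the arithmetic on the poster's estimate (a tenth to a half of the CVS size), not a measurement.

```python
from typing import Tuple


def estimated_git_size_gib(cvs_size_gib: float,
                           low_ratio: float = 0.1,
                           high_ratio: float = 0.5) -> Tuple[float, float]:
    """Apply the guessed compression ratios to the CVS repository size."""
    return cvs_size_gib * low_ratio, cvs_size_gib * high_ratio


if __name__ == "__main__":
    low, high = estimated_git_size_gib(66)
    print(f"Estimated git repo size: {low:.1f} to {high:.1f} GiB")
```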

0

Source: https://habr.com/ru/post/1443555/

