This is not so much a comparison of Git and Mercurial as a recounting of my experience looking at the two. If you are looking to make a decision between them, I can tell you two important things:
- All of my research indicates that when it comes to the technical capabilities of the two systems, they are so similar as to be almost indistinguishable. They both do the job, and in almost exactly the same fashion. Each has a slight advantage over the other in various ways, but when it comes down to it, you won’t find anything compelling about one’s capabilities over the other’s. And they are growing more similar over time as each implements the ideas of the other (at least, this is definitely the case with Mercurial borrowing features such as rebasing from Git).
- That said, the environment in which you are operating makes a huge difference. The punchline is that, if you want to deploy a repository server and you run a Windows-focused environment, then you have no choice but Mercurial (at the time of this writing).
In this article, I’ll cover:
- Background and motivation for looking at distributed version control systems
In further articles I’ll cover:
- Useful links for understanding and comparing the systems
- Evaluating the capabilities of the systems
- Working with Git on Windows
- Experience with Mercurial
I’ll be writing a short series with more detail on my deeper experiences with Mercurial as well.
Motivation
The biggest motivation for looking at the so-called “next generation” of version control systems, the distributed version control systems or DVCSs, is that of the merge.
At my place of employment we currently use Subversion for version control purposes. It’s a great system and one we’ve used for a long time, and I’ve personally used it even longer. However, it works best when you don’t do a lot of branching. Branches are treated as second-class citizens, a special case to be given special consideration only when actually attempting to merge results from the branch back into the trunk. At that point, our experiences have indicated two things (even with the new merge tools introduced with Subversion 1.5):
- Merges rarely go off without a hitch.
- Merges make you go through a battery of options which require a thorough knowledge of the merge process.
The upshot of this is that it discourages use of branching. Our head developer wants to worry about coding rather than the details of version control, and I don’t blame him. A good rule for Subversion is to never let your branches drift far from your trunk, which requires frequent merging of trunk changes into your branches. Because merges frequently encounter conflicts in Subversion (1.5 also introduced some new types of conflict, which I still don’t understand and which aren’t well documented), it’s just less of a headache to work on the trunk and leave branches out of the picture except for major feature development. Even then, it’s a pain.
Of course, working on just the trunk has its own issues, which version control should be suited to fixing, but can’t handle well without branching. The biggest of these issues is that of parallel works-in-progress. We are constantly working on issues with the code, resolving them (hopefully), then testing and iterating changes and fixes back into the code. While these feedback cycles occur, the code is in a work-in-progress state. However, we can’t serialize the process.
Having our developers sit around twiddling their thumbs while the code is tested and vetted simply isn’t an option. So they work on other issues while the feedback is generated from earlier work. If you are only working on the trunk, this means there is always half-baked code in there. When the time comes to cut further development and release, sometimes code isn’t fully baked but is too difficult to track and back out. Thus bugs in the field are born.
So there is the appeal of distributed version control systems. Branching allows you to track individual changes in their own branch and “graduate” them to the trunk after they have been vetted. Not only that, if you want to test a fix in isolation from other trunk changes, the branch allows you to do this by implementing the fix on top of just the last released version of the code, without any interim work (vetted or not) to gum up the works. While this kind of targeted development is sometimes more effort than it is worth (it means you have to track a lot more interim versions of the build and which particular problems they solve), it can be worthwhile a lot of the time. If branches are low-cost (at merge-time as well as branch-time), then this model becomes possible.
That being the goal, let me say that it remains a theory for us. Others whose posts I’ve read say the model works, and the appeal is enough for us to investigate, but I personally haven’t seen it in practice yet.
In both Git and Mercurial (as well as the other contenders), branching is not only simple, it’s the only means of development.
Let me step back from that statement and explain what I mean. In a DVCS, there is the concept of the repository and the working copy, just like in Subversion. The repository contains the history of changes, and the working copy is the current state of the files, whether that is from the last revision in the repository or some other point, with or without your own local modifications. The system tracks your changes to the working copy, allows you to go back to other points-in-time and generally handles change management on those files.
In Subversion, the repository is usually centralized somewhere on the network and is not only authoritative, it is the only repository for that code. Sure, you can check in the same files somewhere else, but there is no system for keeping that repository in sync with the central repository, unless you look at distributed enhancements to Subversion, which I have not.
In a DVCS, there can be, and always is, more than one repository as soon as more than one developer is involved (and usually even before that). The difference is that each working copy is almost inevitably tightly coupled with its own copy of the repository. This means that the repository is local to your disk and resides alongside the working copy. The upshot of this is that your working copy/repository is usually larger than the simple working copy of the centralized VCS, but in trade, repository operations are usually far faster. More on the impact of this distinction later in my series.
The way this works is that there are mechanisms to keep your repository in sync with other repositories by pushing and/or pulling changes from one repository to another. A typical workflow might be to have an “authoritative” or “build” repository that lives remotely, from which the developers copy theirs. While the developers work, they can commit their changes to their local repository. A commit does not automatically synchronize to other repositories. Then, as a piece of work concludes, the work can be tested locally, approved, then pushed to the central repository from which other developers can then receive the changes by pulling.
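To make that workflow concrete, here is a minimal sketch in Python. It is a toy model, not how Git or Mercurial actually store or transfer history: the `Repo` class, its methods, and the changeset names are all made up for illustration. It only shows that a commit stays local until somebody pushes or pulls.

```python
# Toy model of DVCS synchronization (hypothetical; not real Git/Mercurial internals).

class Repo:
    """A repository modeled as nothing more than a set of changeset IDs."""

    def __init__(self, name, changesets=()):
        self.name = name
        self.changesets = set(changesets)

    def clone(self, name):
        # A clone starts out with a full copy of the history.
        return Repo(name, self.changesets)

    def commit(self, changeset_id):
        # A commit is recorded locally only; nothing is sent anywhere.
        self.changesets.add(changeset_id)

    def push(self, other):
        # Send the other repository whatever changesets it is missing.
        other.changesets |= self.changesets

    def pull(self, other):
        # Fetch whatever changesets this repository is missing.
        self.changesets |= other.changesets


central = Repo("central", {"c1"})
alice = central.clone("alice")
bob = central.clone("bob")

alice.commit("a1")                      # exists only in alice's repository
assert "a1" not in central.changesets   # committing did not synchronize
alice.push(central)                     # now published to the "build" repository
bob.pull(central)                       # bob receives it by pulling
print(sorted(bob.changesets))
```

The real systems transfer whole graphs of changesets rather than flat sets, but the direction of flow is the same: commit locally, push to publish, pull to receive.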
Getting back to my point about branching being the regular mode of operation in this model, each repository in this case can be viewed as its own branch. I say this because each copy of the repository is not required to synchronize after a change is committed. In Subversion, once a commit is made, no other changes can be committed by other developers without them first updating their working copy and resolving any conflicts that arise.
This means that, in Subversion, for a given directory path there is only ever one line of commits (barring discussion about explicit branches, which are different directory paths). Each subsequent commit contains all of the changes from every commit prior to it.
In a DVCS, concurrent changes can be committed to the various developers’ separate repositories, and no communication happens between them unless the developers initiate it manually. This means that there are necessarily divergent lines of development which do not contain the same set of changes, even while each copy of the repository only sees its own straight line of development.
These unsynchronized copies of the repository are therefore each their own branch of development, which may continue unabated without having to be synchronized with the other branches. In fact, nothing says that they need be synchronized ever again, although in practice they will usually be synchronized fairly frequently.
This kind of branching is called “anonymous branching” in Mercurial because each branch exists without any explicit name to mark it as such. Two developers may separately develop branches that both think they are the “default” line of development.
At some point these changes will usually be synchronized. When that happens, there is only one thing that the system can do: it must track two (or more) separate lines of changes that are not merged. This is where the concept of “heads” comes in. Each repository must store the branches of development which result in a different state of what is meant to be the same set of files at the ends of each branch. Each branch then has its own head, the final state of files in that branch.
As a hypothetical aside, the system could force one of the developers to accept the other’s changes only into their working copy, resolve any conflicts, then update their single line of development with the merge. By keeping the other changes out of the repository until they are merged, this could keep a single line of development in the repository, much like Subversion. Concurrent development could continue without synchronization of this sort, but once synchronization occurred, there would be no branches.
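One way to picture heads is as the changesets that nothing else builds on. The sketch below assumes a simplified history in which each changeset just lists its parents; the changeset names and the `heads` helper are hypothetical, and the real systems use hashes and their own storage formats.

```python
# Hypothetical history after two developers synchronize without merging.
# Each changeset maps to the list of its parent changesets.
parents = {
    "r0": [],
    "r1": ["r0"],
    "a1": ["r1"],   # developer A's work on top of r1
    "a2": ["a1"],
    "b1": ["r1"],   # developer B's work, also on top of r1
}


def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


print(heads(parents))  # ['a2', 'b1']
```

Both `a2` and `b1` descend from `r1`, yet neither contains the other’s changes, so the repository carries two heads until someone merges them.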
However, this would be less flexible than what the DVCS systems actually allow. By tracking separate branches with their own heads, synchronization can take place and separate lines of development can be shared between developers without requiring any sort of merge at synchronization time. This means branched work can be shared between developers without requiring the work to be integrated. Developers can collaboratively work on different development lines by synchronizing repositories and switching their working copy between the branched lines.
While there are other branch management tools in both systems, this fundamental form of branching is so intrinsic to Git and Mercurial that they need to handle merging well by necessity because it happens frequently.
While this is a requirement for such systems, it is not an explanation of why merging is better/faster/easier/some combination than Subversion’s. That’s a matter of implementation. In practice, it is better. Why? I have no idea. I haven’t played with it enough to know the differences and I don’t understand it well enough in Subversion in the first place to offer a rationale. I’m sure that a large part of it is that these systems track file moves and renames explicitly and better than Subversion does. Otherwise, I’m at a loss. But it works, it really does.
Objectively, merging just works most of the time. There is nothing to prevent you from introducing logical inconsistencies into your code by merging, and detection of actual file conflicts doesn’t prevent that. However, in practice, that’s what testing (such as unit testing) is for, and most of the time the code works if your developers are working on logically separate portions of your code. There are fewer merge options, which lowers the required knowledge and incidence of operator error. Merges tend to go off without a hitch.
Of course, merges have to be done explicitly and then committed and pushed/pulled to other repositories, so there is a bit more elbow-grease to be applied in that respect, but it remains simpler than merging in Subversion by a great degree.
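As a rough illustration of what an explicit merge does to the history, the sketch below models a merge as an ordinary changeset that happens to have two parents. The names and the `heads` helper are made up for the example, and nothing here reflects how either tool resolves file contents.

```python
def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


# Hypothetical history: two unmerged lines of work on top of r1.
history = {"r1": [], "a1": ["r1"], "b1": ["r1"]}
before = heads(history)

# The merge is committed as a new changeset with both lines as parents.
history["m1"] = ["a1", "b1"]
after = heads(history)

print(before, after)  # ['a1', 'b1'] ['m1']
```

The merge changeset still has to be pushed or pulled like any other before the rest of the team sees a single head again.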
So, to recap, the motivation for us to look at these systems is to enable collaborative, multi-branch lines of development. This will hopefully allow us to target fixes and features in a more fine-grained manner, with fewer interaction effects of work-in-progress. It will also give us more control over released code, as branches must graduate to the main line of development. We’ll see how these advantages pan out in practice as we get more comfortable with them.
Next time I’ll cover useful links for understanding and comparing Git and Mercurial.