CS 537 Notes, Revision Control Systems
1 What are Revision Control Systems, and why are
they critical?
A revision control system (also known as a version control system,
source control system, or source management system) is a system for
storing multiple versions of a file or collection of files. For
example, bank statements are a simple form of revision control - they
specify the state of your bank account after each activity.
Let's say you've been working on a software project for a couple of
months. You keep backups of your code because it's good
practice. Eventually, you end up with this:
-rw-r--r-- 1 bernat bernat 8172 Jan 19 16:53 simulation.c
-rw-r--r-- 1 bernat bernat 8172 Jan 19 16:53 #simulation.c#
-rw-r--r-- 1 bernat bernat 8156 Jan 19 14:38 simulation.c~
-rw-r--r-- 1 bernat bernat 9312 Jan 17 12:01 simulation.c.old
-rw-r--r-- 1 bernat bernat 9320 Dec 21 2010 simulation.c.bak
-rw-r--r-- 1 bernat bernat 8905 Apr 16 2010 simulation.c.orig
-rw-r--r-- 1 bernat bernat 8678 Dec 13 2010 simulation.c.from-UMD
Or this:
-rw-r--r-- 1 bernat bernat 8172 Jan 19 16:53 simulation.c
-rw-r--r-- 1 bernat bernat 9312 Jan 17 12:01 simulation.c.2011.01.17
-rw-r--r-- 1 bernat bernat 9320 Dec 21 2010 simulation.c.2010.12.21
...
This is difficult to deal with, and error-prone. Instead, what if you
could do this?
brie(32)% emacs simulation.c
brie(33)% cvs commit simulation.c
CVS commit message: Added water simulation capability
CVS: committed version 18
brie(34)%
Here the cvs command handles all of the tracking necessary
to keep previous versions. With this, I can say:
brie(40)% cvs co simulation.c
brie(41)% cvs co -r 15 simulation.c
brie(42)% cvs co -D 20071231 simulation.c
The first command gets me the latest version of simulation.c, the
second gets me version 15, and the third gets me the version from the
end of last December.
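To make the idea concrete, here is a minimal Python sketch of the kind of
bookkeeping involved. It is not how CVS actually stores files; the
RevisionStore class, its method names, and the sample dates and contents
are all made up for illustration. It keeps every committed version in
order, so a version can be fetched by number or by date.

```python
from bisect import bisect_right
from datetime import datetime


class RevisionStore:
    """Toy single-file store that keeps every committed version (hypothetical)."""

    def __init__(self):
        self._revisions = []  # (timestamp, message, text), oldest first

    def commit(self, text, message, when):
        """Record a new revision and return its 1-based revision number."""
        if self._revisions and when < self._revisions[-1][0]:
            raise ValueError("commits must be recorded in chronological order")
        self._revisions.append((when, message, text))
        return len(self._revisions)

    def checkout(self, revision=None, date=None):
        """Return the latest text, a numbered revision, or the newest
        revision committed at or before `date`."""
        if not self._revisions:
            raise LookupError("nothing has been committed yet")
        if revision is not None:
            if not 1 <= revision <= len(self._revisions):
                raise LookupError(f"no such revision: {revision}")
            return self._revisions[revision - 1][2]
        if date is not None:
            timestamps = [entry[0] for entry in self._revisions]
            index = bisect_right(timestamps, date)
            if index == 0:
                raise LookupError("no revision exists on or before that date")
            return self._revisions[index - 1][2]
        return self._revisions[-1][2]


store = RevisionStore()
store.commit("v1: basic simulation", "Initial import", datetime(2007, 11, 2))
store.commit("v2: faster solver", "Sped up solver", datetime(2007, 12, 20))
store.commit("v3: water simulation", "Added water simulation capability",
             datetime(2008, 1, 19))

latest = store.checkout()                                      # newest revision
second = store.checkout(revision=2)                            # by number
end_of_december = store.checkout(date=datetime(2007, 12, 31))  # by date
```

A real system stores differences between versions rather than full copies,
but the lookups work on the same principle.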
2 Advantages of an RCS
What do RCSes give us? Many things.
- Easy to use backups.
- Distributed editing with centralized source code.
Let's go into that in more detail. The first is obvious: the original
purpose of an RCS was to allow one person to keep backups of important
files quickly and easily. But it gets better. The same system can also
allow multiple people to edit the same file without stepping on each
other's toes, or allow one person to edit from multiple locations.
Imagine that Alice and Bob are both editing simulation.c. Without an
RCS, it is easy for changes to overlap (note: this is another good
reason for multiple source files!). The conversation goes something
like this:
Alice: "Bob, I have a new version of the file."
Bob: "But Alice, I made some changes as well. Now I need to integrate
them."
With an RCS, it goes something like this:
Alice: cvs commit simulation.c <-- makes Alice's changes
Bob: cvs update simulation.c <-- integrates Alice's changes
Bob: cvs commit simulation.c
Alice: cvs update simulation.c
As you can see, the RCS acts as a mediator between the programmers. It
can identify when updates conflict, and it marks the conflicting
regions so they're easily noticed.
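One way to picture conflict detection is the rough Python sketch below.
It is an assumption-laden illustration, not the algorithm CVS uses: it
compares each person's copy against the common starting version with
Python's difflib and reports a conflict when both sets of changes touch
the same (or adjacent) lines. The function names and sample file
contents are invented, and identical edits by both people are still
flagged.

```python
from difflib import SequenceMatcher


def changed_ranges(base, edited):
    """Half-open (start, end) line ranges of `base` that `edited` changes."""
    matcher = SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2)
            for tag, i1, i2, _, _ in matcher.get_opcodes()
            if tag != "equal"]


def find_conflicts(base, mine, theirs):
    """Pairs of ranges where both edits touch the same or adjacent base lines."""
    conflicts = []
    for m_start, m_end in changed_ranges(base, mine):
        for t_start, t_end in changed_ranges(base, theirs):
            # Conservative test: ranges that merely touch also count.
            if m_start <= t_end and t_start <= m_end:
                conflicts.append(((m_start, m_end), (t_start, t_end)))
    return conflicts


base = ["int main() {", "  init();", "  run();", "  cleanup();", "}"]
alice = ["int main() {", "  init(argc, argv);", "  run();", "  cleanup();", "}"]
bob = ["int main() {", "  init();", "  run();", "  cleanup_all();", "}"]
bob_overlapping = ["int main() {", "  init_all();", "  run();", "  cleanup();", "}"]

no_conflict = find_conflicts(base, alice, bob)             # edits far apart
conflict = find_conflicts(base, alice, bob_overlapping)    # same line edited
```

When the changes do not overlap, the system can merge them automatically;
when they do, it leaves the decision to a human.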
Let's summarize. An RCS can:
- Store multiple revisions of a file with simple commands.
- Store comments on these files for future use.
- Retrieve previous versions by revision number or date.
- Merge changes from different developers and flag conflicts.
And, in addition, can:
- Display the differences between revisions
- Display who made a change ("cvs annotate", "svn blame")
- Manage multiple lines ("branches") of development
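As a small illustration of displaying differences, the Python sketch
below produces a unified diff between two versions using the standard
difflib module. The file contents and the "rev 17" / "rev 18" labels are
made up; a real RCS would read both versions from its own history.

```python
from difflib import unified_diff

# Two hypothetical revisions of the same file, one line per list entry.
old = ["int steps = 10;", "run(steps);"]
new = ["int steps = 100;", "run(steps);", "report();"]

diff_lines = list(unified_diff(
    old, new,
    fromfile="simulation.c (rev 17)",
    tofile="simulation.c (rev 18)",
    lineterm="",  # inputs have no trailing newlines
))

print("\n".join(diff_lines))
```

Lines starting with "-" exist only in the old revision, and lines
starting with "+" exist only in the new one.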
3 Current RCSes
There are many revision control systems; we will describe four of
them.
- RCS (Revision Control System)
  - Open source project.
  - Very mature code, and widely available.
  - Strict locking to prevent conflicts - one editor at a time.
  - Good for individual files and small projects, but outdated in
    general.
- CVS (Concurrent Versions System)
  - A layer on top of RCS (originally).
  - Uses a "modify and merge" model, which can lead to conflicts
    but lets several people edit at once, unlike RCS's locking model.
  - Handles multiple files in directories, but each file is
    independent - each file has its own revision history.
  - A single CVS server can handle multiple clients.
  - Found absolutely everywhere.
- Subversion (SVN)
  - Designed as a replacement for CVS, using similar commands.
  - Tracks directories and multi-file commits - so revisions are of
    the entire project.
  - More efficient than CVS.
  - Handles branches much better than CVS.
- Git
  - A distributed VCS.
  - Many capabilities similar to SVN's.
  - (Editorial) Bleeding edge and (often) hard to understand.
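The difference between per-file revision histories (CVS) and revisions
of the entire project (SVN) can be sketched in a few lines of Python.
This is only a toy model of the numbering, with invented names; it says
nothing about how either tool stores data.

```python
def commit_per_file(file_revisions, changed_files):
    """CVS-like numbering: each changed file bumps its own counter."""
    for path in changed_files:
        file_revisions[path] = file_revisions.get(path, 0) + 1


class ProjectHistory:
    """SVN-like numbering: one counter for the whole project."""

    def __init__(self):
        self.revision = 0
        self.last_changed = {}  # path -> project revision that last touched it

    def commit(self, changed_files):
        self.revision += 1
        for path in changed_files:
            self.last_changed[path] = self.revision
        return self.revision


# The same three commits recorded under both schemes.
commits = [["simulation.c", "Makefile"], ["simulation.c"], ["Makefile"]]

per_file = {}
project = ProjectHistory()
for changed in commits:
    commit_per_file(per_file, changed)
    project.commit(changed)
```

In the per-file scheme there is no single number that names "the project
after the second commit"; in the project-wide scheme, revision 2 does
exactly that.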
4 Usage
We will discuss CVS and SVN since the CSL supports them directly. First, take a look at the CSL documentation:
SVN @ CSL, and
CVS @ CSL
The basic concepts of a version control system are a repository, a current working copy, checking out, updating, and committing. In order:
- A repository is a central location where all your code resides. It's frequently on a server but can be on a local machine. It tracks all changes that have been made to that code. Most importantly, you do not edit the repository directly - as with a bank, you only access it through an interface, in this case the version control system.
- A current working copy is just that - a version of your code that you "checked out" and are editing. If all goes well, you will commit those changes back to the repository. Or you can throw your changes away entirely and start over with any previous version of the code.
- A check out creates a current working copy from a repository. You can have as many working copies as you want!
- An update takes whatever changes have been made to the repository - from other commits, for instance - since you made your check out and applies them to your current working copy.
- A commit applies the changes in your working copy to the repository, recording them as a new revision.
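The five concepts above fit together as in the following Python sketch.
It is a deliberately simplified model under stated assumptions, not the
behavior of CVS or SVN: class and method names are invented, each
revision is a full snapshot, conflicts are detected per whole file, and
deletions are ignored.

```python
class Repository:
    """Central store of project snapshots; revision 0 is the empty project."""

    def __init__(self):
        self.revisions = [{}]

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self):
        """Create a new working copy from the latest revision."""
        return WorkingCopy(self)


class WorkingCopy:
    def __init__(self, repo):
        self.repo = repo
        self.base = repo.head                        # revision we started from
        self.files = dict(repo.revisions[repo.head])  # editable copy

    def update(self):
        """Apply repository changes made since our base revision."""
        base_files = self.repo.revisions[self.base]
        head_files = self.repo.revisions[self.repo.head]
        for path, text in head_files.items():
            changed_upstream = text != base_files.get(path)
            changed_locally = self.files.get(path) != base_files.get(path)
            if changed_upstream:
                if changed_locally:
                    raise RuntimeError(f"conflict in {path}")
                self.files[path] = text
        self.base = self.repo.head

    def commit(self):
        """Record the working copy as a new revision; return its number."""
        if self.base != self.repo.head:
            raise RuntimeError("working copy is out of date; update first")
        self.repo.revisions.append(dict(self.files))
        self.base = self.repo.head
        return self.base


repo = Repository()
alice = repo.checkout()
bob = repo.checkout()

alice.files["simulation.c"] = "water simulation"
alice.commit()                      # becomes revision 1

bob.files["README"] = "build notes"
bob.update()                        # picks up Alice's simulation.c
bob.commit()                        # becomes revision 2
alice.update()                      # picks up Bob's README
```

Note that neither Alice nor Bob ever touches repo.revisions directly; all
access goes through checkout, update, and commit.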
That covers the basics of version control systems. From there, things get more interesting. One feature supported (well) by SVN and Git is the branch. A branch is a separate line of development within the repository, and is particularly handy if you have multiple people working on different features in the same codebase or if you need to fix bugs and maintain an older version of your code. For the purposes of this class, you probably will not need a branch - instead, a single line of development is sufficient.
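For the curious, a branch can be pictured with the short Python sketch
below. It is a toy model with invented names (the "trunk" label follows
SVN convention), not how SVN or Git implement branches: each branch is
simply its own list of snapshots, and a new branch starts as a copy of
another branch's latest snapshot.

```python
class BranchingRepository:
    """Toy repository where each branch has its own list of snapshots."""

    def __init__(self):
        self.branches = {"trunk": [{}]}

    def create_branch(self, name, source="trunk"):
        """Start a new branch from the latest snapshot of `source`."""
        if name in self.branches:
            raise ValueError(f"branch already exists: {name}")
        self.branches[name] = [dict(self.branches[source][-1])]

    def commit(self, branch, changes):
        """Add a snapshot to `branch` with `changes` applied on top of its tip."""
        snapshot = dict(self.branches[branch][-1])
        snapshot.update(changes)
        self.branches[branch].append(snapshot)

    def tip(self, branch):
        return self.branches[branch][-1]


repo = BranchingRepository()
repo.commit("trunk", {"simulation.c": "release 1.0"})
repo.create_branch("release-1.0")

# New feature work continues on trunk; the old release only gets bug fixes.
repo.commit("trunk", {"simulation.c": "experimental water model"})
repo.commit("release-1.0", {"simulation.c": "release 1.0 + bug fix"})
```

After these commits the two branches hold different versions of the same
file, and work on one does not disturb the other.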
5 Which is better?
Ahh... let's not go there. Clearly, RCS and CVS are old. That's also to say they're time-tested and bulletproof. They might not offer the most features, but more features are not always good. Git is the bleeding edge and has some very nice features for managing source code distributed among tens or hundreds of people. On the other hand, for a single person or a small team it probably adds more complexity than benefit. Personally, I suggest SVN as a happy medium that is also CSL supported - CSL support goes a very long way.
Copyright © 2008, 2011 Andrew R. Bernat