Commit 74b5a948 authored by Eric S. Raymond

Documentation improvement stimulated by Patrick Maupin.

parent 0a5a8918
......@@ -52,7 +52,8 @@ Create a scratch directory for your conversion work.
Copy http://catb.org/~esr/reposurgeon/conversion.mk[this
generic makefile] designed to sequence conversions to be the
Makefile in your conversion directory. Then set the variables near the
top appropriately for your project.
top appropriately for your project. You will need the reposurgeon
tools installed to use it.
This Makefile will help you avoid typing a lot of fiddly commands
by hand, and ensure that later products of the conversion pipeline
......@@ -92,13 +93,14 @@ foonly=Fred Foonly <foonly@foobar.com>
You can optionally specify a third field that is a timezone
description, either an ISO8601 offset (like "-0500") or a named
entry in the Unix timezon file (like "America/Chicago"). If you do,
entry in the Unix timezone file (like "America/Chicago"). If you do,
this timezone will be attached to the timestamps on commits made by
this person.
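As a rough illustration of the entry format described above, the following Python sketch splits one author-map line into its fields. The helper name `parse_author_entry` and the whitespace handling are assumptions made for this example; it is not reposurgeon's own parser and does not validate the timezone field.

```python
import re

# Hypothetical helper, not part of reposurgeon. It splits one author-map
# line of the form:  username=Full Name <email> [timezone]
ENTRY = re.compile(
    r"^(?P<user>[^=\s]+)\s*=\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]+)>"
    r"\s*(?P<tz>\S+)?\s*$"
)

def parse_author_entry(line):
    """Return (user, name, email, tz); tz is None if the third field is absent."""
    m = ENTRY.match(line)
    if m is None:
        raise ValueError("not an author-map entry: %r" % line)
    return m.group("user"), m.group("name"), m.group("email"), m.group("tz")

# The optional third field may be an offset or a named timezone entry.
print(parse_author_entry("foonly=Fred Foonly <foonly@foobar.com> -0500"))
print(parse_author_entry("foonly=Fred Foonly <foonly@foobar.com> America/Chicago"))
print(parse_author_entry("foonly=Fred Foonly <foonly@foobar.com>"))
```

A check like this can be handy for catching malformed lines in a hand-edited map before running a conversion.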
Using the generic Makefile for Subversion, "make stubmap" will
display a start on an author-map file. Edit in real
names and addresses to the right of the equals signs.
names and addresses (and optionally offsets) to the right of
the equals signs.
How best to get this information will vary depending on your
situation.
......@@ -119,13 +121,13 @@ incomplete map with a request for completions often gives good results.
* If you can download the archives of the project's development
mailing list, grepping out all the From addresses may suggest some
obvious marches with otherwise unknown usernames.
obvious matches with otherwise unknown usernames.
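The mailing-list suggestion above can be sketched in Python roughly as follows. This is a minimal sketch that assumes the archive is plain text with one `From:` header per message; the function name `from_addresses` and the sample data are made up for illustration.

```python
from collections import Counter
from email.utils import parseaddr

def from_addresses(lines):
    """Count (name, address) pairs found in 'From:' header lines."""
    counts = Counter()
    for line in lines:
        # Only real headers; mbox "From " separator lines lack the colon.
        if line.startswith("From:"):
            name, addr = parseaddr(line[len("From:"):].strip())
            if addr:
                counts[(name, addr.lower())] += 1
    return counts

# Made-up sample standing in for a downloaded archive.
sample = [
    "From: Fred Foonly <foonly@foobar.com>",
    "Subject: patch",
    "From: Fred Foonly <Foonly@foobar.com>",
]
for (name, addr), n in from_addresses(sample).most_common():
    print(n, name, addr)
```

To run it over a real archive, pass an open file instead of the sample list. The most frequent posters tend to be the likeliest matches for otherwise unknown usernames.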
If you are converting the repository for an open-source project, it is
good courtesy and good practice after the above first step to email
the contributors and invite them to supply a preferred form of their
name and a preferred email address to be used in the mapping. The
reason for this is that some sites, like
name, a preferred email address to be used in the mapping, and a
timezone offset. The reason for this is that some sites, like
https://www.ohloh.net[OpenHub], aggregate participation statistics
(and thus, reputation) across many projects, using developer name and
email address as a primary key.
......@@ -138,12 +140,10 @@ identifications in parallel with the rest of the work.
Your first step will be converting your repository to git.
=== CVS and Subversion ===
There are at least half a dozen utilities out there for lifting CVS
and Subversion repositories to a git repository or import stream. My
opinion of them can be gauged by the fact that I wrote my own. (You
can read a
opinion of them can be gauged by the fact that I wrote my own:
reposurgeon. You can read a
http://www.catb.org/~esr/reposurgeon/features.html[description] of the
things it does that other conversion tools don't.)
......@@ -156,6 +156,8 @@ The generic-workflow Makefile will call reposurgeon
for you, interpreting your $(PROJECT).lift file, when you type "make".
You may have to watch the baton spin for a few minutes.
=== CVS ===
If you are exporting from CVS, it may be a good idea to run some
trial conversions with cvsconvert, a wrapper script shipped with
cvs-fast-export. This script runs a conversion direct to git;
......@@ -163,6 +165,17 @@ the advantage is that it can do a comparison of the repository
histories and identify problems for you to fix in your lift
script.
Problems in CVS conversions generally arise from the fact that CVS's
data model doesn't have real multi-file changesets, which are the
fundamental unit of a commit in DVCSes. It can be difficult to fully
recover changesets from what are actually large numbers of single-file
changes flying in loose formation - in fact, old CVS operator errors
can sometimes make it impossible. Bad tools silently propagate such
damage forward into your translation. Good tools, like cvs-fast-export
and reposurgeon, warn you of problems and help you recover.
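To make the changeset-recovery problem concrete, here is a sketch of the commonly used heuristic: single-file revisions are grouped when they share an author and log message and fall close together in time. It is an illustration only, not the algorithm cvs-fast-export or reposurgeon actually uses; the `FileRev` record, the `coalesce` function, and the 300-second window are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical record for one CVS file revision (a single-file change).
FileRev = namedtuple("FileRev", "path author log timestamp")

def coalesce(revs, window=300):
    """Group single-file revisions into candidate changesets.

    A revision joins an existing changeset when author and log message
    match, it is within `window` seconds of that changeset's latest
    revision, and the changeset does not already touch the same file.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = latest.get(key)
        if (current is not None
                and rev.timestamp - current[-1].timestamp <= window
                and all(r.path != rev.path for r in current)):
            current.append(rev)
        else:
            current = [rev]
            latest[key] = current
            changesets.append(current)
    return changesets

revs = [
    FileRev("a.c", "esr", "Fix bug", 1000),
    FileRev("b.c", "esr", "Fix bug", 1004),
    FileRev("a.c", "esr", "Fix bug", 5000),
    FileRev("c.c", "foonly", "Add docs", 1002),
]
for cs in coalesce(revs):
    print([r.path for r in cs])
```

The heuristic is only a guess at the original intent: clock skew, reused log messages, or a commit interrupted halfway can all split or merge changesets wrongly, which is why a tool that reports such problems is preferable to one that passes them along silently.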
=== Subversion ===
Normally reposurgeon will do branch analysis for you.
On most Subversion repositories, and in particular anything with a
standard trunk/tags/branches layout, it will do the right thing. (It
......@@ -177,7 +190,7 @@ generality. It can even translate Subversion commits that alter
multiple branches.
Special Google Code note: If you are converting a Subversion
project from Google Code, you ,ay want to use the command "debranch
project from Google Code, you may want to use the command "debranch
wiki" to turn the wiki branch into a subdirectory on your master
branch.
......@@ -187,7 +200,46 @@ falls off somewhat on very large repositories (apparently due to I/O
costs). You can speed it up significantly by building a binary with
http://cython.org/[cython]; there's a production to do this
in the reposurgeon Makefile.
Unlike CVS, Subversion repositories have real changesets and the work
in them can effectively always be mapped onto equivalent DVCS commits.
The parent-child relationships among commits will also translate
cleanly. There is, however, a minor problem around tags, and a
significant problem around merges.
The tag problem arises because Subversion tags are really branches
that you've conventionally agreed not to commit to after the initial
branch copy (that's what the tags/ directory name conveys). But
Subversion doesn't enforce any prohibition against committing to
the tag branch, and various odd things can happen if you do. The
reposurgeon analyzer warns about these cases, and reposurgeon gives
you tools for coping with them.
In a DVCS, a merge normally coalesces two entire branches. Subversion
has something close to this in newer versions; it's called a "sync
merge" working on directories (and is expressed as an svn:mergeinfo
property of the target directory that names the source). A sync merge
of a branch directory into another branch directory behaves like a
DVCS merge; reposurgeon picks these up and translates them for you.
The older, more basic Subversion merge is per file and is expressed by
per-file svn:mergeinfo properties. These correspond to what in
DVCS-land are called "cherry-picks", which just replay a commit from a
source branch onto a target branch but do *not* create cross-branch
links.
Sometimes Subversion developers use collections of per-file mergeinfo
properties to express partial branch merges. This does not map to
the DVCS model at all well, and trying to promote these to full-branch
merges by hand is actually dangerous. An excellent essay,
https://plus.google.com/100357083629018071519/posts/jG7CN9R1SsZ[Partial
git merges -- just say no.] explores the problem in depth.
The bottom line is that reposurgeon warns about per-file svn:mergeinfo
properties _and then discards them_ for good reasons. If you feel an
urge to hand-edit in a branch merge based on these, do so with care
and check your work.
=== Other VCSes ===
SCCS: Use http://www.catb.org/esr/sccs2rcs/[sccs2rcs]
......@@ -221,7 +273,7 @@ reposurgeon manual.
For Subversion lifts, use the "compare" and "compare-tags"
productions to compare the head revision and tagged revisions between
the unconverted repository. If you didn't use the cvsconvert wrapper
for your CVS lift, these productions hve a similar effect. The only
for your CVS lift, these productions have a similar effect. The only
differences you should see are those due to keyword expansion and
ignore-file lifting. If this is not true, you have found a serious
bug in either reposurgeon or the front end it used. Consult
......@@ -232,7 +284,7 @@ If you are converting from CVS, use reposurgeon's graph command to
examine the conversion, looking (in particular) for misplaced tags or
branch joins. Often these can be manually repaired with little
effort. These flaws do 'not' necessarily imply bugs in cvs-fast-export
or reposugeon; they may simply indicate previously undetected
or reposurgeon; they may simply indicate previously undetected
malformations in the history. However, reporting them may help improve
cvs-fast-export.
......@@ -270,7 +322,7 @@ Branch tip deletes, deletealls, and unexpressed merges::
In Subversion it is common practice to delete a branch directory
when that line of development is finished or merged to trunk; this
makes sense because it reduces the checkout size of the repo in later
revisions. In a DVCS deletes at a branch tip don't save you any
revisions. In a DVCS, deletes at a branch tip don't save you any
storage, so it makes more sense to leave the branch with all of its
tip content live if you're not going to delete it entirely. Sometimes
editing a later commit to have the branch tip as a parent (creating
......@@ -672,7 +724,8 @@ Makefile.
Improved advice about force-pushing. Simplified conversion procedure.
No longer recommending comparison of Subversion with a git-svn translation;
it's too flaky and limited for that to be a good idea. Add recommendation
to create a synthetic conversion tag.
to create a synthetic conversion tag. Describe differences
between SVN and DVCS merging models in detail.
// Local Variables:
// compile-command: "make dvcs-migration-guide.html"