repocutter issues and improvements

While trying to clean up and rearrange an SVN repo for migration to Git, I stumbled over the following issues with repocutter:

I have addressed all these issues here: https://gitlab.com/vqrs/reposurgeon/-/commits/master/cutter

Maybe this can be of use to you.

Encountered issues:

  1. Property deletions (version 3 dump file format) are not supported; repocutter aborts when it encounters one.

  2. Using -r with repo transformation subcommands (anything that goes through Report()) removes all other revisions, instead of transforming only the selected revisions and passing the others through unchanged.

    Nor does the documentation suggest that using -r with other subcommands simply drops all other revisions.

  3. ReadUntilNext breaks the select/deselect subcommands if any file, commit message, or property happens to contain a line starting with "Revision-number: ".

  4. renumber cannot cope with a revision number changing length and produces unparsable dump files: if a revision number changes from "1001" to "999", the *-length headers will all be incorrect.

  5. If renumber encounters a revision number that has not been renumbered (because that revision was empty after filtering, or was deselected), it replaces it with 0.

    This is problematic when extracting or pruning parts of the repo. Say I changed keep/myFile in rev 2, rev 3 contains only changes to toRemove/*, and rev 4 copies keep/myFile to keep/myFileCopy. Unfortunately, Subversion recorded copy-from-revision 3, so when I expunge the toRemove directory, revision 3 is dropped and a subsequent renumber records copy-from-revision 0. In this case, the correct behavior is to fall back to 2, the closest earlier revision that was kept.

  6. renumber is extremely slow, probably because it operates line by line (think hours instead of about a minute for ~10k commits, 3 GB total).

  7. I encountered a trailing newline in an svn:mergeinfo entry, which breaks the renumber parser.

  8. pathrename creates unimportable dumps when "moving" content into a new subdirectory.

    If a file is added under my/new/subdirectory, Subversion expects to have seen, at some earlier point, an add action for my, then one for my/new, and then one for my/new/subdirectory, in that order.

    Since renaming original to my/new/subdirectory simply rewrites all occurrences of original, the transformed dump contains only an add action for my/new/subdirectory, so svnadmin load errors out because it does not recursively create missing parent directories.
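
The fallback asked for in issue 5 amounts to: map a dropped revision to the closest earlier revision that survived. A minimal sketch of that mapping, assuming the surviving old revision numbers are available as a sorted slice; renumber here is a hypothetical helper, not repocutter's actual code:

```go
package main

import "sort"

// renumber maps an old revision number to its new number after filtering.
// Hypothetical helper for illustration, not repocutter's actual code.
//
// kept lists the surviving old revision numbers in ascending order; the new
// number of a kept revision is its index in kept, so revision 0 stays 0.
// If old was dropped (empty after filtering, or deselected), the closest
// earlier kept revision is returned instead of 0: the dropped revision left
// nothing behind in the filtered history, so a copy source "as of old"
// looks the same as in that earlier revision.
// ok is false only when no kept revision precedes old.
func renumber(kept []int, old int) (newRev int, ok bool) {
	i := sort.SearchInts(kept, old) // first index with kept[i] >= old
	if i < len(kept) && kept[i] == old {
		return i, true // revision survived: exact mapping
	}
	if i == 0 {
		return 0, false // nothing earlier survived
	}
	return i - 1, true // fall back to the closest earlier survivor
}
```

With the example above, kept = []int{0, 1, 2, 4} (revision 3 expunged): renumber(kept, 3) yields 2, and renumber(kept, 4) yields 3, so the copy in old rev 4 becomes a copy from new rev 2.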

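Regarding issue 1: in dump format version 3, a node record with Prop-delta: true carries only property changes, and a deleted property appears as a "D <length>" line followed by its name instead of a K/V pair. A minimal sketch of a property-block parser that accepts such entries, assuming the block has already been read into memory; Props, readCounted, and parseProps are hypothetical names:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// Props is the parsed content of one property block (hypothetical type).
type Props struct {
	Set     map[string]string // K/V pairs
	Deleted []string          // names from D entries (v3, Prop-delta: true)
}

// readCounted reads a "<tag> <n>\n" line, then n bytes and a newline.
// It returns the payload and the remaining input.
func readCounted(buf []byte, tag byte) (string, []byte, error) {
	nl := bytes.IndexByte(buf, '\n')
	if nl < 2 || buf[0] != tag || buf[1] != ' ' {
		return "", nil, fmt.Errorf("expected %c line", tag)
	}
	n, err := strconv.Atoi(string(buf[2:nl]))
	if err != nil || n < 0 {
		return "", nil, fmt.Errorf("bad length in %q", buf[:nl])
	}
	rest := buf[nl+1:]
	if len(rest) < n+1 || rest[n] != '\n' {
		return "", nil, fmt.Errorf("truncated %c entry", tag)
	}
	return string(rest[:n]), rest[n+1:], nil
}

// parseProps parses a property block terminated by PROPS-END, accepting
// both K/V pairs and D (deletion) entries.
func parseProps(block []byte) (Props, error) {
	p := Props{Set: map[string]string{}}
	for {
		switch {
		case bytes.HasPrefix(block, []byte("PROPS-END\n")):
			return p, nil
		case len(block) > 0 && block[0] == 'K':
			key, rest, err := readCounted(block, 'K')
			if err != nil {
				return p, err
			}
			val, rest, err := readCounted(rest, 'V')
			if err != nil {
				return p, err
			}
			p.Set[key] = val
			block = rest
		case len(block) > 0 && block[0] == 'D':
			key, rest, err := readCounted(block, 'D')
			if err != nil {
				return p, err
			}
			p.Deleted = append(p.Deleted, key)
			block = rest
		default:
			return p, fmt.Errorf("unexpected property line %q", block)
		}
	}
}
```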

Implementation notes:

I tried to adhere to your conventions as far as I could tell what they are.

I'd never written Go before, so please keep that in mind.

  1. see 5)

  2. I also changed Report to always print the SVN dump header, which seems more useful with the new -r behavior, and added a diagnostic for accidentally using the wrong range separator ("-" instead of ":").

  3. The implementation switches back and forth between line-based reading and skipping over content. It's a bit of an ugly hack, but it seems to work.


  4. I added a new fixmergeinfo command for this. It uses Report with a property hook to trim the offending lines and remove unnecessary ones; Report then automatically writes correct *-length headers.

  5. no notes

  6. Originally, I wanted to rewrite renumber on top of Report by introducing a revhook parameter, but I couldn't get it to change the headers reliably by using Push and re-reading them in ReadRevisionHeaders. So instead I wrote a bespoke parser that reliably skips over binary content without parsing it.

  7. no notes

  8. For this I added the adddir command (invoked as "adddir my" for the example in issue 8), which synthesizes a new add action into a specific commit selected via -r. It iterates over all node actions; once it finds a node whose Node-path starts with NEWDIR followed by a slash, it creates a preceding add for NEWDIR.
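
The fixmergeinfo approach works because the length headers can always be recomputed from the re-serialized property block: once a property value changes size (a trimmed newline, or a revision number in svn:mergeinfo going from 1001 to 999), Prop-content-length and Content-length follow mechanically. A minimal sketch of that bookkeeping with a mergeinfo-trimming hook; Prop, serializeProps, trimMergeinfo, and lengthHeaders are hypothetical names, not repocutter's Report machinery:

```go
package main

import (
	"fmt"
	"strings"
)

// Prop is one property name/value pair, in the order read from the dump.
type Prop struct{ Key, Value string }

// serializeProps renders properties in dump-file K/V form ending in
// PROPS-END; the byte length of the result is the Prop-content-length.
func serializeProps(props []Prop) string {
	var b strings.Builder
	for _, p := range props {
		// len() counts bytes, which is what the dump format measures.
		fmt.Fprintf(&b, "K %d\n%s\nV %d\n%s\n", len(p.Key), p.Key, len(p.Value), p.Value)
	}
	b.WriteString("PROPS-END\n")
	return b.String()
}

// trimMergeinfo is a hypothetical property hook that strips trailing
// newlines from svn:mergeinfo values.
func trimMergeinfo(props []Prop) {
	for i := range props {
		if props[i].Key == "svn:mergeinfo" {
			props[i].Value = strings.TrimRight(props[i].Value, "\n")
		}
	}
}

// lengthHeaders recomputes the length headers after props were edited.
// textLen is the node's text length in bytes, or -1 if it has no text.
func lengthHeaders(props []Prop, textLen int) string {
	propLen := len(serializeProps(props))
	var b strings.Builder
	fmt.Fprintf(&b, "Prop-content-length: %d\n", propLen)
	total := propLen
	if textLen >= 0 {
		fmt.Fprintf(&b, "Text-content-length: %d\n", textLen)
		total += textLen
	}
	fmt.Fprintf(&b, "Content-length: %d\n", total)
	return b.String()
}
```

For a value of "/trunk:1-999\n", trimming shrinks it from 13 to 12 bytes, and the recomputed Prop-content-length for that single property is 47.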

Edited by Christian Vonrüti