repocutter issues and improvements
While trying to clean up and rearrange an SVN repository for migration to Git, I ran into the following issues with repocutter:
I have addressed all these issues here: https://gitlab.com/vqrs/reposurgeon/-/commits/master/cutter
Maybe this can be of use to you.
Encountered issues:
1) Property deletions (version 3 dump file format) are not supported; repocutter aborts when it encounters one.
2) Using -r with the repository transformation subcommands (anything that goes through Report()) removes all other revisions instead of transforming only the selected revisions and passing the others through unchanged. The documentation also doesn't seem to imply that using -r with other subcommands will simply drop all other revisions.
3) ReadUntilNext breaks the select/deselect subcommands if any file, commit message, or property happens to contain a line starting with "Revision-number: ".
4) renumber can't cope with a revision number changing length and produces unparsable dump files: if a revision changes from "1001" to "999", the *-length headers will all be incorrect.
5) If renumber encounters a revision number that hasn't been renumbered (because that revision was empty after filtering or was deselected), it replaces it with 0. This is problematic when extracting or pruning parts of the repository. Suppose I changed keep/myFile in rev 2, rev 3 contains only changes to toRemove/*, and rev 4 copies keep/myFile to keep/myFileCopy. Unfortunately, Subversion recorded the copy-from revision as 3, so when I expunge the toRemove directory, revision 3 is dropped and a subsequent renumber records a copy-from revision of 0. In this case, the correct fallback is 2.
6) renumber is extremely slow, probably because it operates line by line (think hours instead of a minute for ~10k commits, 3 GB total).
7) I encountered a trailing newline in a mergeinfo entry; this breaks the renumber parser.
8) pathrename creates unimportable dumps when "moving" into a new subdirectory. If a file is added into my/new/subdirectory, Subversion expects to have seen an add action for my, then one for my/new, and then one for my/new/subdirectory, in that order, at some earlier point. Since renaming original to my/new/subdirectory simply rewrites all occurrences of original, the transformed dump contains only an add action for my/new/subdirectory, and svnload errors out because it doesn't recursively create directories that don't exist.
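For the renumber length-header issue: every K and V count, Prop-content-length, and Content-length is a byte count, so rewriting "/trunk:1001" as "/trunk:999" in a property value shortens the block by one byte and every length covering it must be recomputed. A minimal Go sketch, assuming properties are re-serialized after rewriting; Prop, SerializeProps and LengthHeaders are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Prop is one property key/value pair, kept in dump order.
type Prop struct{ Key, Value string }

// SerializeProps renders props as a dump-file property block ending in
// PROPS-END. Every K and V count is the byte length of the string.
func SerializeProps(props []Prop) string {
	var b strings.Builder
	for _, p := range props {
		fmt.Fprintf(&b, "K %d\n%s\nV %d\n%s\n", len(p.Key), p.Key, len(p.Value), p.Value)
	}
	b.WriteString("PROPS-END\n")
	return b.String()
}

// LengthHeaders returns the length headers for a record whose property
// block is propBlock. text is nil when the record carries no text
// content (e.g. a revision record); an empty non-nil slice means an
// empty file. Content-length is Prop-content-length plus
// Text-content-length.
func LengthHeaders(propBlock string, text []byte) string {
	h := fmt.Sprintf("Prop-content-length: %d\n", len(propBlock))
	if text != nil {
		h += fmt.Sprintf("Text-content-length: %d\n", len(text))
	}
	return h + fmt.Sprintf("Content-length: %d\n", len(propBlock)+len(text))
}
```

With svn:mergeinfo set to "/trunk:1001" the block is 46 bytes; after renumbering to "/trunk:999" it is 45, so a dump that keeps the old headers is off by one byte.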
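For the copy-from fallback issue: one way to avoid emitting 0 is to map a dropped revision to the nearest surviving revision below it, which matches the keep/myFile example (copy-from 3 falls back to 2). A minimal Go sketch, assuming renumbering is consecutive from 0 over the surviving revisions, and assuming a dropped revision changed nothing in the paths that remain, so its tree equals that of the preceding kept revision. NewRev is a hypothetical name.

```go
package main

import "sort"

// NewRev maps an original revision number to its renumbered value.
// kept lists the original revisions that survive filtering, in
// ascending order; kept[i] is renumbered to i. A revision that was
// dropped falls back to the nearest kept revision below it, on the
// assumption that the dropped revision did not touch surviving paths.
// ok is false only if no kept revision is at or below old.
func NewRev(kept []int, old int) (newRev int, ok bool) {
	// i is the index of the first kept revision greater than old.
	i := sort.Search(len(kept), func(j int) bool { return kept[j] > old })
	if i == 0 {
		return 0, false
	}
	return i - 1, true
}
```

With revisions 0, 1, 2 and 4 kept, a Node-copyfrom-rev of 3 becomes 2, and revision 4 itself becomes 3.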
Implementation notes:
I tried to adhere to your conventions as far as I could tell.
I've never written Go before, so please keep that in mind.
1) See 5).
2) I also changed Report to always print the SVN dump header, which seems more useful with the new -r behavior. I also added diagnostics for accidentally using the wrong range separator (- vs. :).
3) The implementation switches back and forth between line-based reading and skipping over content. It's a bit of an ugly hack, but it seems to work.
4) I added a new fixmergeinfo command for this. It uses Report and a property hook to trim the offending lines and remove unnecessary ones; Report then automatically writes correct *-length headers.
5) No notes.
6) Originally, I wanted to rewrite renumber using Report by introducing a revhook parameter, but I couldn't get it to reliably change the headers using Push and re-reading them in ReadRevisionHeaders. So I wrote a bespoke parser instead that reliably skips over binary content without parsing it.
7) No notes.
8) For this I added the command adddir my, which can synthesize a new add action into a specific commit via -r. It iterates over all node actions. Once it finds a node whose Node-path starts with the prefix NEWDIR followed by a slash, it creates a preceding add for NEWDIR.
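The changed -r behavior (transform only the selected revisions and pass the rest through unchanged) can be sketched in Go as follows. Revision is a simplified stand-in for a parsed revision record, and Range, Selected and ApplySelected are hypothetical names, not repocutter's Report API.

```go
package main

// Revision is a simplified stand-in for one parsed revision record.
type Revision struct {
	Number int
	Body   string
}

// Range is an inclusive revision range.
type Range struct{ Lo, Hi int }

// Selected reports whether rev falls in any of the ranges.
func Selected(ranges []Range, rev int) bool {
	for _, r := range ranges {
		if rev >= r.Lo && rev <= r.Hi {
			return true
		}
	}
	return false
}

// ApplySelected runs transform on selected revisions only and passes
// every other revision through unchanged instead of dropping it.
func ApplySelected(revs []Revision, ranges []Range, transform func(Revision) Revision) []Revision {
	out := make([]Revision, 0, len(revs))
	for _, rev := range revs {
		if Selected(ranges, rev.Number) {
			rev = transform(rev)
		}
		out = append(out, rev)
	}
	return out
}
```

The output always has as many revisions as the input; only the contents of selected ones differ.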
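The trailing-newline problem in mergeinfo entries can be handled by dropping blank lines from the svn:mergeinfo value before any further processing. This Go sketch covers only that trimming step, not the removal of "unnecessary" entries that fixmergeinfo also performs. The value's length changes, so the length headers must be regenerated afterwards. NormalizeMergeinfo is a hypothetical name.

```go
package main

import "strings"

// NormalizeMergeinfo removes blank lines, including the empty line a
// trailing newline produces, from an svn:mergeinfo value, leaving one
// "path:revisions" entry per line with no trailing newline.
func NormalizeMergeinfo(v string) string {
	var out []string
	for _, line := range strings.Split(v, "\n") {
		if strings.TrimSpace(line) != "" {
			out = append(out, line)
		}
	}
	return strings.Join(out, "\n")
}
```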
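The adddir behavior described above (find the first node under NEWDIR/ and insert an add for NEWDIR in front of it) can be sketched in Go as follows. Node is a simplified stand-in for a parsed node record, AddDir is a hypothetical name, and the nodes are assumed to belong to the commit already chosen with -r.

```go
package main

import "strings"

// Node is a simplified stand-in for one node record of a commit.
type Node struct {
	Path   string // Node-path
	Kind   string // Node-kind: "file" or "dir"
	Action string // Node-action: "add", "change", "delete", "replace"
}

// AddDir returns nodes with a synthesized directory add for newDir
// inserted before the first node whose path lies under newDir/.
// If no such node exists, nodes is returned unchanged.
func AddDir(nodes []Node, newDir string) []Node {
	dir := strings.TrimSuffix(newDir, "/")
	prefix := dir + "/"
	for i, n := range nodes {
		if strings.HasPrefix(n.Path, prefix) {
			out := make([]Node, 0, len(nodes)+1)
			out = append(out, nodes[:i]...)
			out = append(out, Node{Path: dir, Kind: "dir", Action: "add"})
			return append(out, nodes[i:]...)
		}
	}
	return nodes
}
```

Running it once per missing ancestor, outermost first, yields the add sequence Subversion expects.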