More badly handled ChangeLog entries
In https://gcc.gnu.org/ml/gcc/2019-12/msg00420.html Jakub Jelinek found some cases of existing GCC ChangeLog entries that reposurgeon handles badly. Here's a synthetic test illustrating some of these. In general, treating such entries as invalid attributions (so falling back to the committer identity) is probably better than generating bad author attributions in git. Test with:
reposurgeon "read <test.svn" "sourcetype svn" "prefer git" "changelogs" "rebuild test-git"
In the first commit, "Commit with two authors named in ChangeLog header.", the domain name of the first named author is treated as the start of the name of the second author. The regular expression used to extract the name and email address requires that the name contain no @, but because it is not anchored, the match simply starts part way through the first author's email address. Anchoring the regular expression, at least at the start, might be a plausible fix; in general, a line with multiple email addresses is best treated as not providing a valid attribution. (I don't expect authors to be extracted from this commit.)
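To make the anchoring suggestion concrete, here is a rough Python sketch. It is not reposurgeon's actual code: the pattern, the `parse_header` name and the sample addresses are all made up for illustration, and it assumes a yyyy-mm-dd date at the start of the header line.

```python
import re

# Hypothetical pattern, NOT the one reposurgeon uses. It is anchored at both
# ends, and the name may not contain "@", "<" or ">".
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+([^<>@]+?)\s+<([^<>@\s]+@[^<>@\s]+)>\s*$"
)

def parse_header(line):
    """Return (date, name, email), or None if the line is not a clean
    single-author header, so the caller can fall back to the committer."""
    # A line with more than one email address is treated as unusable.
    if line.count("@") != 1:
        return None
    m = HEADER.match(line)
    return m.groups() if m else None

one = parse_header("2019-12-25  Jane Doe  <jane@example.org>")
two = parse_header(
    "2019-12-25  Jane Doe  <jane@example.org>, John Roe  <john@example.net>"
)
trailing = parse_header("2019-12-25  Jane Doe  <jane@example.org>  (tiny change)")
```

With this sketch the two-author line and the line with trailing text both yield None, which corresponds to falling back to the committer identity rather than guessing an author.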
The second commit illustrates how it can be valid in GNU ChangeLog format to have text after the email address, but I think it's OK that this one falls back to the committer as it does at present.
The third and fourth commits show cases with a time (and possibly timezone) after a yyyy-mm-dd date. I don't know if those were ever specified in the GNU Coding Standards or generated by Emacs, but I expect it should be straightforward to make the regular expressions accept time and timezone after the date rather than treating them as part of the author.
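For the time and timezone case, something along these lines might do. Again this is only a Python sketch under assumptions: the pattern and the `parse_dated_header` name are hypothetical, and I am guessing that the time looks like HH:MM or HH:MM:SS and the zone is either a numeric offset or a short upper-case abbreviation.

```python
import re

# Hypothetical pattern, NOT reposurgeon's. The time and zone formats are
# assumptions. A zone is only looked for when a time is present, which limits
# (but does not remove) the risk of reading an upper-case word as a zone.
DATED = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:\s+(?P<time>\d{1,2}:\d{2}(?::\d{2})?)"
    r"(?:\s+(?P<zone>[+-]\d{4}|[A-Z]{3,5}))?)?"
    r"\s+(?P<name>[^<>@]+?)\s+<(?P<email>[^<>@\s]+@[^<>@\s]+)>\s*$"
)

def parse_dated_header(line):
    """Return a dict with date, time, zone, name and email, or None."""
    m = DATED.match(line)
    return m.groupdict() if m else None

plain = parse_dated_header("2019-12-25  Jane Doe  <jane@example.org>")
timed = parse_dated_header("2019-12-25 10:30:00 +0000  Jane Doe  <jane@example.org>")
zoned = parse_dated_header("2019-12-25 10:30 PST  Jane Doe  <jane@example.org>")
```

In all three sample lines the name comes out as "Jane Doe", with the time and zone captured separately (or None when absent) instead of being absorbed into the author.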