Missing commits when importing from Bitbucket Server via API
Summary
A customer (see internal Zendesk link) was using API calls to import repositories from their Bitbucket Server (version 5.16) into GitLab 13.6.3.
They found that for some repositories the import appeared to succeed, in that no errors were shown in the UI or in the command-line output, but the imported project did not contain all the commits from the original repository. In one case, the imported repository in GitLab had commits through September 8, 2017, while the original repository had commits through January 2019. In the logs they found these errors:
{
"severity": "ERROR",
"time": "2020-12-22T18:14:20.684Z",
"correlation_id": "AMrbGYuohA7",
"message": "Import failed due to a BitBucket Server error",
"error": "Import failed due to a BitBucket Server error: Error 404: Repository CPC/repoone does not exist."
}
{
"correlation_id": "ZmDY0yw0T63",
"grpc.meta.auth_version": "v2",
"grpc.meta.client_name": "gitlab-sidekiq",
"grpc.meta.deadline_type": "unknown",
"grpc.method": "FindDefaultBranchName",
"grpc.request.deadline": "2020-12-22T18:16:44Z",
"grpc.request.fullMethod": "/gitaly.RefService/FindDefaultBranchName",
"grpc.request.glProjectPath": "viz/cpc/repoone",
"grpc.request.glRepository": "project-4544",
"grpc.request.repoPath": "@hashed/dc/dc/dcdc295abab240890687b51827637fb18f6d297a7bb0a109e0e58572e09cf101.git",
"grpc.request.repoStorage": "default",
"grpc.request.topLevelGroup": "@hashed",
"grpc.service": "gitaly.RefService",
"grpc.start_time": "2020-12-22T18:16:34Z",
"level": "error",
"msg": "fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.\\nUse '--' to separate paths from revisions, like this:\\n'git \u003ccommand\u003e [\u003crevision\u003e...] -- [\u003cfile\u003e...]'\\n",
"peer.address": "@",
"pid": 12778,
"span.kind": "server",
"system": "grpc",
"time": "2020-12-22T18:16:35.085Z"
}
{
"correlation_id": "ZmDY0yw0T63",
"grpc.meta.auth_version": "v2",
"grpc.meta.client_name": "gitlab-sidekiq",
"grpc.meta.deadline_type": "unknown",
"grpc.method": "CommitLanguages",
"grpc.request.deadline": "2020-12-23T00:16:35Z",
"grpc.request.fullMethod": "/gitaly.CommitService/CommitLanguages",
"grpc.request.glProjectPath": "viz/cpc/repoone",
"grpc.request.glRepository": "project-4544",
"grpc.request.repoPath": "@hashed/dc/dc/dcdc295abab240890687b51827637fb18f6d297a7bb0a109e0e58572e09cf101.git",
"grpc.request.repoStorage": "default",
"grpc.request.topLevelGroup": "@hashed",
"grpc.service": "gitaly.CommitService",
"grpc.start_time": "2020-12-22T18:16:35Z",
"level": "error",
"msg": "PID 126641 BUNDLE_GEMFILE=/opt/gitlab/embedded/service/gitaly-ruby/Gemfile\\n",
"peer.address": "@",
"pid": 12778,
"span.kind": "server",
"system": "grpc",
"time": "2020-12-22T18:16:42.615Z"
}
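For anyone trying to reproduce this, a minimal sketch for pulling error entries like the ones above out of a larger log. It assumes the log file holds one JSON object per line (the entries above are pretty-printed for readability); the function name `error_entries` and the command-line arguments are hypothetical.

```python
import json
import sys


def error_entries(lines, needle):
    """Yield parsed JSON log entries at error level whose raw text mentions needle."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        # The entries above use either "level" or "severity" for the log level.
        level = str(entry.get("level") or entry.get("severity") or "").lower()
        if level == "error" and needle.lower() in line.lower():
            yield entry


# Hypothetical usage: python find_errors.py <log file> <project path fragment>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as handle:
        for entry in error_entries(handle, sys.argv[2]):
            print(entry.get("time"), entry.get("correlation_id"),
                  entry.get("msg") or entry.get("message"))
```

Matching is case-insensitive because the Bitbucket path (`CPC/repoone`) and the GitLab path (`viz/cpc/repoone`) differ in case.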
They didn't have admin access to the Bitbucket Server, so we asked them to do a git clone --mirror, then git fsck and git count-objects -v to try to verify the repository. The output of those looked fine:
$ git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (78911/78911), done.
$ git count-objects -v
count: 0
size: 0
in-pack: 78911
packs: 1
size-pack: 178994
prune-packable: 0
garbage: 0
size-garbage: 0
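The checks above only show that the source repository is healthy. To confirm what is actually missing after an import, one option is to compare a mirror clone of the Bitbucket repository with a mirror clone of the imported GitLab project. The following is a minimal sketch under that assumption; `repo_summary` and `diff_refs` are hypothetical helper names, and only branches and tags are compared because each server keeps its own internal refs.

```python
import subprocess
import sys


def repo_summary(path):
    """Return (commit_count, newest_commit_date, refs) for a local clone at path."""
    def git(*args):
        result = subprocess.run(["git", "-C", path, *args],
                                check=True, capture_output=True, text=True)
        return result.stdout.strip()

    # Count commits reachable from branches and tags only.
    count = int(git("rev-list", "--count", "--branches", "--tags"))
    # Committer date (strict ISO 8601) of the first commit git log lists.
    newest = git("log", "--branches", "--tags", "-1", "--format=%cI")
    refs = {}
    listing = git("for-each-ref", "--format=%(objectname) %(refname)",
                  "refs/heads", "refs/tags")
    for line in listing.splitlines():
        sha, ref = line.split(" ", 1)
        refs[ref] = sha
    return count, newest, refs


def diff_refs(source_refs, imported_refs):
    """Return (refs missing from the import, refs pointing at a different commit)."""
    missing = sorted(r for r in source_refs if r not in imported_refs)
    moved = sorted(r for r in source_refs
                   if r in imported_refs and source_refs[r] != imported_refs[r])
    return missing, moved


# Hypothetical usage: python compare_clones.py <bitbucket mirror> <gitlab mirror>
if __name__ == "__main__" and len(sys.argv) == 3:
    src_count, src_newest, src_refs = repo_summary(sys.argv[1])
    imp_count, imp_newest, imp_refs = repo_summary(sys.argv[2])
    print("source:  ", src_count, "commits, newest", src_newest)
    print("imported:", imp_count, "commits, newest", imp_newest)
    missing, moved = diff_refs(src_refs, imp_refs)
    print("missing refs:", missing)
    print("refs at a different commit:", moved)
```

A lower commit count or an older newest-commit date on the imported side would match the symptom described in this issue.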
We then tried importing via the GitLab UI rather than by API call, and the repository was fully imported with no issues. The same approach was verified on another repository that had previously been imported by API and was also missing data.
So the question is what differs between running an import via an API call and running it through the GitLab UI. This customer's API call was:
curl --request POST \
--url https://<hostname>/api/v4/import/bitbucket_server \
--header "content-type: application/json" \
--header "PRIVATE-TOKEN: <token>" \
--data '{
"bitbucket_server_url": "https://<hostname>",
"bitbucket_server_username": "<username>",
"personal_access_token": "<token>",
"bitbucket_server_project": "<project>",
"bitbucket_server_repo": "<repo>",
"target_namespace": "<namespace>"
}'
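For scripted reproduction, here is a rough Python equivalent of the call above, followed by polling the project's import status. This is a sketch, not the customer's tooling: the helper names are hypothetical, and it assumes the project import status endpoint (`GET /projects/:id/import`) reports on Bitbucket Server imports and that the POST response includes the new project's `id`. Since the failing imports showed no errors, a `finished` status on its own may not prove the repository is complete.

```python
import json
import time
import urllib.request


def build_import_request(gitlab_url, private_token, params):
    """Build the POST request equivalent to the curl call above."""
    return urllib.request.Request(
        url=gitlab_url.rstrip("/") + "/api/v4/import/bitbucket_server",
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "PRIVATE-TOKEN": private_token},
        method="POST",
    )


def is_finished(status):
    """True once the import has reached a terminal state."""
    return status.get("import_status") in ("finished", "failed")


def wait_for_import(gitlab_url, private_token, project_id,
                    interval=10, attempts=60):
    """Poll the import status until it is terminal; return the last status."""
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/import"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": private_token})
    for _ in range(attempts):
        with urllib.request.urlopen(request) as response:
            status = json.load(response)
        if is_finished(status):
            return status
        time.sleep(interval)
    raise TimeoutError("import did not reach a terminal state in time")


def start_import(gitlab_url, private_token, params):
    """Send the import request and return the parsed JSON response."""
    request = build_import_request(gitlab_url, private_token, params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the same placeholders as the curl call, `project = start_import(url, token, params)` followed by `wait_for_import(url, token, project["id"])` would return the final status, including `import_error` if the endpoint reports one.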