External merge request diffs state inconsistent with actual object storage state
ZD: https://gitlab.zendesk.com/agent/tickets/127486
First report:
>>>
Updated storage to use external diffs, but something went horribly wrong
it seems. A smattering of merge requests (some old closed ones, some new
open ones) suddenly "broke". The UI just shows a spinner when looking at
the 'changes' tab in a merge request. The HTTP request to the diffs.json
gives back a 500 error with the following stack in logs:
```
RuntimeError (new position is outside of file):
lib/gitlab/http_io.rb:60:in `seek'
app/models/merge_request_diff_file.rb:19:in `block in diff'
app/models/merge_request_diff.rb:376:in `opening_external_diff'
lib/gitlab/metrics/instrumentation.rb:161:in `block in opening_external_diff'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `opening_external_diff'
app/models/merge_request_diff_file.rb:18:in `diff'
lib/gitlab/metrics/instrumentation.rb:161:in `block in diff'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `diff'
app/models/concerns/diff_file.rb:9:in `to_hash'
app/models/merge_request_diff.rb:514:in `map'
app/models/merge_request_diff.rb:514:in `block in load_diffs'
app/models/merge_request_diff.rb:381:in `block in opening_external_diff'
app/uploaders/gitlab_uploader.rb:96:in `open'
app/models/merge_request_diff.rb:378:in `opening_external_diff'
lib/gitlab/metrics/instrumentation.rb:161:in `block in opening_external_diff'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `opening_external_diff'
app/models/merge_request_diff.rb:507:in `load_diffs'
lib/gitlab/metrics/instrumentation.rb:161:in `block in load_diffs'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `load_diffs'
app/models/merge_request_diff.rb:204:in `raw_diffs'
lib/gitlab/metrics/instrumentation.rb:161:in `block in raw_diffs'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `raw_diffs'
lib/gitlab/diff/file_collection/base.rb:30:in `diffs'
lib/gitlab/diff/file_collection/base.rb:34:in `diff_files'
lib/gitlab/diff/file_collection/merge_request_diff.rb:20:in `diff_files'
lib/gitlab/diff/file_collection/base.rb:41:in `unfold_diff_files'
app/controllers/projects/merge_requests/diffs_controller.rb:26:in `render_diffs'
app/controllers/projects/merge_requests/diffs_controller.rb:13:in `show'
ee/lib/gitlab/ip_address_state.rb:10:in `with'
ee/app/controllers/ee/application_controller.rb:28:in `set_current_ip_address'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:445:in `set_session_storage'
lib/gitlab/i18n.rb:55:in `with_locale'
lib/gitlab/i18n.rb:61:in `with_user_locale'
app/controllers/application_controller.rb:439:in `set_locale'
lib/gitlab/middleware/rails_queue_duration.rb:27:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:57:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/middleware/multipart.rb:103:in `call'
lib/gitlab/request_profiler/middleware.rb:16:in `call'
ee/lib/gitlab/jira/middleware.rb:17:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/correlation_id.rb:16:in `block in call'
lib/gitlab/middleware/correlation_id.rb:15:in `call'
lib/gitlab/middleware/read_only/controller.rb:42:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/request_context.rb:26:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:29:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
```
>>>
Follow-up:
First, get the full path of the affected project (e.g. gitlab-org/gitlab-ce) and the IID of a merge request that fails to load (the number shown in the merge request URL). Then open a Rails console with `sudo gitlab-rails console`, run the following, and send us the output:
```
project = Project.find_by_full_path('group/your-project-path')
mr = project.merge_requests.find_by(iid: INSERT-YOUR-MR-ID-HERE)
diff = mr.merge_request_diff
diff.external_diff
diff.external_diff.url # <--- Download the URL given here
diff.merge_request_diff_files.select(:old_path, :new_path, :external_diff_offset, :external_diff_size).to_a
```
If possible, please also send the contents of the file downloaded from diff.external_diff.url.
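
The offsets and sizes returned by the last console command can be compared against the size of the downloaded file. The following is a minimal sketch in plain Ruby, not GitLab code: `DiffEntry` and `out_of_range_entries` are hypothetical names, and the sample values are made up. It assumes that the "new position is outside of file" error can occur when a recorded byte range does not fit inside the file that is actually stored.

```ruby
# Hypothetical helper, not part of GitLab. Mirrors the columns selected above.
DiffEntry = Struct.new(:new_path, :external_diff_offset, :external_diff_size)

# Returns the entries whose byte range does not fit inside a file of
# stored_size bytes.
def out_of_range_entries(entries, stored_size)
  entries.select do |entry|
    entry.external_diff_offset.negative? ||
      entry.external_diff_offset + entry.external_diff_size > stored_size
  end
end

# Made-up sample data; in practice these come from the console output.
entries = [
  DiffEntry.new('README.md', 0, 120),
  DiffEntry.new('app/models/user.rb', 120, 300)
]

# stored_size would be File.size(path) for the downloaded file.
bad = out_of_range_entries(entries, 200)
puts bad.map(&:new_path).inspect
```

An empty or missing file (size 0) would flag every entry with a non-zero size, which would match a diff that was never uploaded.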
Response:
>>>
Ok - this is interesting. The diff didn't exist where the
diff.external_diff.url portion said it should have. (Don't much care
about hiding the bucket name now.) Instead, it was on disk. Which makes
me wonder if maybe they just didn't get copied up to s3 due to the
configuration not being quite right?
So, what I did was run aws s3 sync .
s3://prod-ss-gitlab-mr-diffs-storage/merge_request_diffs (from the
external-diffs/merge_request_diffs directory). Voila, everything shows
up. For the sake of making sure that I'm not missing anything....
>>>
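
To confirm which diffs were left on disk before (or instead of) running a sync, the local directory can be compared with a listing of the bucket. This is a rough sketch in plain Ruby under stated assumptions: `missing_from_bucket` is a hypothetical helper, the `merge_request_diffs` key prefix is taken from the sync command above, the file names are invented, and the list of remote keys is assumed to have been gathered separately (for example from the output of `aws s3 ls --recursive`).

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper, not part of GitLab or the AWS CLI.
# Returns the object keys that exist under local_root but not in remote_keys.
def missing_from_bucket(local_root, remote_keys, prefix: 'merge_request_diffs')
  remote = remote_keys.to_set
  Dir.glob(File.join(local_root, '**', '*'))
     .select { |path| File.file?(path) }
     .map { |path| File.join(prefix, path.delete_prefix("#{local_root}/")) }
     .reject { |key| remote.include?(key) }
     .sort
end

# Demonstration with a temporary directory and invented file names.
missing = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'mr-1'))
  File.write(File.join(root, 'mr-1', 'diff-10'), 'a')
  File.write(File.join(root, 'mr-1', 'diff-11'), 'b')
  missing_from_bucket(root, ['merge_request_diffs/mr-1/diff-10'])
end
puts missing.inspect
```

Any key this reports is a diff that exists locally but has no matching object in the listing.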
Question: @nick.thomas How might this happen? Do we currently assume that once external diffs are enabled, every diff has already made it to the bucket?