Pull mirroring "update now" button doesn't clear hard failure
This bug happens when the last mirror was attempted more than 5 minutes ago.
For example, we start with an import state like so:
=> #<ProjectImportState:0x00007fda18d00e18
id: 19,
project_id: 37,
retry_count: 15,
last_update_started_at: Mon, 08 Jun 2020 15:56:38 UTC +00:00,
last_update_scheduled_at: Mon, 08 Jun 2020 15:56:38 UTC +00:00,
next_execution_timestamp: Mon, 08 Jun 2020 21:56:44 UTC +00:00,
status: "failed",
jid: nil,
last_error: nil,
last_update_at: Mon, 08 Jun 2020 15:56:40 UTC +00:00,
last_successful_update_at: Mon, 08 Jun 2020 13:02:34 UTC +00:00,
correlation_id_value: "befcce294d97146ba68ed4037d78a8ba">
When we do the force update, we call StartPullMirroringService.new(project, current_user, pause_on_hard_failure: false).execute, which does the following:
- Resets the retry count to 0, but does not save this yet:

      if import_state.hard_failed?
        return error('Mirroring for the project is on pause', 403) if params[:pause_on_hard_failure]

        import_state.reset_retry_count
      end

- Checks update_now?(import_state) and calls import_state.force_import_job!
Now here lies the problem. Calling force_import_job! on a mutated ProjectImportState with a retry_count of 0 doesn't work: it returns early because mirror_update_due? is true. This is because hard_failed? is now false on this mutated object.
We worked around this by calling project.import_state.force_import_job! in the console. This worked and scheduled the next mirror because, in the console context, the retry_count is still 15.
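The difference between the service path and the console path can be reproduced with a small plain-Ruby stand-in. This is only a sketch of the behaviour described above, not the real model: FakeImportState, the MAX_RETRY threshold of 14, and the simplified mirror_update_due? (which takes the overdue next_execution_timestamp as given) are all assumptions made for illustration.

```ruby
# Simplified stand-in for ProjectImportState; NOT the real GitLab model.
class FakeImportState
  MAX_RETRY = 14 # assumed hard-failure threshold

  attr_reader :retry_count, :scheduled

  def initialize(retry_count:)
    @retry_count = retry_count
    @scheduled = false
  end

  def hard_failed?
    retry_count > MAX_RETRY
  end

  # Changes the in-memory object only, like resetting without saving.
  def reset_retry_count
    @retry_count = 0
  end

  # Assumption: a hard-failed mirror is never considered due.
  # The next_execution_timestamp is assumed to be in the past.
  def mirror_update_due?
    !hard_failed?
  end

  # Returns early when the mirror already looks due, as described above.
  def force_import_job!
    return if mirror_update_due?

    reset_retry_count if hard_failed?
    @scheduled = true
  end
end

# Service path: the retry count is reset first, then the same object is forced.
mutated = FakeImportState.new(retry_count: 15)
mutated.reset_retry_count
mutated.force_import_job!
puts mutated.scheduled # prints "false": early return, nothing scheduled

# Console path: a fresh object still has retry_count 15.
fresh = FakeImportState.new(retry_count: 15)
fresh.force_import_job!
puts fresh.scheduled # prints "true"
```

In this sketch the only difference between the two paths is whether reset_retry_count ran on the object before force_import_job! was called, which matches the observation that the console workaround succeeds.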
I believe this bug was introduced by 79052fc8. cc @patrickbajao
cc @arihantar
Note: When the last attempt was less than 5 minutes ago, clearing the hard failure works fine because it just updates the next_execution_timestamp directly.
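As a rough illustration of why only older attempts hit the bug, the branch selection might look something like the sketch below. The helper forced_update_branch, the constant name, the handling of the exact 5-minute boundary, and the sample "now" value are assumptions for illustration; the actual update_now? implementation is not shown in this issue.

```ruby
INTERVAL_SECONDS = 5 * 60 # the 5-minute window mentioned above

# Hypothetical helper: which path a forced update would take.
def forced_update_branch(last_update_at, now)
  if now - last_update_at >= INTERVAL_SECONDS
    :force_import_job                # path affected by this bug
  else
    :update_next_execution_timestamp # path that works fine
  end
end

now = Time.utc(2020, 6, 8, 22, 0, 0) # arbitrary example time

# Last attempt hours ago (last_update_at from the import state above).
puts forced_update_branch(Time.utc(2020, 6, 8, 15, 56, 40), now)
# prints "force_import_job"

# Last attempt one minute ago.
puts forced_update_branch(now - 60, now)
# prints "update_next_execution_timestamp"
```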
Customer ticket (internal): https://gitlab.zendesk.com/agent/tickets/160056