Pull mirroring "update now" button doesn't clear hard failure

This bug happens when the last mirror was attempted more than 5 minutes ago.

For example, we start with an import state like this:

=> #<ProjectImportState:0x00007fda18d00e18
 id: 19,
 project_id: 37,
 retry_count: 15,
 last_update_started_at: Mon, 08 Jun 2020 15:56:38 UTC +00:00,
 last_update_scheduled_at: Mon, 08 Jun 2020 15:56:38 UTC +00:00,
 next_execution_timestamp: Mon, 08 Jun 2020 21:56:44 UTC +00:00,
 status: "failed",
 jid: nil,
 last_error: nil,
 last_update_at: Mon, 08 Jun 2020 15:56:40 UTC +00:00,
 last_successful_update_at: Mon, 08 Jun 2020 13:02:34 UTC +00:00,
 correlation_id_value: "befcce294d97146ba68ed4037d78a8ba">

When we do the force update, we call StartPullMirroringService.new(project, current_user, pause_on_hard_failure: false).execute which does:

Resets the retry count to 0, but does not save this yet

if import_state.hard_failed?
  return error('Mirroring for the project is on pause', 403) if params[:pause_on_hard_failure]

  import_state.reset_retry_count
end

Checks update_now?(import_state) and calls import_state.force_import_job!

Here lies the problem. Calling force_import_job! on a mutated ProjectImportState with a retry_count of 0 does nothing: it returns early because mirror_update_due? is true. That in turn is because hard_failed? is now false on the mutated object.

We worked around this by calling project.import_state.force_import_job! in the console. This worked and scheduled the next mirror because in the console context, the retry_count is still 15.
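
To make the difference between the two paths concrete, here is a minimal plain-Ruby sketch. It is not the real ProjectImportState: FakeImportState, the MAX_RETRY value of 14, and the simplified mirror_update_due? (treated as "due whenever not hard failed") are assumptions made only for illustration. It reproduces the behaviour described above: the early return after an in-memory reset, and the successful scheduling when retry_count is still 15.

```ruby
# Simplified stand-in for ProjectImportState (hypothetical, not the GitLab model).
class FakeImportState
  MAX_RETRY = 14 # assumed hard-failure threshold, for illustration only

  attr_reader :retry_count, :scheduled

  def initialize(retry_count:)
    @retry_count = retry_count
    @scheduled = false
  end

  def hard_failed?
    retry_count >= MAX_RETRY
  end

  # Assumption: in this sketch a mirror is "due" whenever it is not hard failed.
  def mirror_update_due?
    !hard_failed?
  end

  # Mutates the object in memory only; nothing is persisted.
  def reset_retry_count
    @retry_count = 0
  end

  def force_import_job!
    return if mirror_update_due? # the early return described above

    reset_retry_count
    @scheduled = true
  end
end

# Service path: the retry count is reset in memory before forcing the job.
via_service = FakeImportState.new(retry_count: 15)
via_service.reset_retry_count if via_service.hard_failed?
via_service.force_import_job!

# Console path: force_import_job! is called while retry_count is still 15.
via_console = FakeImportState.new(retry_count: 15)
via_console.force_import_job!

puts "service path scheduled: #{via_service.scheduled}" # false
puts "console path scheduled: #{via_console.scheduled}" # true
```

In this model the order of operations alone decides the outcome, which matches what we saw: resetting first makes force_import_job! a no-op.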

I believe this bug was introduced by 79052fc8. cc @patrickbajao

cc @arihantar

Note: When the last attempt was less than 5 minutes ago, clearing the hard failure works fine because the service just updates the next_execution_timestamp directly instead of calling force_import_job!.

Customer ticket (internal): https://gitlab.zendesk.com/agent/tickets/160056

Edited Jun 12, 2020 by Heinrich Lee Yu