Retry sending Usage Ping in case of network errors (!35083)

Alishan Ladhani requested to merge 216155-retry-sending-the-usage-ping into master Jun 22, 2020

What does this MR do?

In #216155 (closed), it was noted that Usage Ping can be made more reliable by retrying in case of a network error.

There are two types of errors that we expect:

The Version app (version.gitlab.com) is unavailable
The usage ping request returns an unsuccessful status code

GitlabUsagePingWorker will retry 3 times over ~24 hours when it encounters an error.
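
The retry behavior described above can be sketched in plain Ruby. This is a minimal illustration, not the MR's implementation: the 8-hour interval, `UsagePingError`, `retry_delay`, `ensure_success!`, and `run_with_retries` are assumed names and values, chosen so that 3 retries span roughly 24 hours. In a Sidekiq worker, the limit and delay would typically be supplied through `sidekiq_options retry:` and `sidekiq_retry_in` rather than an explicit loop.

```ruby
require 'socket'   # defines SocketError
require 'timeout'  # defines Timeout::Error

# Assumed values: 3 retries spaced 8 hours apart cover about 24 hours.
RETRY_LIMIT = 3
RETRY_INTERVAL_SECONDS = 8 * 60 * 60

# Hypothetical error for an unsuccessful status code from the Version app.
class UsagePingError < StandardError; end

# Delay in seconds before a retry. `_count` is the number of retries so far
# (unused here because the assumed interval is constant).
def retry_delay(_count)
  RETRY_INTERVAL_SECONDS
end

# Raises unless the HTTP status is 2xx, so the failure is visible to the
# retry mechanism instead of being silently ignored.
def ensure_success!(status)
  return true if (200..299).cover?(status)

  raise UsagePingError, "Unsuccessful response code: #{status}"
end

# Runs the block, retrying on the two expected error types up to `limit`
# times. Returns the number of attempts made. A real worker would wait
# `retry_delay` between attempts; this sketch does not sleep.
def run_with_retries(limit: RETRY_LIMIT)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue UsagePingError, SocketError, Timeout::Error
    retry if attempts <= limit
    raise
  end
  attempts
end
```

Under these assumptions, a job that fails every time is attempted 4 times in total (the initial run plus 3 retries) before the error is re-raised.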

Approaches considered

| Approach | Retry mechanism | Considerations |
| --- | --- | --- |
| One worker - compute and send data | Sidekiq | Computing usage data is somewhat expensive, since it generates hundreds (or even thousands) of DB queries. Network errors are assumed to be relatively infrequent, so this should not be a problem. |
| One worker - compute and send data | Cron | Currently, the worker is scheduled to run once a week, on a random day/hour/minute. This ensures that requests to the Version app are distributed evenly and the load is predictable. Scheduling the worker to run more than once a week could make the load less evenly distributed, and more difficult to predict. For example, if the worker is scheduled to run once a day, the schedule of the worker (daily) is no longer tied to the schedule of usage ping (weekly). We could keep timestamps in Redis to indicate when we last computed/sent data, and check those to ensure a weekly cadence. But we lose control over which day of the week usage pings are sent. For example, if people are more likely to set up self-managed instances on Monday, we will see a continuously increasing number of usage ping requests every Monday. The consideration for a single worker retrying via Sidekiq also applies here. |
| Two workers, one computes data, other sends data | Sidekiq | In this scenario, `GitlabUsagePingWorker` computes data, and schedules `GitLabUsagePingRequestWorker` to send data. Each worker can have its own retry policy. The two workers would be quite coupled. |
| Two workers, one computes data, other sends data | Cron | Similar to one worker retrying via cron. We need to ensure an even distribution of requests to the Version app. A possible approach could be scheduling `GitLabUsagePingRequestWorker` every hour at a random minute. |
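
Two ideas from the cron rows can be illustrated with a short sketch: a weekly schedule on a random day/hour/minute, and a timestamp check that preserves a weekly cadence when the worker runs more often. Both helpers, `weekly_cron_for` and `ping_due?`, are hypothetical, as is seeding the schedule from an instance identifier. The timestamp is passed in directly here, whereas the table suggests keeping it in Redis.

```ruby
require 'zlib'

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Builds a weekly cron expression whose minute, hour, and weekday are derived
# from a per-instance seed (assumption), so each instance keeps a stable slot
# while different instances are spread across the week.
def weekly_cron_for(instance_id)
  rng = Random.new(Zlib.crc32(instance_id))
  minute  = rng.rand(60) # 0..59
  hour    = rng.rand(24) # 0..23
  weekday = rng.rand(7)  # 0..6, where 0 is Sunday
  "#{minute} #{hour} * * #{weekday}"
end

# Returns true when no ping has been recorded yet or at least a week has
# passed since the last one. `last_sent_at` and `now` are Time objects.
def ping_due?(last_sent_at, now)
  last_sent_at.nil? || (now - last_sent_at) >= SECONDS_PER_WEEK
end
```

As the table notes, a check like `ping_due?` keeps the weekly cadence but does not control which weekday the ping lands on. A seeded weekly cron expression does.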

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
[-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

[-] Label as security and @ mention @gitlab-com/gl-security/appsec
[-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
[-] Security reports checked/validated by a reviewer from the AppSec team

Edited Jul 29, 2020 by Alishan Ladhani
