Acceptance Testing RemoteService.FetchInternalRemote

~Conversation: #343 (closed)

See the Migration Process documentation for more information on the Acceptance Testing stage of the process.

Details

  • Feature Toggle Name: gitaly_remote_fetch_internal_remote
  • GRPC Service: RemoteService::FetchInternalRemote
  • Required Gitaly Version: v0.66.0
  • Required GitLab Version: v10.4

1. Preparation

  • Routes: what routes use this migration?
    1. POST /api/v4/projects/:project_id

2. Development Trial

Skipping development trial, since this is an EE-only feature.

Check Dev Server Versions

  • Gitaly: Gitaly Dev Version Tracker Dashboard
  • GitLab: https://dev.gitlab.org/help

Enable on dev.gitlab.org:

  • !feature-set gitaly_remote_fetch_internal_remote true in #dev-gitlab

Then leave running while monitoring and performing some testing through web, API, or SSH.

Monitor (initially)

  • Monitor Grafana feature dashboard on dev: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last hour for unusual activity
    • FetchInternalRemote errors, last hour for unusual activity
  • Check for errors in Gitaly Dev Sentry
  • Check for errors in GitLab Dev Sentry

Continue?

  • On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_remote_fetch_internal_remote false in #dev-gitlab; otherwise leave it running and proceed to the next stage.

3. Staging Trial

Check Staging Server Versions

  • Gitaly: Gitaly Staging Version Tracker Dashboard
  • GitLab: https://staging.gitlab.com/help

Enable on staging.gitlab.com:

  • !feature-set gitaly_remote_fetch_internal_remote true in #development

Then leave running for at least 15 minutes, monitoring and performing some testing through web, API, or SSH.

Monitor (at least every 5 minutes, preferably real-time)

  • Monitor Grafana feature dashboard on staging: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last hour for unusual activity
    • FetchInternalRemote errors, last hour for unusual activity
  • Check for errors in Gitaly Staging Sentry
  • Check for errors in GitLab Staging Sentry

Continue?

  • On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately using !feature-set gitaly_remote_fetch_internal_remote false in #development; otherwise leave it running and proceed to the next stage.

4. Production Server Version Check

  • Gitaly: Gitaly Production Version Tracker Dashboard
  • GitLab: https://gitlab.com/help

5. Initial Impact Check

  • Create an issue in the infrastructure tracker: Create issue now
  • Set Gitaly to 1% using the command !feature-set gitaly_remote_fetch_internal_remote 1 in #production

Then leave running for at least 15 minutes, monitoring and performing some testing through web, API, or SSH.

Monitor (at least every 5 minutes, preferably real-time)

  • Monitor Grafana feature dashboard on production: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last hour for unusual activity
    • FetchInternalRemote errors, last hour for unusual activity
  • Check for errors in Gitaly Sentry
  • Check for errors in GitLab Sentry

Continue?

  • On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_remote_fetch_internal_remote false in #production; otherwise leave it running and proceed to the next stage.
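
The "Continue?" check is a judgment call made by whoever is watching the dashboards, but its shape can be sketched in code. The following is only an illustration: the metric names, the baseline figures, and the `max_ratio` threshold are all hypothetical, and the checklist itself defines no numeric limit for "unexpectedly high". Only the `!feature-set` command text comes from this issue.

```python
FEATURE = "gitaly_remote_fetch_internal_remote"


def next_action(baseline, observed, max_ratio=2.0):
    """Return the chat command to disable the trial, or "proceed".

    baseline and observed map a metric name (hypothetical, e.g. "error_rate")
    to a number. max_ratio is an assumed tolerance, not a documented limit.
    """
    for metric, base in baseline.items():
        current = observed.get(metric, 0.0)
        # With a zero baseline, any non-zero reading counts as unexpected.
        if current > base * max_ratio:
            return f"!feature-set {FEATURE} false"
    return "proceed"


if __name__ == "__main__":
    baseline = {"call_rate": 100.0, "error_rate": 0.5, "cpu": 0.40}
    observed = {"call_rate": 110.0, "error_rate": 3.0, "cpu": 0.42}
    print(next_action(baseline, observed))
```

In this sample run the error rate is six times its baseline, so the sketch returns the disable command rather than "proceed". The returned command would still need to be posted in the channel for the environment under test.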

6. Low Impact Trial

  • Set Gitaly to 5% using the command !feature-set gitaly_remote_fetch_internal_remote 5 in #production

Then leave running while monitoring for at least 2 hours.

Monitor (at least every 20 minutes)

  • Monitor Grafana feature dashboard on production: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last 2 hours for unusual activity
    • FetchInternalRemote errors, last 2 hours for unusual activity
  • Check for errors in Gitaly Sentry
  • Check for errors in GitLab Sentry

Continue?

  • On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_remote_fetch_internal_remote false in #production; otherwise leave it running and proceed to the next stage.

7. Mid Impact Trial

  • Set Gitaly to 50% using the command !feature-set gitaly_remote_fetch_internal_remote 50 in #production

Then leave running while monitoring for at least 24 hours.

Monitor (at least every few hours)

  • Monitor Grafana feature dashboard on production: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last 24 hours for unusual activity
    • FetchInternalRemote errors, last 24 hours for unusual activity
  • Check for errors in Gitaly Sentry
  • Check for errors in GitLab Sentry

Continue?

  • On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_remote_fetch_internal_remote false in #production; otherwise leave it running and proceed to the next stage.

8. Full Impact Trial

  • Set Gitaly to 100% using the command !feature-set gitaly_remote_fetch_internal_remote 100 in #production

Then leave running while monitoring for at least 1 week.

Monitor (at least every day)

  • Monitor Grafana feature dashboard on production: Gitaly Feature Status Dashboard
  • Inspect logs in ELK:
    • FetchInternalRemote invocations, last 7 days for unusual activity
    • FetchInternalRemote errors, last 7 days for unusual activity
  • Check for errors in Gitaly Sentry
  • Check for errors in GitLab Sentry

Success?

  • Close this issue and mark the ~Conversation as ~"Migration:Opt-In"
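
For reference, the production stages (steps 5 to 8) can be summarized as data. This is a minimal sketch: the `Stage` class and the helper functions are hypothetical names introduced here, while the percentages, minimum durations, monitoring intervals, and the `!feature-set` command format are copied from the checklist above.

```python
from dataclasses import dataclass
from datetime import timedelta

FEATURE = "gitaly_remote_fetch_internal_remote"


@dataclass(frozen=True)
class Stage:
    name: str
    percentage: int
    min_duration: timedelta
    check_interval: str  # wording taken from the checklist


PRODUCTION_STAGES = [
    Stage("Initial Impact Check", 1, timedelta(minutes=15), "at least every 5 minutes"),
    Stage("Low Impact Trial", 5, timedelta(hours=2), "at least every 20 minutes"),
    Stage("Mid Impact Trial", 50, timedelta(hours=24), "at least every few hours"),
    Stage("Full Impact Trial", 100, timedelta(weeks=1), "at least every day"),
]


def enable_command(stage):
    """Chat command that sets the feature to this stage's percentage."""
    return f"!feature-set {FEATURE} {stage.percentage}"


def rollback_command():
    """Chat command that disables the feature at any stage."""
    return f"!feature-set {FEATURE} false"


def minimum_total_duration(stages):
    """Shortest possible time to pass through all the given stages."""
    return sum((s.min_duration for s in stages), timedelta())


if __name__ == "__main__":
    for stage in PRODUCTION_STAGES:
        print(f"{stage.name}: {enable_command(stage)} ({stage.check_interval})")
    print("Minimum total:", minimum_total_duration(PRODUCTION_STAGES))
```

Summing the minimum durations shows that the production rollout takes at least 8 days, 2 hours, and 15 minutes, not counting the dev and staging trials.
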
Edited Feb 28, 2018 by Alejandro Rodríguez