Cloud native build logs performance tests
Production Change
Change Component | Description |
---|---|
Change Objective | Performance tests of a redesigned feature (cloud native build logs) |
Change Type | Feature Redesign |
Services Impacted | Redis, GitLab CI/CD |
Change Technician | @grzesiek, SRE on call |
Change Criticality | C2 |
Change Type label | changeunscheduled |
Change Reviewer | SRE on-call? |
Due Date | 2020-08-10 |
Time tracking | 2 hours |
Downtime Component | No downtime |
Detailed steps for the change
Pre-Change Steps - steps to be completed before execution of the change
Estimated Time to Complete - 1 hour
- Create grzesiek/live-traces-tests with a test pipeline identical to the one tested in staging.
- Disable the shared minutes limit for @grzesiek on gitlab.com.
- Monitor Redis Overview metrics before the change to ensure that Redis is healthy.
- Run the test pipeline with the feature disabled and take screenshots of Redis metrics as a baseline.
Change Steps - steps to take to execute the change
Estimated Time to Complete - 1 hour
- `/chatops run feature set --project=grzesiek/live-traces-tests ci_enable_live_trace true`
- Run a new pipeline in grzesiek/live-traces-tests.
- Monitor Redis Overview metrics.
- Wait until the pipeline completes.
- `/chatops run feature set --project=grzesiek/live-traces-tests ci_enable_live_trace false`
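For reference, the ChatOps commands above toggle a project-scoped feature flag. A roughly equivalent call through the GitLab Features API (`POST /api/v4/features/:name` with the `value` and `project` parameters, which requires an administrator token) can be sketched as follows. This is a minimal sketch under those assumptions: the helper name `feature_flag_request` is hypothetical, it only builds the request and never sends it, and ChatOps remains the way this change is actually executed.

```python
from urllib.parse import quote, urlencode


# Hypothetical helper: builds (but does not send) the Features API request that
# corresponds to `/chatops run feature set --project=<path> <flag> <true|false>`.
def feature_flag_request(base_url: str, flag: str, project: str, enabled: bool):
    # The flag name is a URL path segment, so quote it defensively.
    url = f"{base_url.rstrip('/')}/api/v4/features/{quote(flag, safe='')}"
    # `project` scopes the flag to one project path; urlencode escapes the "/".
    body = urlencode({"value": "true" if enabled else "false", "project": project})
    return "POST", url, body
```

Calling it with `enabled=True` before the test pipeline and `enabled=False` afterwards mirrors the first and last change steps.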
Post-Change Steps - steps to take to verify the change
Estimated Time to Complete - 1 hour
- Take screenshots of the metrics and post the results to gitlab-org/gitlab#217988.
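When posting results, it may help to pair the screenshots with a small before/after table built from the same Redis Overview panels. The sketch below assumes that peak values have been copied by hand from the baseline run (feature disabled) and the test run (feature enabled); the `comparison_table` helper and any metric names passed to it are hypothetical.

```python
# Hypothetical helper: renders a Markdown table comparing peak Redis metrics
# from the baseline run (flag disabled) and the test run (flag enabled).
# Both dicts are expected to contain the same metric names.
def comparison_table(baseline: dict[str, float], enabled: dict[str, float]) -> str:
    rows = ["| Metric | Flag disabled | Flag enabled | Change |", "|---|---|---|---|"]
    for name in sorted(baseline):
        before, after = baseline[name], enabled[name]
        # Relative change in percent; reported as n/a when the baseline is zero.
        change = "n/a" if before == 0 else f"{(after - before) / before * 100:+.1f}%"
        rows.append(f"| {name} | {before:g} | {after:g} | {change} |")
    return "\n".join(rows)
```

The output can be pasted directly into the results comment alongside the screenshots.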
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete - 10 minutes
- Cancel the pipeline.
- Disable the feature flag: `/chatops run feature set --project=grzesiek/live-traces-tests ci_enable_live_trace false`
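If ChatOps or the UI is unavailable, the two rollback steps could also be issued through the GitLab REST API, in the order listed above: the Pipelines API cancel endpoint (`POST /projects/:id/pipelines/:pipeline_id/cancel`) followed by the Features API. This is a sketch under assumptions: `rollback_plan` is a hypothetical helper, the pipeline ID must be taken from the running test pipeline, and the function only describes the requests without sending them.

```python
from urllib.parse import quote


# Hypothetical helper: returns the rollback requests in order; nothing is sent.
def rollback_plan(base_url: str, project: str, pipeline_id: int) -> list[tuple[str, str]]:
    api = f"{base_url.rstrip('/')}/api/v4"
    # A project path can be used as the project ID once its "/" is URL-encoded.
    project_id = quote(project, safe="")
    return [
        # 1. Stop the running test pipeline.
        ("POST", f"{api}/projects/{project_id}/pipelines/{pipeline_id}/cancel"),
        # 2. Turn the project-scoped flag off again (requires an admin token).
        ("POST", f"{api}/features/ci_enable_live_trace?value=false&project={project_id}"),
    ]
```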
Monitoring
Key metrics to observe
- Metric: Sentry errors for build traces
- Location: Sentry errors
- What changes to this metric should prompt a rollback: a significant number of Redis, exclusive lock, or trace-related exceptions
- Metric: API endpoint for build logs
- Location: API for job traces dashboard
- What changes to this metric should prompt a rollback: noticeable spikes in errors or latency
- Metric: Redis overview
- Location: Redis overview dashboard
- What changes to this metric should prompt a rollback: excessive memory consumption or CPU saturation
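The rollback criteria above are qualitative. One way to make them checkable while the test pipeline runs is to compare each dashboard reading against the baseline run and against fixed ceilings. The thresholds below (80% memory, 90% CPU, a 2x jump in API error rate or p95 latency, more than 10 new exceptions) are assumptions chosen for illustration, not values agreed for this change, and the `Reading` type and `rollback_reasons` function are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; tune them against the baseline run.
MAX_MEMORY_RATIO = 0.80    # Redis used memory as a fraction of its limit
MAX_CPU_RATIO = 0.90       # Redis CPU utilization as a fraction
MAX_SPIKE_FACTOR = 2.0     # API error rate / latency relative to baseline
MAX_NEW_EXCEPTIONS = 10    # new Redis, exclusive lock, or trace exceptions in Sentry


@dataclass
class Reading:
    """One snapshot of the dashboards listed above (field names are hypothetical)."""
    redis_memory_ratio: float
    redis_cpu_ratio: float
    api_error_rate: float
    api_p95_latency_ms: float
    new_trace_exceptions: int


def spiked(before: float, after: float) -> bool:
    # With a zero baseline, any non-zero reading counts as a spike.
    return after > MAX_SPIKE_FACTOR * before


def rollback_reasons(baseline: Reading, current: Reading) -> list[str]:
    """Return the rollback criteria that `current` violates; empty means continue."""
    reasons = []
    if current.new_trace_exceptions > MAX_NEW_EXCEPTIONS:
        reasons.append("sentry: trace-related exceptions")
    if spiked(baseline.api_error_rate, current.api_error_rate):
        reasons.append("api: error spike")
    if spiked(baseline.api_p95_latency_ms, current.api_p95_latency_ms):
        reasons.append("api: latency spike")
    if current.redis_memory_ratio > MAX_MEMORY_RATIO:
        reasons.append("redis: memory")
    if current.redis_cpu_ratio > MAX_CPU_RATIO:
        reasons.append("redis: cpu saturation")
    return reasons
```

Any non-empty result would map to the Rollback steps above.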
Summary of infrastructure changes
- Does this change introduce new compute instances? No
- Does this change re-size any existing compute instances? No
- Does this change introduce any additional usage of tooling like Elasticsearch, CDNs, Cloudflare, etc.? This change might involve additional Redis usage.
Changes checklist
- This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled).
- This issue has the change technician as the assignee.
- Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- Necessary approvals have been completed based on the Change Management Workflow.
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- SRE on-call has been informed prior to the change being rolled out. (In #production channel, mention @sre-oncall and this issue.)
- There are currently no active incidents.
Edited by Grzegorz Bizon