The Readiness Review document is designed to help you prepare your features and services for the GitLab Production Platforms.
Please engage with the relevant teams as soon as possible to begin review even if there are incomplete items below.
All sections should be completed up to the current [maturity level](
For example, if the target maturity is "Beta", then items under "Experiment" and "Beta" should be completed.
_While it is encouraged for parts of this document to be filled out, not all of the items below will be relevant. Leave all non-applicable items intact and add 'N/A' or reasons for why in place of the response._
_This Guide is just that, a Guide. If something is not asked, but should be, it is strongly encouraged to add it as necessary._
## Experiment
### Service Catalog
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to the [service catalog entry]( for the service. Ensure that the following items are present in the service catalog, or listed here:
  - Link to or provide a high-level summary of this new product feature.
  - Link to the [Architecture Design Workflow]( for this feature. If no design was completed for this feature, please explain why.
  - List the feature group that created this feature/service and the current Engineering Managers, Product Managers and their Directors.
  - List the individuals who are the subject matter experts and know the most about this feature.
  - List the team or set of individuals that will take responsibility for the reliability of the feature once it is in production.
  - List the member(s) of the team who built the feature and will be on-call for the launch.
  - List the external and internal dependencies of the application (e.g., Redis, Postgres) for this feature and how the service will be impacted by a failure of each dependency.
### Infrastructure
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Do we use IaC (e.g., Terraform) for all the infrastructure related to this feature? If not, what kind of resources are not covered?
- [ ] Is the service covered by any DDoS protection solution (GCP/AWS load-balancers or Cloudflare usually cover this)?
- [ ] Are all cloud infrastructure resources labeled according to the [Infrastructure Labels and Tags]( guidelines?
### Operational Risk
_The items below will be reviewed by the Scalability:Practices team._
- [ ] List the top three operational risks when this feature goes live.
- [ ] For each component and dependency, what is the blast radius of failures? Is there anything in the feature design that will reduce this risk?
### Monitoring and Alerting
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to the [metrics catalog]( for the service
### Deployment
_The items below will be reviewed by the Delivery team._
- [ ] Will a [change management issue]( be used for rollout? If so, link to it here.
- [ ] Can the new product feature be safely rolled back once it is live? Can it be disabled using a feature flag?
- [ ] How are the artifacts being built for this feature (e.g., using the [CNG]( or another image building pipeline)?
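
To illustrate the feature flag question above, here is a minimal sketch of a percentage-based flag that can act as a kill switch. It is not GitLab's feature flag implementation; the `FeatureFlag` class, its `percentage` field, and the `enabled_for` method are hypothetical names chosen for this example.

```python
import hashlib


class FeatureFlag:
    """Hypothetical percentage-of-actors flag; not a real GitLab API."""

    def __init__(self, name: str, percentage: int = 0) -> None:
        self.name = name
        # 0 disables the feature for everyone, 100 enables it for everyone.
        self.percentage = percentage

    def enabled_for(self, actor_id: str) -> bool:
        # Hash the flag name together with the actor so that each actor lands
        # in a stable bucket (0-99) that is independent across flags.
        digest = hashlib.sha256(f"{self.name}:{actor_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.percentage


flag = FeatureFlag("new_feature", percentage=25)
if flag.enabled_for("user-42"):
    pass  # new code path
else:
    pass  # existing code path

# Disabling the feature without a deploy: set the percentage back to 0.
flag.percentage = 0
```

Because the bucket is derived from a hash rather than a random draw, the same actor gets the same answer on every request, which keeps a partial rollout consistent for each user.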
### Security Considerations
_The items below will be reviewed by the Infrasec team._
- [ ] Link or list information for new resources of the following type:
  - AWS Accounts/GCP Projects:
  - New Subnets:
  - VPC/Network Peering:
  - DNS names:
  - Entry-points exposed to the internet (Public IPs, Load-Balancers, Buckets, etc.):
  - Other (anything relevant that might be worth mentioning):
- [ ] Were the [GitLab security development guidelines]( followed for this feature?
- [ ] Was an [Application Security Review]( requested, if appropriate? Link it here.
- [ ] Do we have an automatic procedure to update the infrastructure (OS, container images, packages, etc.)? For example, using unattended upgrade or [renovate bot]( to keep dependencies up-to-date?
- [ ] For IaC (e.g., Terraform), are any secure static code analysis tools in use, such as ([kics]( or [checkov]( If not and new IaC is being introduced, please explain why.
- [ ] If we're creating new containers (e.g., a Dockerfile with an image build pipeline), are we using `kics` or `checkov` to scan Dockerfiles or [GitLab's container]( scanner for vulnerabilities?
### Identity and Access Management
_The items below will be reviewed by the Infrasec team._
- [ ] Are we adding any new forms of Authentication (New service-accounts, users/password for storage, OIDC, etc...)?
- [ ] Was effort put in to ensure that the new service follows the [least privilege principle](, so that permissions are reduced as much as possible?
- [ ] Do firewalls follow the least privilege principle (w/ network policies in Kubernetes or firewalls on cloud provider)?
- [ ] Is the service covered by a [WAF (Web Application Firewall)]( in [Cloudflare](
### Logging, Audit and Data Access
_The items below will be reviewed by the Infrasec team._
- [ ] Did we make an effort to redact customer data from logs?
- [ ] What kind of data is stored on each system (secrets, customer data, audit, etc...)?
- [ ] How is data rated according to our [data classification standard]( data is RED)?
- [ ] Do we have audit logs for when data is accessed? If you are unsure or if using the central logging and a new pubsub topic was created, create an issue in the [Security Logging Project]( using the `add-remove-change-log-source` template.
- [ ] Ensure appropriate logs are being kept for compliance and requirements for retention are met.
- [ ] If the data classification = Red for the new environment, please create a [Security Compliance Intake issue]([title]=System%20Intake:%20%5BSystem%20Name%20FY2%23%20Q%23%5D&issuable_template=intakeform). Note this is not necessary if the service is deployed in existing Production infrastructure.
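
As an illustration of the log redaction item above, the following is a minimal sketch of redacting email addresses and `key=value` secrets before a record is emitted, using Python's standard `logging` module. The patterns, the `redact` helper, and the `RedactingFilter` class are assumptions made for this example; a real service would need patterns matched to the data it actually handles.

```python
import logging
import re

# Illustrative patterns only; they are not an exhaustive definition of customer data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(r"(?i)\b(token|password|secret)=\S+")


def redact(message: str) -> str:
    """Replace email addresses and key=value secrets with placeholders."""
    message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)


class RedactingFilter(logging.Filter):
    """Hypothetical filter that redacts the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as arguments are redacted too,
        # then clear args so the message is not formatted a second time.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True


logger = logging.getLogger("readiness_example")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.warning("login failed for %s password=%s", "jane@example.com", "hunter2")
```

Attaching the filter to the handler rather than to individual call sites means every message passing through that handler is redacted, including messages from code that was not written with redaction in mind.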
## Beta
### Monitoring and Alerting
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to examples of logs on
- [ ] Link to the [Grafana dashboard]( for this service.
### Backup, Restore, DR and Retention
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Are there custom backup/restore requirements?
- [ ] Are backups monitored?
- [ ] Was a restore from backup tested?
- [ ] Link to information about growth rate of stored data.
### Deployment
_The items below will be reviewed by the Delivery team._
- [ ] Will a [change management issue]( be used for rollout? If so, link to it here.
- [ ] Does this feature have any version compatibility requirements with other components (e.g., Gitaly, Sidekiq, Rails) that will require a specific order of deployments?
- [ ] Is this feature validated by our [QA blackbox tests](
- [ ] Will it be possible to roll back this feature? If so, explain how.
### Security
_The items below will be reviewed by the InfraSec team._
- [ ] Put yourself in an attacker's shoes and list some examples of "What could possibly go wrong?". Are you OK going into Beta knowing that?
- [ ] Link to any outstanding security-related epics & issues for this feature. Are you OK going into Beta with those still on the TODO list?
## General Availability
### Monitoring and Alerting
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to the troubleshooting runbooks.
- [ ] Link to an example of an alert and a corresponding runbook.
- [ ] Confirm that on-call SREs have access to this service and will be on-call. If this is not the case, please add an explanation here.
### Operational Risk
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to notes or testing results for assessing the outcome of failures of individual components.
- [ ] What are the potential scalability or performance issues that may result with this change?
- [ ] What are a few operational concerns that will not be present at launch, but may be a concern later?
- [ ] Are there any single points of failure in the design? If so, list them here.
- [ ] As a thought experiment, think of worst-case failure scenarios for this product feature. How can the blast radius of the failure be isolated?
### Backup, Restore, DR and Retention
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Are there any special requirements for Disaster Recovery for both Regional and Zone failures beyond our current Disaster Recovery processes that are in place?
- [ ] How does data age? Can data over a certain age be deleted?
### Performance, Scalability and Capacity Planning
_The items below will be reviewed by the Scalability:Practices team._
- [ ] Link to any performance validation that was done according to [performance guidelines](
- [ ] Link to any load testing plans and results.
- [ ] Are there any potential performance impacts on the Postgres database or Redis when this feature is enabled at scale?
- [ ] Explain how this feature uses our [rate limiting]( features.
- [ ] Are there retry and back-off strategies for external dependencies?
- [ ] Does the feature account for brief spikes in traffic, at least 2x above the expected rate?
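
For the retry and back-off item above, one common approach is capped exponential back-off with full jitter. The sketch below is only an illustration of that technique; the function name `retry_with_backoff` and its default values are assumptions, not recommended settings for any GitLab service.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on any exception with capped exponential back-off."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay ceiling doubles each attempt, up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, cap] so that many
            # clients retrying at once do not hit the dependency in lockstep.
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

In practice, a production version would usually retry only on errors known to be transient (timeouts, connection resets) rather than on every exception, and would bound total retry time so that a failing dependency does not hold request threads indefinitely.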
### Deployment
_The items below will be reviewed by the Delivery team._
- [ ] Will a [change management issue]( be used for rollout? If so, link to it here.
- [ ] Are there healthchecks or SLIs that can be relied on for deployment/rollbacks?
- [ ] Does building artifacts or deployment depend at all on [](