Cells Topology Service Production Readiness Review (Experiment)
## Production Readiness

This issue serves as a tracking issue to guide you through the readiness review. It's not the production readiness document itself! The readiness documentation will be added to the [project](https://gitlab.com/gitlab-com/gl-infra/readiness/) with a merge request, where stakeholders from different teams can collaborate.

---

## Readiness MR

https://gitlab.com/gitlab-com/gl-infra/readiness/-/merge_requests/221

## Reviewers

The reviewers will be filled in as one of the steps of the checklist below. If a reviewer in the "Mandatory" section is not allocated, please add the reason why next to the name.

### Mandatory

- Scalability:Practices: @schin1 @gsgl
- Delivery: @jennykim-gitlab
- InfraSec: https://gitlab.com/gitlab-com/gl-security/product-security/infrastructure-security/bau/-/issues/9351

### Optional

_Delete these reviewers if they do not apply_

- Development: {+ reviewer name +} _you may want to consider a review from the team members who were closely involved in the development of this work to ensure that the details match their mental model_
- Application Security: {+ reviewer name +} _if there are concerns about application security, the group's Application Security stable counterpart can help_

## Readiness Checklist

The following items should be completed by the person initiating the readiness review:

- [x] Review the [Production Readiness Review](https://about.gitlab.com/handbook/engineering/infrastructure/production/readiness/) handbook page.
- [x] Create this issue and assign it to yourself.
  - Set a due-date for when you believe the readiness will be completed (this can be updated later if necessary).
  - Add an appropriate label to the issue from the list below. Review the labels periodically to ensure the appropriate label is assigned to keep the review progressing.
    - ~"workflow-infra::Triage" : The author has an idea of the feature or change but is pending a decision to proceed with it.
    - ~"workflow-infra::Proposal" : A decision to proceed with an idea or change has been made and the Readiness MR is being prepared.
    - ~"workflow-infra::Ready" : The Readiness MR is ready and awaiting review. Issue is assigned to the DRI/author.
    - ~"workflow-infra::In Progress" : Review discussions are ongoing between the DRI and SRE Reviewer. Issue is assigned to the DRI and SRE Reviewer.
    - ~"workflow-infra::Done" : The Readiness review is complete, and the Readiness MR is accepted and merged.
    - ~"workflow-infra::Cancelled" : Readiness review is no longer required due to other external reasons. After applying this label, the issue will be closed.
    - ~"workflow-infra::Stalled" : Review is paused due to a change in priority.
    - ~"workflow-infra::Blocked" : Review is blocked due to external dependencies or other factors. Where possible, a [blocking issue](https://docs.gitlab.com/ee/user/project/issues/related_issues.html) should also be set.
- [x] In the "Reviewers" section above, add the reviewer names. **Names will be assigned by reaching out to the engineering manager of the corresponding team; do this by `@` mentioning the team members leading the following groups:**
  - **Scalability:Practices**: Reach out to [Scalability: Practices](https://handbook.gitlab.com/handbook/engineering/infrastructure/team/scalability/#scalabilitypractices)
  - **Delivery**: Reach out to [Delivery management](https://gitlab.com/groups/gitlab-org/delivery/managers/-/group_members?with_inherited_permissions=exclude)
  - **InfraSec**: [Create an issue in this team's tracker](https://gitlab.com/gitlab-com/gl-security/security-operations/infrastructure-security/bau/-/issues/new?issue[title]=Security%20Review%20Request%3A%20{%2B%20Service%2FFeature%20Name%20%2B}&issuable_template=production_readiness). More information is available on the Infrastructure Security Team's [handbook page](https://about.gitlab.com/handbook/security/security-engineering-and-research/infrastructure-security/#working-with-us).
    After the issue is created, put a link to the issue next to the InfraSec reviewer item above and add the reviewer name after one has been assigned.
- [x] Create the first draft of the readiness review by copying the template below and submitting an MR. Do not remove any items or sections in the template. It is only required to fill in the items **up to and including** the corresponding maturity level and lower. For example, for ~Readiness::Beta all sections under Beta and Experiment will need to be completed.
- [x] Assign the initial set of reviewers to the MR. Once the MR has been assigned, add the label ~"workflow-infra::In Progress" to this issue.
- [x] Add a link to the MR in the "Readiness MR" section at the top of this issue.
- [x] Once the MR has been sent out for review, add a `~"Readiness::*"` scoped label for the corresponding target [maturity level](https://docs.gitlab.com/ee/policy/experiment-beta-support.html) for the review.
- [x] When the last review of the MR is complete and it is merged, do one of the following:
  1. If the feature will remain at the current maturity level for an uncertain amount of time, close the issue and add a `~"workflow-infra::Done"` label to the issue.
  2. If the feature will need to be reviewed for the next maturity level soon, add the corresponding `~"Readiness::*"` scoped label and repeat the process using the same issue.
- [x] (Optional) If it is later decided not to proceed with this proposal, add ~"workflow-infra::Cancelled" and close this issue.

## Readiness MR Template

Expand the section below to view the readiness template; this will be the starting point for the readiness merge request.

**Create `<name>/index.md` as a new merge request with the following content, where `<name>` is something short and descriptive for the change being proposed**

<details>

The Readiness Review document is designed to help you prepare your features and services for the GitLab Production Platforms.
Please engage with the relevant teams as soon as possible to begin review even if there are incomplete items below.
All sections should be completed up to the current [maturity level](https://docs.gitlab.com/ee/policy/experiment-beta-support.html).
For example, if the target maturity is "Beta", then items under "Experiment" and "Beta" should be completed.

_While it is encouraged for parts of this document to be filled out, not all of the items below will be relevant. Leave all non-applicable items intact and add 'N/A' or reasons for why in place of the response._

_This Guide is just that, a Guide. If something is not asked, but should be, it is strongly encouraged to add it as necessary._

## Experiment

### Service Catalog

_The items below will be reviewed by the Scalability:Practices team._

- [ ] Link to the [service catalog entry](https://gitlab.com/gitlab-com/runbooks/-/tree/master/services) for the service. Ensure that all of the fields are populated.

### Infrastructure

_The items below will be reviewed by the Scalability:Practices team._

- [ ] Do we use IaC (e.g., Terraform) for all the infrastructure related to this feature? If not, what kind of resources are not covered?
- [ ] Is the service covered by any DDoS protection solution (GCP/AWS load-balancers or Cloudflare usually cover this)?
- [ ] Are all cloud infrastructure resources labeled according to the [Infrastructure Labels and Tags](https://about.gitlab.com/handbook/infrastructure-standards/labels-tags/) guidelines?

### Operational Risk

_The items below will be reviewed by the Scalability:Practices team._

- [ ] List the top three operational risks when this feature goes live.
- [ ] For each component and dependency, what is the blast radius of failures? Is there anything in the feature design that will reduce this risk?
### Monitoring and Alerting

_The items below will be reviewed by the Scalability:Practices team._

- [ ] Link to the [metrics catalog](https://gitlab.com/gitlab-com/runbooks/-/tree/master/metrics-catalog/services) for the service.

### Deployment

_The items below will be reviewed by the Delivery team._

- [ ] Will a [change management issue](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/) be used for rollout? If so, link to it here.
- [ ] Can the new product feature be safely rolled back once it is live, and can it be disabled using a feature flag?
- [ ] How are the artifacts being built for this feature (e.g., using the [CNG](https://gitlab.com/gitlab-org/build/CNG/) or another image building pipeline)?

### Security Considerations

_The items below will be reviewed by the Infrasec team._

- [ ] Link or list information for new resources of the following type:
  - AWS Accounts/GCP Projects:
  - New Subnets:
  - VPC/Network Peering:
  - DNS names:
  - Entry-points exposed to the internet (Public IPs, Load-Balancers, Buckets, etc...):
  - Other (anything relevant that might be worth mentioning):
- [ ] Were the [GitLab security development guidelines](https://docs.gitlab.com/ee/development/secure_coding_guidelines.html) followed for this feature?
- [ ] Was an [Application Security Review](https://handbook.gitlab.com/handbook/security/security-engineering/application-security/appsec-reviews/) requested, if appropriate? Link it here.
- [ ] Do we have an automatic procedure to update the infrastructure (OS, container images, packages, etc...)? For example, using unattended upgrade or [renovate bot](https://github.com/renovatebot/renovate) to keep dependencies up-to-date?
- [ ] For IaC (e.g., Terraform), are any secure static code analysis tools in use, such as [kics](https://github.com/Checkmarx/kics) or [checkov](https://github.com/bridgecrewio/checkov)? If not and new IaC is being introduced, please explain why.
- [ ] If we're creating new containers (e.g., a Dockerfile with an image build pipeline), are we using `kics` or `checkov` to scan Dockerfiles or [GitLab's container](https://docs.gitlab.com/ee/user/application_security/container_scanning/#configuration) scanner for vulnerabilities?

### Identity and Access Management

_The items below will be reviewed by the Infrasec team._

- [ ] Are we adding any new forms of Authentication (New service-accounts, users/password for storage, OIDC, etc...)?
- [ ] Was effort put in to ensure that the new service follows the [least privilege principle](https://en.wikipedia.org/wiki/Principle_of_least_privilege), so that permissions are reduced as much as possible?
- [ ] Do firewalls follow the least privilege principle (w/ network policies in Kubernetes or firewalls on cloud provider)?
- [ ] Is the service covered by a [WAF (Web Application Firewall)](https://cheatsheetseries.owasp.org/cheatsheets/Secure_Cloud_Architecture_Cheat_Sheet.html#web-application-firewall) in [Cloudflare](https://gitlab.com/gitlab-com/runbooks/-/tree/master/docs/cloudflare#how-we-use-page-rules-and-waf-rules-to-counter-abuse-and-attacks)?

### Logging, Audit and Data Access

_The items below will be reviewed by the Infrasec team._

- [ ] Did we make an effort to redact customer data from logs?
- [ ] What kind of data is stored on each system (secrets, customer data, audit, etc...)?
- [ ] How is data rated according to our [data classification standard](https://about.gitlab.com/handbook/engineering/security/data-classification-standard.html) (customer data is RED)?
- [ ] Do we have audit logs for when data is accessed? If you are unsure or if using the central logging and a new pubsub topic was created, create an issue in the [Security Logging Project](https://gitlab.com/gitlab-com/gl-security/engineering-and-research/security-logging/security-logging/-/issues/new?issuable_template=add-remove-change-log-source) using the `add-remove-change-log-source` template.
- [ ] Ensure appropriate logs are being kept for compliance and requirements for retention are met.
- [ ] If the data classification = Red for the new environment, please create a [Security Compliance Intake issue](https://gitlab.com/gitlab-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/security-compliance-intake/-/issues/new?issue[title]=System%20Intake:%20%5BSystem%20Name%20FY2%23%20Q%23%5D&issuable_template=intakeform). Note this is not necessary if the service is deployed in existing Production infrastructure.

</details>