Readiness Review: Applied ML Suggested Reviewer Feature
Context
With the acquisition of UnReview, we are building an ML-driven feature that recommends the most appropriate code reviewer for a merge request. The machine learning model extracts data from the GitLab MR API, learns from historic code-review patterns, and recommends reviewers to make the developer experience more efficient and relevant. We have started dogfooding with internal customers, working with internal teams to recommend the most appropriate code reviewer. After this production readiness review, we would like to start dogfooding with external customers.
To integrate UnReview into GitLab, we are working through three phases. The architecture for all three phases is described here: https://handbook.gitlab.com/handbook/engineering/development/data-science/ai-powered/model-validation/projects/unreview/
We are currently finishing Milestone 3, the last phase of integration for the beta release of Suggested Reviewers.
Architecture
- **Add architecture diagrams to this issue of feature components and how they interact with existing GitLab components. Include internal dependencies, ports, security policies, etc.**
- **Describe each component of the new feature and enumerate what it does to support customer use cases.**
  - Recommender Engine is a machine learning model used to generate code reviewer recommendations for a given project.
  - Recommender Service serves the machine learning model for each registered project (recommendations over HTTP).
  - Recommender Bot provides a public API to register new projects, request recommendations, put recommendations in an MR note, and schedule MLOps pipelines.
  - Authenticator authenticates requests to Recommender Bot using the GitLab API.
  - MLOps pipeline is a GitLab CI pipeline for data extraction, transformation, and model training for a given project.
  - Extracteur is a Golang application for extracting merge request data using the GitLab API.
  - Transform pipelines are a set of Google Dataflow pipelines for preparing data before model training.
  - Recommender CI is a Golang application that powers the CI template.
  - Cluster Management contains YAML files to install the Reviewer Recommender components on Kubernetes.
- **For each component and dependency, what is the blast radius of failures? Is there anything in the feature design that will reduce this risk?**
  - Recommender Engine is simply a Python library that doesn't require any service dependencies.
  - Recommender Service requires Google Cloud Storage to be available.
  - Authenticator fails if the GitLab API is broken.
  - Recommender Bot fails if Authenticator is broken.
  - Recommender Bot is able to provide recommendations for all already registered projects even if GitLab CI is broken.
  - Recommender Bot is not able to provide recommendations if the GitLab API, Recommender Service, or Postgres is broken.
  - Recommender Bot is not able to register a project if Postgres is broken.
  - Extracteur and the Transform pipelines are loosely coupled via Pub/Sub.
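To illustrate the loose coupling above, here is a minimal Go sketch that uses an in-process channel as a stand-in for the Pub/Sub topic between Extracteur and the Transform pipelines. The types and payloads are illustrative only, not the actual service code:

```go
package main

import "fmt"

// Message is a stand-in for a Pub/Sub message carrying extracted MR data.
type Message struct {
	ProjectID int
	Payload   string
}

// extract publishes extracted merge request data to the topic and returns.
// It does not know (or care) whether a transformer is currently consuming,
// which is what makes the two stages loosely coupled.
func extract(topic chan<- Message, projectID int) {
	topic <- Message{ProjectID: projectID, Payload: "mr-data"}
}

// transform consumes messages independently of the producer's lifecycle.
func transform(topic <-chan Message) []string {
	var out []string
	for m := range topic {
		out = append(out, fmt.Sprintf("transformed project %d: %s", m.ProjectID, m.Payload))
	}
	return out
}

func main() {
	topic := make(chan Message, 10) // buffered, like a durable topic
	extract(topic, 42)
	extract(topic, 43)
	close(topic)
	for _, line := range transform(topic) {
		fmt.Println(line)
	}
}
```

With a real Pub/Sub topic between them, a failure in the Transform pipelines only delays processing of queued messages rather than breaking extraction.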
- **If applicable, explain how this new feature will scale and any potential single points of failure in the design.**
  - Reviewer-Recommender uses Kubernetes to deploy its components. We can potentially use Istio to build an HA cluster.
  - Recommender Bot faces public traffic and is a potential single point of failure.
  - @achueshev to fill
Operational Risk Assessment
- **What are the potential scalability or performance issues that may result with this change?** Request time can take up to 15 seconds. This primarily depends on the number of modified files in a given MR. As the number of modified files grows (above roughly 50), the GitLab API and the machine learning model become a bottleneck.
- **List the external and internal dependencies to the application (ex: redis, postgres, etc) for this feature and how it will be impacted by a failure of that dependency.** Postgres, Pub/Sub, GitLab CI, GitLab API, GCS. Recommender Bot is not able to provide recommendations if the GitLab API or Postgres is broken.
- **Were there any features cut or compromises made to make the feature launch?**
  - We use GitLab CI to extract data, transform it, and train the models. GitLab CI doesn't currently support automatic retries, so if one of the jobs in the MLOps pipeline fails (e.g., an API timeout), we need to restart the broken job manually.
  - We didn't introduce metric services such as Prometheus and rely on the GCP solutions instead.
  - Recommender Service deserializes the required ML model for a given project on each request. We probably need a caching layer there.
  - We train and serve an ML model for each registered project, so the number of models grows with the number of registered projects.
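The caching layer mentioned above could be as simple as a mutex-guarded map keyed by project ID. This is a hypothetical sketch (the load counter stands in for the expensive GCS fetch and deserialization), not the actual Recommender Service implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Model stands in for a deserialized per-project ML model.
type Model struct{ ProjectID int }

// ModelCache memoizes deserialized models so repeated requests for the
// same project skip the expensive download and deserialization step.
type ModelCache struct {
	mu     sync.Mutex
	models map[int]*Model
	loads  int // counts actual loads, for illustration
}

func NewModelCache() *ModelCache {
	return &ModelCache{models: make(map[int]*Model)}
}

// Get returns the cached model, loading it only on first use.
func (c *ModelCache) Get(projectID int) *Model {
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.models[projectID]; ok {
		return m
	}
	c.loads++ // stand-in for the GCS fetch + deserialization
	m := &Model{ProjectID: projectID}
	c.models[projectID] = m
	return m
}

func main() {
	cache := NewModelCache()
	cache.Get(1)
	cache.Get(1)
	cache.Get(2)
	fmt.Println("loads:", cache.loads) // 2, not 3
}
```

A production version would also need an eviction policy (e.g. LRU with a size cap), since the number of models grows with registered projects.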
- **List the top three operational risks when this feature goes live.**
  - Managing our own Postgres instance.
  - Maintaining a scheduled MLOps pipeline for each registered project.
  - System monitoring. We need to add more info to logs and forward them to Logstash.
- **What are a few operational concerns that will not be present at launch, but may be a concern later?**
  - System scalability.
  - ML model serving.
  - The increasing number of models to train, evaluate, and serve. We definitely need to find a way to produce a limited number of generalized ML models for all registered projects.
- **Can the new product feature be safely rolled back once it is live? Can it be disabled using a feature flag?** Yes, we are releasing the feature behind a feature flag.
- **Document every way the customer will interact with this new feature and how customers will be impacted by a failure of each interaction.** The `reviewers-recommender` job of the CI template generates an artifact with recommendations for a target MR. This job also places the generated recommendations in an MR note. If a note with generated recommendations already exists, the `reviewers-recommender` job updates the existing MR note with the new recommendations if they change.
- **As a thought experiment, think of worst-case failure scenarios for this product feature. How can the blast radius of the failure be isolated?** The `reviewers-recommender` job of the CI template has an `allow_failure: true` option, which prevents the entire customer CI pipeline from failing in the worst-case scenario.
Database (N/A for this milestone)
- **If we use a database, is the data structure verified and vetted by the database team?**
- **Do we have an approximate growth rate of the stored data (for capacity planning)?**
- **Can we age data and delete data of a certain age?**
Security and Compliance
- **Were the GitLab security development guidelines followed for this feature?**
- **If this feature requires new infrastructure, will it be updated regularly with OS updates?** Reviewer-Recommender requires the Kubernetes infrastructure.
- **Has effort been made to obscure or elide sensitive customer data in logging?** For each registered project, the existing logs contain only the client's project ID and the SHA of the commits.
- **Is any potentially sensitive user-provided data persisted? If so, is this data encrypted at rest?** Using the GitLab API, Reviewer-Recommender extracts merge requests and stores them in a Google Cloud Storage bucket. If the target project is private, that means we store private merge requests. The GCS bucket cannot be accessed externally and is only accessible from the Dataflow pipelines.
- **Is the service subject to any regulatory/compliance standards? If so, detail which and provide details on applicable controls, management processes, additional monitoring, and mitigating factors.**
- **Are we adding any new resources of the following type?**
  - AWS Accounts/GCP Projects
  - New Subnets
  - VPC/Network Peering
  - DNS names
  - Entry-points exposed to the internet (Public IPs, Load-Balancers, Buckets, etc...): expose a new gRPC Ingress
  - Other (anything relevant that might be worth mentioning): Cloud Storage buckets, Pub/Sub topics, increased Kubernetes workload
Secure Software Development Life Cycle (SSDLC)
- **Is the configuration following a security standard? (CIS is a good baseline for example)**
- **Were the GitLab security development guidelines followed for this feature?**
- **Do we have an automatic procedure to update the infrastructure (OS, container images, packages, etc...)?**
- **Do we use IaC (Terraform) for all the infrastructure related to this feature? If not, what kind of resources are not covered?**
- **Do we have secure static code analysis tools (`kics` or `checkov`) covering this feature's terraform?**
- **If there's a new terraform state:**
  - **Where is the terraform state stored, and who has access to it?**
  - **Does this feature add secrets to the terraform state? If yes, can they be stored in a secrets manager?**
- **If we're creating new containers:**
  - **Are we using a distroless base image?**
  - **Do we have security scanners covering these containers?**
    - `kics` or `checkov` for Dockerfiles, for example
    - **GitLab's container scanner for vulnerabilities**: container scanning is implemented in all Docker-based applications (issue).
Identity and Access Management
- **Are we adding any new forms of Authentication (New service-accounts, users/password for storage, OIDC, etc...)?**
  - Suggestion flow (GitLab -> Suggested Reviewer): requests are made with TLS encryption and a time-based JWT access token, signed with a pre-shared secret that is deployed in the GitLab and Suggested Reviewer instances.
  - Feature set building flow (Suggested Reviewer -> GitLab): requests are made with TLS encryption and a project access token that is shared when the project is onboarded. The access token is stored encrypted in PostgreSQL.
- **Does it follow the least privilege principle?**
  - Suggestion flow: the user must be authenticated to GitLab and have permission to view the MR and project members.
  - Feature set building flow: the access token is scoped to the project, limited to read and the reporter access level.
- **If we are adding any new Data Storage (Databases, buckets, etc...):**
  - **What kind of data is stored on each system? (secrets, customer data, audit, etc...)**
    - PostgreSQL: project metadata, project access tokens
    - K8s secrets: database credentials, GCP credentials
    - Cloud Storage: merge request data
  - **How is data rated according to our data classification standard? (customer data is RED)**
  - **Is data encrypted at rest? (If the storage is provided by a GCP service, the answer is most likely yes)**
  - **Do we have audit logs on data access?**
- **Network security (encryption and ports should be clear in the architecture diagram above):**
  - **Do firewalls follow the least privilege principle (with network policies in Kubernetes or firewalls on the cloud provider)?**
  - **Is the service covered by any DDoS protection solution? (GCP/AWS load-balancers or Cloudflare usually cover this)**
  - **Is the service covered by a WAF (Web Application Firewall)?**
Logging & Audit
- **Has an effort been made to obscure or elide sensitive customer data in logging?**

Compliance
- **Is the service subject to any regulatory/compliance standards? If so, detail which and provide details on applicable controls, management processes, additional monitoring, and mitigating factors.**
Performance
- **Explain what validation was done following GitLab's performance guidelines. Please explain or link to the results below.**
- **Are there any potential performance impacts on the database when this feature is enabled at GitLab.com scale?**
- **Are there any throttling limits imposed by this feature? If so, how are they managed?**
- **If there are throttling limits, what is the customer experience of hitting a limit?**
- **For all dependencies external and internal to the application, are there retry and back-off strategies for them?**
- **Does the feature account for brief spikes in traffic, at least 2x above the expected TPS?**
Backup and Restore
- **Outside of existing backups, is there any other customer data that needs to be backed up for this product feature?**
- **Are backups monitored?**
- **Was a restore from backup tested?**
Monitoring and Alerts
- **Is the service logging in JSON format and are logs forwarded to Logstash?**
- **Is the service reporting metrics to Prometheus?**
- **How is the end-to-end customer experience measured?**
- **Do we have a target SLA in place for this service?**
- **Do we know what the indicators (SLIs) are that map to the target SLA?**
- **Do we have alerts that are triggered when the SLIs (and thus the SLA) are not met?**
- **Do we have troubleshooting runbooks linked to these alerts?**
- **What are the thresholds for tweeting or issuing an official customer notification for an outage related to this feature?**
- **Do the on-call rotations responsible for this service have access to this service?**
Responsibility
- **Which individuals are the subject matter experts and know the most about this feature?** Currently we have a single engineer who is working on this and is the SME: @achueshev. We also have an engineer who has contributed to automating the pipelines: @AndrasHerczeg.
- **Which team or set of individuals will take responsibility for the reliability of the feature once it is in production?** N/A for now, as we are not yet releasing it; we are dogfooding with external customers. In the future it will be our own team.
- **Is someone from the team who built the feature on call for the launch? If not, why not?** N/A.
Testing
- **Describe the load test plan used for this feature. What breaking points were validated?** We haven't done load testing yet. However, we can indirectly observe the load on 3 connected projects. For each request to the Reviewer-Recommender infrastructure, we make 5-6 internal subrequests: authentication via the GitLab API, registering/checking the project in the Postgres DB, scheduling an MLOps pipeline via the GitLab API (only if the target project is new), finding a reference to the project's model in the Postgres DB, getting MR changes via the GitLab API, and requesting recommendations via Recommender Service. With the next milestone, we will reduce the number of GitLab API calls.
- **For the component failures that were theorized for this feature, were they tested? If so, include the results of these failure tests.** We run integration tests to make sure we are handling errors correctly. A few examples from the bot project that provides the public API:
  - merge request notes: https://gitlab.com/gitlab-org/modelops/applied-ml/review-recommender/recommender-bot-service/-/blob/main/internal/mergerequest/repository/gitlab/repository_test.go#L503
  - merge request changes: https://gitlab.com/gitlab-org/modelops/applied-ml/review-recommender/recommender-bot-service/-/blob/main/internal/mergerequest/repository/gitlab/repository_test.go#L415
  - recommendations: https://gitlab.com/gitlab-org/modelops/applied-ml/review-recommender/recommender-bot-service/-/blob/main/internal/recommendations/repository/rr/repository_test.go#L117
  - gRPC errors: https://gitlab.com/gitlab-org/modelops/applied-ml/review-recommender/recommender-bot-service/-/blob/main/internal/recommendations/delivery/grpc/server_test.go#L215
- **Give a brief overview of what tests are run automatically in GitLab's CI/CD pipeline for this feature.** We run unit and integration tests using GitLab CI. To run the integration tests, we have reproduced the testing Kubernetes environment and also installed the Kubernetes GitLab runner. After tagging a CI job with the `k8s` tag, GitLab will run the job in our testing environment, which has all the necessary components, including Postgres on Kubernetes.