[FF] integration_propagation_simplified_batching - Rollout
<!-- Title suggestion: [FF] `integration_propagation_simplified_batching` -- Rollout -->

## Summary

This issue is to roll out [the feature](https://gitlab.com/gitlab-org/gitlab/-/work_items/591182) on production, which is currently behind the `integration_propagation_simplified_batching` feature flag.

The feature simplifies the `each_batch` scope used when propagating integration settings to projects and groups under a large group hierarchy. By removing expensive `NOT EXISTS` subqueries and namespace subqueries from the batch boundary queries, we avoid statement timeouts that prevented integrations from being propagated to projects in large group trees (~1,364+ subgroups, ~6,200+ projects).

Query plan comparison (start query, `LIMIT 1`, namespace 9970):

- **Old**: 7,606ms execution, Nested Loop Anti Join with 6,440 loops
- **New**: 454ms execution, simple Index Scan with namespace_id filter
- **Speedup**: ~17x

## Owners

- Most appropriate Slack channel to reach out to: `#group-import`
- Best individual to reach out to: @carlad-gl

## Expectations

### What are we expecting to happen?

When the feature flag is enabled for a group, propagating integration settings (e.g., Jira, Datadog) to projects and subgroups under that group should complete without statement timeouts, even for large hierarchies.

The trade-off is that the batch scope is broader (it includes projects that already have the integration or are archived/pending_delete), so some worker jobs may process empty batches. This is harmless: the worker re-applies all filters and skips projects that don't need the integration.

### What can go wrong and how would we detect it?

- **Increased Sidekiq job volume**: More worker jobs may be enqueued since the batch scope no longer pre-filters. Monitor `PropagateIntegrationProjectWorker` and `PropagateIntegrationGroupWorker` job counts and durations.
- **Memory usage from pluck**: The namespace ID pluck loads all descendant namespace IDs into memory.
For very large hierarchies this could use more memory than usual. Monitor Sidekiq worker memory usage.
- **Unexpected integration propagation failures**: Monitor error rates on integration propagation in Kibana logs.

Relevant dashboards:

- [Sidekiq worker dashboard](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview)
- [Database saturation](https://dashboards.gitlab.net/d/alerts-sat_patroni_apdex/alerts-patroni-apdex-saturation)

## Rollout Steps

Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.

### Rollout on non-production environments

- Verify the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- [x] Deploy the feature flag at a percentage (recommended percentage: 50%) with `/chatops run feature set integration_propagation_simplified_batching 50 --actors --dev --pre --staging --staging-ref`
- [x] Monitor that the error rates did not increase (repeat with a different percentage as necessary).
- [x] Enable the feature globally on non-production environments with `/chatops run feature set integration_propagation_simplified_batching true --dev --pre --staging --staging-ref`
- [x] Verify that the feature works as expected. The best environment to validate the feature in is [`staging-canary`](https://about.gitlab.com/handbook/engineering/infrastructure/environments/#staging-canary) as this is the first environment deployed to. Make sure you are [configured to use canary](https://next.gitlab.com/).
- [x] If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking [deployments](https://about.gitlab.com/handbook/engineering/deployments-and-releases/deployments/).
### Before production rollout

- [x] If the change is significant and you want to announce it in [#whats-happening-at-gitlab](https://gitlab.enterprise.slack.com/archives/C0259241C), it is best to do so before rollout to `gitlab-org/gitlab-com`.

### Specific rollout on production

For visibility, all `/chatops` commands that target production must be executed in the [`#production` Slack channel](https://gitlab.slack.com/archives/C101F3796) and cross-posted (with the command results) to the responsible team's Slack channel.

- Ensure that the feature MRs have been deployed to both production and canary with `/chatops run auto_deploy status <merge-commit-of-your-feature>`
- [x] Depending on the [type of actor](https://docs.gitlab.com/development/feature_flags/#feature-actors) you are using, pick one of these options:
  - For **group-actor**: `/chatops run feature set --group=gitlab-org,gitlab-com integration_propagation_simplified_batching true`
- [x] Verify that the feature works for the specific actors.

### Preparation before global rollout

- [x] Set a milestone on this rollout issue to signal when the feature flag should be enabled and removed once it is stable.
- [x] Check if the feature flag change needs to be accompanied by a [change management issue](https://about.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#feature-flags-and-the-change-management-process). Cross-link the issue here if it does.
- [x] Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the on-call SRE by using the `@sre-oncall` Slack alias.
- [x] Ensure that documentation exists for the feature, and the [version history text](https://docs.gitlab.com/development/documentation/feature_flags/#add-history-text) has been updated.
- [x] Notify the [`#support_gitlab-com` Slack channel](https://gitlab.slack.com/archives/C4XFU81LG) and your team channel ([more guidance when this is necessary in the dev docs](https://docs.gitlab.com/development/feature_flags/controls/#communicate-the-change)).

### Global rollout on production

For visibility, all `/chatops` commands that target production must be executed in the [`#production` Slack channel](https://gitlab.slack.com/archives/C101F3796) and cross-posted (with the command results) to the responsible team's Slack channel.

- [x] [Incrementally roll out](https://docs.gitlab.com/development/feature_flags/controls/#process) the feature on production.
  - Example: `/chatops run feature set integration_propagation_simplified_batching <rollout-percentage> --actors`.
  - Between every step wait for at least 15 minutes and monitor the appropriate graphs on https://dashboards.gitlab.net.
- [ ] After the feature has been 100% enabled, wait for [at least one day before releasing the feature](#release-the-feature).

### Release the feature

After the feature has been [deemed stable](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/#including-a-feature-behind-feature-flag-in-the-final-release), the [clean up](https://docs.gitlab.com/development/feature_flags/controls/#cleaning-up) should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.

- [ ] Create a merge request to remove the `integration_propagation_simplified_batching` feature flag. The MR should include the following changes:
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
- [ ] Ensure that the cleanup MR has been included in the release package.
- [ ] Close [the feature issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/591182) to indicate the feature will be released in the current milestone.
- [ ] Once the cleanup MR has been deployed to production, clean up the feature flag from all environments by running this chatops command in the `#production` channel: `/chatops run feature delete integration_propagation_simplified_batching --dev --pre --staging --staging-ref --production`
- [ ] Close this rollout issue.

## Rollback Steps

- [ ] This feature can be disabled on production by running the following Chatops command:

  ```
  /chatops run feature set integration_propagation_simplified_batching false
  ```

- [ ] Disable the feature flag on non-production environments:

  ```
  /chatops run feature set integration_propagation_simplified_batching false --dev --pre --staging --staging-ref
  ```

- [ ] Delete feature flag from all environments:

  ```
  /chatops run feature delete integration_propagation_simplified_batching --dev --pre --staging --staging-ref --production
  ```
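
For reference when reviewing the trade-off described under Expectations, here is a minimal plain-Ruby sketch of why empty batches are harmless. It is an illustration only and does not use the actual `each_batch` scope or worker code: `Project`, `needs_integration?`, `simplified_batches`, and `process_batch` are hypothetical names, the data is made up, and only the archived and already-integrated filters are modelled (pending_delete is left out).

```ruby
# Hypothetical in-memory stand-in for a project row; not GitLab's schema.
Project = Struct.new(:id, :namespace_id, :archived, :has_integration, keyword_init: true)

# Filters that the simplified batch scope no longer applies up front.
def needs_integration?(project)
  !project.archived && !project.has_integration
end

# Simplified batching: select by namespace membership only, ordered by id,
# then cut into fixed-size batches. No per-project pre-filtering happens here.
def simplified_batches(projects, namespace_ids, batch_size)
  projects
    .select { |project| namespace_ids.include?(project.namespace_id) }
    .sort_by(&:id)
    .each_slice(batch_size)
    .to_a
end

# Stand-in for the worker: re-applies every filter, so a batch in which
# nothing qualifies is simply a no-op.
def process_batch(batch)
  batch.select { |project| needs_integration?(project) }.map(&:id)
end

projects = [
  Project.new(id: 1, namespace_id: 10, archived: true,  has_integration: false),
  Project.new(id: 2, namespace_id: 10, archived: false, has_integration: true),
  Project.new(id: 3, namespace_id: 11, archived: false, has_integration: false),
  Project.new(id: 4, namespace_id: 99, archived: false, has_integration: false)
]

batches = simplified_batches(projects, [10, 11], 2)
results = batches.map { |batch| process_batch(batch) }
```

In this sketch the first batch contains only an archived project and a project that already has the integration, so the worker stand-in returns nothing for it, while the second batch still yields project 3. The cost of the broader scope is the extra job, not incorrect propagation.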