
Introduce Auto Rollback facility

Shinya Maeda requested to merge introduce-auto-rollback-service into master

What does this MR do?

This MR introduces a facility that handles the core business logic of Auto Rollback. The AutoRollbackService finds an appropriate rollback target and redeploys it. It comes with a couple of safety mechanisms, such as a rate limiter that prevents multiple auto rollbacks within a short interval.
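The flow above (find a target, redeploy it, refuse repeated rollbacks in a short window) can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the actual AutoRollbackService: RollbackRateLimiter, AutoRollbackSketch, the injected find_target/redeploy callables, and the per-environment in-memory limiter are all hypothetical names chosen for the example.

```ruby
# Hypothetical sketch: a per-environment rate limiter that refuses a second
# auto rollback within a fixed interval. Not GitLab's actual implementation.
class RollbackRateLimiter
  def initialize(interval_seconds:, clock: -> { Time.now })
    @interval = interval_seconds
    @clock = clock
    @last_rollback_at = {} # environment_id => Time of the last permitted rollback
  end

  # Returns true and records the attempt when a rollback is allowed right now.
  def allow?(environment_id)
    now = @clock.call
    last = @last_rollback_at[environment_id]
    return false if last && (now - last) < @interval

    @last_rollback_at[environment_id] = now
    true
  end
end

# Hypothetical service shape: collaborators are injected so the sketch runs
# without Rails. find_target: callable(environment_id) -> deployment or nil.
# redeploy: callable(deployment) -> result of the redeploy.
class AutoRollbackSketch
  def initialize(limiter:, find_target:, redeploy:)
    @limiter = limiter
    @find_target = find_target
    @redeploy = redeploy
  end

  # Returns [:ok, result] on success, or [:error, reason] otherwise.
  def execute(environment_id)
    target = @find_target.call(environment_id)
    # Look for a target first so a "nothing to roll back" call does not
    # consume the rate limiter's window.
    return [:error, :no_rollback_target] unless target
    return [:error, :rate_limited] unless @limiter.allow?(environment_id)

    [:ok, @redeploy.call(target)]
  end
end
```

Injecting the clock keeps the rate limiter deterministic in tests; a real implementation would more likely persist the last rollback time (for example in Redis or the database) so that it holds across Sidekiq processes.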

The new Sidekiq worker AutoRollbackWorker will be used in an upcoming MR. It will be executed when a new AlertManagement::Alert with the highest severity is created.
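A rough sketch of the planned trigger, assuming the alert exposes its severity and an environment_id. The severity ordering (critical as the highest level) reflects our understanding of AlertManagement::Alert and should be treated as an assumption, and the injected enqueue callable stands in for something like AutoRollbackWorker.perform_async so the sketch runs without Sidekiq. on_alert_created is a hypothetical name.

```ruby
# Assumed severity levels, highest first; treat this list as an assumption.
SEVERITIES = %i[critical high medium low info unknown].freeze
HIGHEST_SEVERITY = SEVERITIES.first

# Hypothetical hook: enqueue an auto rollback only for highest-severity alerts
# that are tied to an environment. `enqueue` stands in for a Sidekiq
# perform_async call. Returns true when a job was enqueued.
def on_alert_created(alert, enqueue:)
  return false unless alert[:severity] == HIGHEST_SEVERITY
  return false unless alert[:environment_id]

  enqueue.call(alert[:environment_id])
  true
end
```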

This also fixes a long-standing bug where with_deployable sometimes returns a nonexistent deployable object. This causes a lot of noise on Sentry, so it should be handled properly.
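One way to avoid handing out a nonexistent deployable is to keep only deployments whose build still exists, which is what the INNER JOIN on ci_builds in the query under Query Performance does at the SQL level. The in-memory sketch below illustrates that filtering; Deployment and with_existing_deployable are hypothetical names, and this is not necessarily how with_deployable itself was fixed.

```ruby
# Hypothetical in-memory model: each deployment points at a build through
# deployable_id, but the build row may have been deleted.
Deployment = Struct.new(:id, :deployable_id, keyword_init: true)

# Pair each deployment with its build, dropping deployments whose build is
# missing. Like an INNER JOIN, this yields no rows with a nil deployable.
def with_existing_deployable(deployments, builds_by_id)
  deployments.filter_map do |deployment|
    build = builds_by_id[deployment.deployable_id]
    [deployment, build] if build
  end
end
```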

Related #35404 (closed) Close #218659 (closed)

Query Performance

Since the query in find_rollback_target doesn't perform well on environments with many deployments, we need to add a database index to optimize it.

Here is the EXPLAIN ANALYZE output from joe-bot. The query was run against one of the projects with the most deployments on gitlab.com, gitlab-com/www-gitlab-com.

SELECT "deployments".* FROM "deployments"
INNER JOIN ci_builds ON ci_builds.id = deployments.deployable_id
WHERE "deployments"."environment_id" = 137
  AND "deployments"."status" = 2
  AND "deployments"."sha" = '292fa023062154f4ccb8f35c39f234dd60f1a071'
ORDER BY "deployments"."id" DESC LIMIT 1
Time: 1.264 ms
  - planning: 0.791 ms
  - execution: 0.473 ms
    - I/O read: 0.376 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
 Limit  (cost=7.19..7.20 rows=1 width=140) (actual time=0.436..0.438 rows=0 loops=1)
   Buffers: shared read=4
   I/O Timings: read=0.376
   ->  Sort  (cost=7.19..7.20 rows=1 width=140) (actual time=0.435..0.436 rows=0 loops=1)
         Sort Key: deployments.id DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared read=4
         I/O Timings: read=0.376
         ->  Nested Loop  (cost=1.14..7.18 rows=1 width=140) (actual time=0.430..0.431 rows=0 loops=1)
               Buffers: shared read=4
               I/O Timings: read=0.376
               ->  Index Scan using dos_test on public.deployments  (cost=0.57..3.59 rows=1 width=140) (actual time=0.429..0.429 rows=0 loops=1)
                     Index Cond: ((deployments.environment_id = 137) AND (deployments.status = 2) AND ((deployments.sha)::text = '292fa023062154f4ccb8f35c39f234dd60f1a071'::text))
                     Buffers: shared read=4
                     I/O Timings: read=0.376
               ->  Index Only Scan using ci_builds_pkey on public.ci_builds  (cost=0.57..3.59 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: (ci_builds.id = deployments.deployable_id)
                     Heap Fetches: 0

(NOTE: dos_test is identical to the new index.)
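To illustrate why the plan above is cheap, here is a rough in-memory analogue of a composite index on (environment_id, status, sha), the columns in the Index Cond. A single hash lookup narrows the candidates to the exact key, and taking the maximum id plays the role of ORDER BY id DESC LIMIT 1. DeploymentIndex is a hypothetical class, and reading status 2 as "success" is our assumption about the deployments status enum.

```ruby
# Hypothetical in-memory analogue of a composite index on
# (environment_id, status, sha): one lookup replaces a scan over every
# deployment in the environment.
class DeploymentIndex
  def initialize
    @ids_by_key = Hash.new { |hash, key| hash[key] = [] }
  end

  def add(id:, environment_id:, status:, sha:)
    @ids_by_key[[environment_id, status, sha]] << id
  end

  # Equivalent of: WHERE environment_id = ? AND status = ? AND sha = ?
  #                ORDER BY id DESC LIMIT 1
  # Uses fetch so a miss does not insert an empty entry via the default proc.
  def latest_id(environment_id:, status:, sha:)
    @ids_by_key.fetch([environment_id, status, sha], []).max
  end
end
```

In PostgreSQL the B-tree index does the same narrowing on disk, which is why the plan reads only 4 shared buffers.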

Feature Flag

This feature is under development and is disabled by default behind the cd_auto_rollback feature flag.
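A minimal sketch of gating the rollback behind the flag. FlagStore is a hypothetical stand-in for GitLab's feature flag check (something like Feature.enabled?), and maybe_auto_rollback is an illustrative name. An unknown flag is treated as disabled, matching "disabled by default".

```ruby
# Hypothetical flag store: a plain Hash standing in for GitLab's feature flags.
# Flags that were never set are treated as disabled.
class FlagStore
  def initialize(flags = {})
    @flags = flags
  end

  def enabled?(name)
    @flags.fetch(name, false) == true
  end
end

# Run the rollback block only when cd_auto_rollback is enabled.
def maybe_auto_rollback(flags, environment_id, &rollback)
  return :skipped_flag_disabled unless flags.enabled?(:cd_auto_rollback)

  rollback.call(environment_id)
end
```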

Database Migration

shinya@shinya-MS-7A34:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:down VERSION=20201112145311
INFO: This script is a predefined script in devkitkat.
== 20201112145311 AddIndexOnShaForInitialDeployments: reverting ===============
-- transaction_open?()
   -> 0.0000s
-- indexes(:services)
   -> 0.0030s
-- current_schema()
   -> 0.0001s
== 20201112145311 AddIndexOnShaForInitialDeployments: reverted (0.0044s) ======
shinya@shinya-MS-7A34:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:up VERSION=20201112145311
INFO: This script is a predefined script in devkitkat.
== 20201112145311 AddIndexOnShaForInitialDeployments: migrating ===============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:environment_id, :status, :sha], {:name=>"index_deployments_on_environment_status_sha", :algorithm=>:concurrently})
   -> 0.0052s
== 20201112145311 AddIndexOnShaForInitialDeployments: migrated (0.0056s) ======

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda
