Skip to content

Implement package pipelines endpoint

🌱 Context

Currently, we serve package pipelines in the package endpoint. This can be a performance issue because the pipelines are not paginated, and a package can have thousands of pipelines. To improve the performance of the package endpoint, we'll remove the pipelines from the payload, and introduce a new endpoint to serve the pipelines for a single package.

The performance issue affects both the group package endpoint and the project package endpoint, but we will initially implement the package pipelines endpoint for the project context only.

What does this MR do and why?

Introduce a new endpoint to serve the pipelines for a single project package. The new endpoint has the same permissions as the GET project package endpoint.

Screenshots or screen recordings

Example curl request and API response:

rad@mjolnirv3 ~/p/gitlab-development-kit (main)> curl --header "PRIVATE-TOKEN: redacted" "http://gdk.test:3000/api/v4/pr
ojects/7/packages/28/pipelines"
[{"id":203,"iid":7,"project_id":7,"sha":"b83d6e391c22777fca1ed3012fce84f633d7fed0","ref":"master","status":"pending","source":"push","created_at":"2023-05-02T07:13:31.103Z","updated_at":"2023-05-02T07:13:31.103Z","web_url":"http://gdk.test:3000/flightjs/Flight/-/pipelines/203","user":null},{"id":202,"iid":6,"project_id":7,"sha":"b83d6e391c22777fca1ed3012fce84f633d7fed0","ref":"master","status":"pending","source":"push","created_at":"2023-05-02T07:13:31.059Z","updated_at":"2023-05-02T07:13:31.059Z","web_url":"http://gdk.test:3000/flightjs/Flight/-/pipelines/202","user":null},{"id":201,"iid":5,"project_id":7,"sha":"b83d6e391c22777fca1ed3012fce84f633d7fed0","ref":"master","status":"pending","source":"push","created_at":"2023-05-02T07:13:31.006Z","updated_at":"2023-05-02T07:13:31.006Z","web_url":"http://gdk.test:3000/flightjs/Flight/-/pipelines/201","user":null}]    

How to set up and validate locally

The stock GDK database contains some packages, and some pipelines, but no pipelines are assigned to a package. A quick way to test this MR is to assign some pipelines to an existing package.

  1. Get the ID and project_id of an existing package from the Rails console
p = ::Packages::Package.last
p.id # <- This is the package_id
p.project_id # <- This is the project_id
  1. Get existing pipeline IDs for the project from the Rails console:
::Ci::Pipeline.where(project_id: <project_id>).map(&:id)
  1. From GDK psql, assign some pipelines to the package:
insert into packages_build_infos(package_id, pipeline_id) values(<package_id>, 201);

insert into packages_build_infos(package_id, pipeline_id) values(<package_id>, 202);

insert into packages_build_infos(package_id, pipeline_id) values(<package_id>, 203);

Add a few more records to be able to test pagination.

  1. Test the endpoint:
curl --header "PRIVATE-TOKEN: $YOUR_ACCESS_TOKEN" "http://gdk.test:3001/api/v4/pr
ojects/<project_id>/packages/<package_id>/pipelines"

5.1. Test pagination:

curl --header "PRIVATE-TOKEN: $YOUR_ACCESS_TOKEN" "http://gdk.test:3001/api/v4/pr
ojects/<project_id>/packages/<package_id>/pipelines?per_page=2"

This should return only 2 records, even if you have created more pipelines in step 3.

5.2. Get the URL for the next page of records:

curl -I --header "PRIVATE-TOKEN: $YOUR_ACCESS_TOKEN" "http://gdk.test:3001/api/v4/pr
ojects/<project_id>/packages/<package_id>/pipelines?per_page=2"

Example result:

HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Content-Security-Policy: default-src 'none'
Content-Type: application/json
Etag: W/"f6384b4cc0943609803d1036371a4ff3"
Link: <http://gdk.test:3000/api/v4/projects/19/packages/26/pipelines?cursor=eyJpZCI6IjMwIiwiX2tkIjoibiJ9&id=19&package_id=26&page=1&per_page=2>; rel="next"
Vary: Origin
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: 01H1P7X3JVF3N91Y5JCW2RCPAC
X-Runtime: 0.148372
Date: Tue, 30 May 2023 11:50:13 GMT
Content-Length: 583

5.3. Use the value of the Link header to get the next page of records: curl --request GET "http://gdk.test:3000/api/v4/projects/19/packages/26/pipelines?cursor=eyJpZCI6IjMwIiwiX2tkIjoibiJ9&id=19&package_id=26&page=1&per_page=2"

Add -I to the request to get the URL for the next page of records.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database Analysis

pg.ai setup
reset;
EXEC CREATE INDEX index_packages_build_infos_package_id_id ON packages_build_infos USING btree (package_id, id);

This is the Ruby code block that introduces new queries:

paginator = package.build_infos.keyset_paginate(per_page: per_page, cursor: cursor)                              
build_infos = paginator.records                                                                                  
pipelines = ::Ci::Pipeline.id_in(build_infos.map(&:pipeline_id)).order_id_desc    

We have code that ensures per_page is between 1 to 20.

The block of code above results in 2 queries.

Query 1

Ruby:

paginator = package.build_infos.keyset_paginate(per_page: per_page, cursor: cursor) 
build_infos = paginator.records

SQL for first page (no cursor parameter):

SELECT "packages_build_infos".* 
FROM "packages_build_infos" 
WHERE "packages_build_infos"."package_id" = 4110798 
ORDER BY "packages_build_infos"."id" DESC 
LIMIT 101
Time: 3.512 ms  
  - planning: 1.024 ms  
  - execution: 2.488 ms  
    - I/O read: 2.023 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 2 (~16.00 KiB) from the buffer pool  
  - reads: 103 (~824.00 KiB) from the OS file cache, including disk I/O  
  - dirtied: 8 (~64.00 KiB)  
  - writes: 0  

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/18956/commands/62869

SQL for succeeding page (with cursor parameter):

SELECT "packages_build_infos".* 
FROM "packages_build_infos" 
WHERE "packages_build_infos"."package_id" = 4110798 
AND ("packages_build_infos"."id" < 22152765)
ORDER BY "packages_build_infos"."id" DESC 
LIMIT 101 
Time: 4.604 ms
  - planning: 1.205 ms
  - execution: 3.399 ms
    - I/O read: 2.353 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 9 (~72.00 KiB) from the buffer pool
  - reads: 99 (~792.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 23 (~184.00 KiB)
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/18956/commands/62872

Query 2: Ruby:
pipelines = ::Ci::Pipeline.id_in(build_infos.map(&:pipeline_id)).order_id_desc 

SQL is the same for the first page and succeeding pages. Only the IDs inside the WHERE "ci_pipelines"."id" IN (...) clause will change:

EXPLAIN SELECT 
  "ci_pipelines"."id",
  "ci_pipelines"."iid", 
  "ci_pipelines"."project_id", 
  "ci_pipelines"."sha", 
  "ci_pipelines"."ref", 
  "ci_pipelines"."status", 
  "ci_pipelines"."source", 
  "ci_pipelines"."created_at", 
  "ci_pipelines"."updated_at", 
  "ci_pipelines"."user_id"
FROM "ci_pipelines" 
WHERE "ci_pipelines"."id" IN 
(425570388, 425574061, 425576981, 425579129, 425581120, 425582878, 425585207, 425586988, 425588484, 425590221, 
425591672, 425593407, 425597674, 425599113, 425600983, 425602574, 425604061, 425605389, 425607039, 425608295)
ORDER BY "ci_pipelines"."id" DESC

The number of IDs in the clause WHERE "ci_pipelines"."id" IN (...) depends on the value of the per_page parameter, but can never be more than 20.

Time: 134.516 ms
  - planning: 7.898 ms
  - execution: 126.618 ms
    - I/O read: 125.367 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 80 (~640.00 KiB) from the buffer pool
  - reads: 43 (~344.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/19172/commands/63311

Migration log

main: == [advisory_lock_connection] object_id: 227560, pg_backend_pid: 62112
main: == 20230531054422 AddIndexOnPackagesIdIdToPackageBuildInfos: reverting ========
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1332s
main: -- indexes(:packages_build_infos)
main:    -> 0.0037s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- remove_index(:packages_build_infos, {:algorithm=>:concurrently, :name=>"index_packages_build_infos_package_id_id"})
main:    -> 0.0026s
main: -- execute("RESET statement_timeout")
main:    -> 0.0006s
main: == 20230531054422 AddIndexOnPackagesIdIdToPackageBuildInfos: reverted (0.1586s) 

main: == [advisory_lock_connection] object_id: 227560, pg_backend_pid: 62112
main: == [advisory_lock_connection] object_id: 238340, pg_backend_pid: 62418
main: == 20230531054422 AddIndexOnPackagesIdIdToPackageBuildInfos: migrating ========
main: -- transaction_open?()
main:    -> 0.0003s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0019s
main: -- index_exists?(:packages_build_infos, [:package_id, :id], {:name=>"index_packages_build_infos_package_id_id", :algorithm=>:concurrently})
main:    -> 0.0045s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0005s
main: -- add_index(:packages_build_infos, [:package_id, :id], {:name=>"index_packages_build_infos_package_id_id", :algorithm=>:concurrently})
main:    -> 0.0032s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20230531054422 AddIndexOnPackagesIdIdToPackageBuildInfos: migrated (0.0236s) 

Related to #341950 (closed)

Edited by Radamanthus Batnag

Merge request reports