Skip to content

Replace API order by created_at with order by id

Alex Pooley requested to merge ap-fix-list-projects-api into master

In #334667 (comment 632358325) we found a simple API query of https://gitlab.com/api/v4/projects?membership=true&page=1&per_page=10 would result in a 500 Internal Server Error. The generated query required minutes to complete.

The problem with the query was the order by created_at portion:

with `order by created_at`
SELECT 
  MAX(
    "project_authorizations"."access_level"
  ) AS maximum_access_level, 
  "project_authorizations"."project_id" AS project_authorizations_project_id 
FROM 
  "project_authorizations" 
WHERE 
  "project_authorizations"."user_id" = 8513236 
  AND "project_authorizations"."project_id" IN (
    SELECT 
      "projects"."id" 
    FROM 
      "projects" 
      INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" 
    WHERE 
      "project_authorizations"."user_id" = 8513236 
      AND "projects"."pending_delete" = FALSE 
      AND "projects"."hidden" = FALSE 
    ORDER BY 
      "projects"."created_at" DESC, 
      "projects"."id" DESC 
    LIMIT 
      10 OFFSET 0
  ) 
GROUP BY 
  "project_authorizations"."project_id"

Time: 52.816 s

  • planning: 1.196 ms
  • execution: 52.815 s
    • I/O read: 2.209 min
    • I/O write: 9.693 ms

Shared buffers:

  • hits: 3522368 (~26.90 GiB) from the buffer pool
  • reads: 588581 (~4.50 GiB) from the OS file cache, including disk I/O
  • dirtied: 5178 (~40.50 MiB)
  • writes: 481 (~3.80 MiB)

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9860/commands/34915.

without `order by created_at`
SELECT 
  MAX(
    "project_authorizations"."access_level"
  ) AS maximum_access_level, 
  "project_authorizations"."project_id" AS project_authorizations_project_id 
FROM 
  "project_authorizations" 
WHERE 
  "project_authorizations"."user_id" = 8513236 
  AND "project_authorizations"."project_id" IN (
    SELECT 
      "projects"."id" 
    FROM 
      "projects" 
      INNER JOIN "project_authorizations" ON "projects"."id" = "project_authorizations"."project_id" 
    WHERE 
      "project_authorizations"."user_id" = 8513236 
      AND "projects"."pending_delete" = FALSE 
      AND "projects"."hidden" = FALSE 
    ORDER BY 
      "projects"."id" DESC 
    LIMIT 
      10 OFFSET 0
  ) 
GROUP BY 
  "project_authorizations"."project_id"

Time: 35.900 ms

  • planning: 4.964 ms
  • execution: 30.936 ms
    • I/O read: 30.439 ms
    • I/O write: 0.000 ms

Shared buffers:

  • hits: 75 (~600.00 KiB) from the buffer pool
  • reads: 42 (~336.00 KiB) from the OS file cache, including disk I/O
  • dirtied: 1 (~8.00 KiB)
  • writes: 0

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9860/commands/34914.

Other endpoints

Merge Requests

'GET /merge_requests' 'when unauthenticated'

New

Time: 16.720 ms  
  - planning: 3.980 ms  
  - execution: 12.740 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 6501 (~50.80 MiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10030/commands/35533.

Old

Did not finish?!

SELECT 
  "merge_requests".* 
FROM 
  "merge_requests" 
  INNER JOIN "projects" ON "projects"."id" = "merge_requests"."target_project_id" 
  LEFT JOIN project_features ON projects.id = project_features.project_id 
WHERE 
  (
    EXISTS (
      SELECT 
        1 
      FROM 
        "project_authorizations" 
      WHERE 
        "project_authorizations"."user_id" = 4092171 
        AND (
          project_authorizations.project_id = projects.id
        ) 
        AND (
          project_authorizations.access_level >= 20
        )
    ) 
    OR projects.visibility_level IN (10, 20)
  ) 
  AND (
    "project_features"."merge_requests_access_level" IS NULL 
    OR "project_features"."merge_requests_access_level" IN (20, 30) 
    OR (
      "project_features"."merge_requests_access_level" = 10 
      AND EXISTS (
        SELECT 
          1 
        FROM 
          "project_authorizations" 
        WHERE 
          "project_authorizations"."user_id" = 4092171 
          AND (
            project_authorizations.project_id = project_features.project_id
          ) 
          AND (
            project_authorizations.access_level >= 20
          )
      )
    )
  ) 
ORDER BY 
  "merge_requests"."created_at" DESC, 
  "merge_requests"."id" DESC 
LIMIT 
  20 OFFSET 0
Notes

"when noteable is an Issue" "GET /projects/:id/noteable/:noteable_id/notes" "current user can view the note"

There are also endpoints for snippets and merge requests which I haven't profiled.

New

Time: 0.376 ms  
  - planning: 0.270 ms  
  - execution: 0.106 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 8 (~64.00 KiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10030/commands/35548.

Old

Time: 29.192 ms  
  - planning: 13.971 ms  
  - execution: 15.221 ms  
    - I/O read: 14.763 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 3 (~24.00 KiB) from the buffer pool  
  - reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10030/commands/35546.

Users

Amended spec with created_at instead of id: 'GET /users' "when authenticated" 'when authenticate as a regular user' 'returns the correct order when sorted by id'

New

Time: 13.760 ms  
  - planning: 0.473 ms  
  - execution: 13.287 ms  
    - I/O read: 12.952 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 21 (~168.00 KiB) from the buffer pool  
  - reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10030/commands/35561.

Old

Did not finish?!

SELECT 
  "users".* 
FROM 
  "users" 
ORDER BY 
  "users"."created_at" ASC, 
  "users"."id" ASC 
LIMIT 
  20 OFFSET 0

When I remove the order by id so we only order by created_at the query finishes fine.

Time: 53.867 ms  
  - planning: 0.412 ms  
  - execution: 53.455 ms  
    - I/O read: 53.007 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 0 from the buffer pool  
  - reads: 23 (~184.00 KiB) from the OS file cache, including disk I/O  
  - dirtied: 1 (~8.00 KiB)  
  - writes: 0  

Details and visualization: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10030/commands/35560.


A further complication is that the projects API endpoint default sorts by created_at and the code then adds a further redundant sort of id.

There is a recommendation from the database team to sort by id rather than created_at.

In this MR we replace any API sort by created_at to a sort by id.

Closes #334667 (closed), #361607 (closed)

Edited by Alex Pooley

Merge request reports