Skip to content
Snippets Groups Projects

Add filter by ref to project jobs resolver

Closed Miranda Fluharty requested to merge 367294-project-jobs-resolver-filter-by-ref into master

What does this MR do and why?

For #367294

This MR modifies the parameters that you can provide when getting the list of jobs for a project via GraphQL: it adds a ref filter that should allow you to get only the jobs that ran for that ref.

Screenshots or screen recordings

Screenshots from GraphiQL

jobs query without ref jobs query with ref
Screen_Shot_2022-12-06_at_17.51.04 Screen_Shot_2022-12-06_at_17.51.35

How to set up and validate locally

  1. git checkout 367294-project-jobs-resolver-filter-by-ref
  2. find/make/import a project with a pipeline with a job that generates artifacts
you can use this CI yaml to generate some sample artifacts:
# .gitlab-ci.yml

potato:
  stage: build
  script:
    - echo 'potato' >> potato.txt
  artifacts:
    expose_as: 'potato'
    paths: ['potato.txt']

tomato:
  stage: build
  script:
    - echo 'tomato' >> tomato.txt
  artifacts:
    expose_as: 'tomato'
    paths: ['tomato.txt']

pineapple:
  stage: build
  script:
    - echo 'pineapple' >> pineapple.txt
  artifacts:
    expose_as: 'pineapple'
    paths: ['pineapple.txt']

apple:
  stage: build
  script:
    - echo 'apple' >> apple.txt
  artifacts:
    expose_as: 'apple'
    paths: ['apple.txt']

toblerone:
  stage: build
  script:
    - echo 'toblerone' >> toblerone.txt
  artifacts:
    expose_as: 'toblerone'
    paths: ['toblerone.txt']
  1. run a pipeline for one branch in that project - for this you'll need to set up a local runner for GDK or enable runners in gitpod
  2. run a pipeline for another branch
  3. using the GraphQL explorer (http://gdk.test:3000/-/graphql-explorer), query for a project's jobs by ref:
query getJobArtifacts {
  project(fullPath: "path/to/project") {  # substitute the full path of your project
    jobs(ref: "main") {                   # substitute the names of your branches
      nodes {
        id
        status
        name
        refName
      }
    }
  }
}
  1. when a ref is provided, only jobs that ran for that ref should be returned
  2. when a ref is not provided, jobs for any ref should be returned

Database details

Raw SQL

If I use this as the execute method:

    def execute
      builds = init_collection.order_id_desc
      builds = filter_by_with_artifacts(builds)
      builds = filter_by_ref(builds)
      builds = filter_by_scope(builds)
      puts builds.to_sql
      builds
    rescue Gitlab::Access::AccessDeniedError
      type.none
    end

... and run bundle exec rspec spec/finders/ci/jobs_finder_spec.rb:85 (and format the SQL it outputs with pgFormatter), then I get:

SELECT
    "ci_builds".*
FROM
    "ci_builds"
WHERE
    "ci_builds"."type" = 'Ci::Build'
    AND "ci_builds"."project_id" = 23
    AND ("ci_builds"."status" NOT IN ('created'))
    AND "ci_builds"."ref" = 'dev'
ORDER BY
    "ci_builds"."id" DESC

... and as a control, if I comment out builds = filter_by_ref(builds) in the execute method and run it again, I get:

SELECT
    "ci_builds".*
FROM
    "ci_builds"
WHERE
    "ci_builds"."type" = 'Ci::Build'
    AND "ci_builds"."project_id" = 29
    AND ("ci_builds"."status" NOT IN ('created'))
ORDER BY
    "ci_builds"."id" DESC

Query plans

Adding in the gitlab-org/gitlab project id (278964) and the ref name master for testing gives us these two (with and without "ci_builds"."ref" = 'master'):

query link to plan
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."project_id" = 278964 AND ("ci_builds"."status" NOT IN ('created')) AND "ci_builds"."ref" = 'master' ORDER BY "ci_builds"."id" DESC https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13900/commands/48651
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."project_id" = 278964 AND ("ci_builds"."status" NOT IN ('created')) ORDER BY "ci_builds"."id" DESC https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13900/commands/48652

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Miranda Fluharty

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • 72 73 end
    73 74 end
    74 75
    76 # rubocop: disable CodeReuse/ActiveRecord
    77 def filter_by_ref(builds)
    78 if params[:ref].present?
    79 builds.where(ref: params[:ref])
    80 else
    81 builds
    82 end
    83 end
    84 # rubocop: enable CodeReuse/ActiveRecord
    85
  • Miranda Fluharty added 2 commits

    added 2 commits

    • d599fabf - Set ref on job to fix test
    • 5df4e71e - Move where clause to a with_ref scope

    Compare with previous version

  • added 1 commit

    • e71e4e6d - Apply 1 suggestion(s) to 1 file(s)

    Compare with previous version

  • Squashing/rebasing this and adding a changelog entry now that it's working... :hourglass_flowing_sand:

  • Miranda Fluharty added 489 commits

    added 489 commits

    Compare with previous version

  • Miranda Fluharty marked the checklist item I have evaluated the MR acceptance checklist for this MR. as completed

    marked the checklist item I have evaluated the MR acceptance checklist for this MR. as completed

  • Miranda Fluharty marked this merge request as ready

    marked this merge request as ready

  • added workflowin review label and removed workflowin dev label

  • Miranda Fluharty requested review from @l.rosa

    requested review from @l.rosa

  • requested review from @alinamihaila

  • Miranda Fluharty mentioned in merge request !106499 (closed)

    mentioned in merge request !106499 (closed)

  • mentioned in issue #367294

  • Alina Mihaila requested review from @dbalexandre and removed review request for @alinamihaila

    requested review from @dbalexandre and removed review request for @alinamihaila

  • Leonardo da Rosa
    • Resolved by Leonardo da Rosa

      I liked the overall approach. I left some comments on the MR that follows the DB review guidelines.

      PS: We could split the Ci::JobsFinder into separated classes, maybe using the strategy pattern (based on each type of filtering), but that is out of the scope of this MR.

      Edited by Leonardo da Rosa
  • Leonardo da Rosa removed review request for @l.rosa

    removed review request for @l.rosa

  • removed review request for @dbalexandre

  • Miranda Fluharty changed the description

    changed the description

  • Miranda Fluharty added 798 commits

    added 798 commits

    Compare with previous version

  • Miranda Fluharty requested review from @l.rosa

    requested review from @l.rosa

  • Leonardo da Rosa approved this merge request

    approved this merge request

  • :wave: @l.rosa, thanks for approving this merge request.

    This is the first time the merge request is approved. To ensure full test coverage, a new pipeline will be started shortly.

    For more info, please refer to the following links:

  • Leonardo da Rosa removed review request for @l.rosa

    removed review request for @l.rosa

  • Leonardo da Rosa requested review from @krasio

    requested review from @krasio

    • @mfluharty code looks pretty good!

      It follows the DB guidelines and the queries performance are excelent.

      Raw SQL Query Plans Code Tests
      !104191 (closed) !104191 (closed) :white_check_mark: :white_check_mark:

      /cc @krasio I'm assigning you to do the final approval here ~"database::approved"

    • @mfluharty @l.rosa Seems like this is not the final SQL that will be generated, as the GraphQL API should add pagination, and we do not have LIMIT in this queries. To get the actual SQL it's usually better to execute a request against our local app, and then find the query in the logs. Once we have it, we can replace parameters like project_id and check the plan in DB Lab.

      As is, the above queries will load all rows matching the filters, which is probably not what we want, and may hit the timeout if there are too many, or it's too slow to find them. This is not obvious from the execution plans we have, as they return 0 rows, and the reason for this is that these plans are from the main database (named gitlab-production-tunnel-pg12 in postgres.ai), and not ci (which is called gitlab-production-ci here). CI related tables on main were truncated, hence we get no results.

      Can you please try again with the actual SQL and using the ci database, and go through review with Leo once again. Assign back for maintainer review when ready.

    • Nice catch @krasio! @mfluharty could you please, take a look?

    • Please register or sign in to reply
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading