
Reduce cached SQL for JobsController#show

What does this MR do?

What is the issue to solve

As listed in epic &3873 (closed), we detected several endpoints that issue a large number of cached SQL queries.

Projects::JobsController#show is one of them.

Projects::JobsController#show (json.meta.caller_id.keyword):

  • Count: 4523052
  • Max json.db_cached_count: 1826
  • 50th percentile of json.db_count: 64.9999497601192
  • 95th percentile of json.db_count: 114.371385251551
  • 50th percentile of json.db_cached_count: 0
  • 95th percentile of json.db_cached_count: 0
  • Average json.db_duration_s: 0.0929273990589387

As mentioned in &3873 (closed):

The cached queries help with reducing DB load, but they still:

  • consume memory
  • require us to re-instantiate each AR object
  • require us to re-instantiate each relation of the object
  • make us spend additional CPU cycles looking up queries in the list of cached queries.

We should treat the CACHE the same as N+1 queries. They are cheaper, but they are not cheap at all from a ~memory perspective.

What could cause the cached SQL in JobsController#show

For format.json, JobsController#show renders the build with BuildDetailsEntity (https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/controllers/projects/jobs_controller.rb#L36-38):

        render json: BuildSerializer
          .new(project: @project, current_user: @current_user)
          .represent(@build, {}, BuildDetailsEntity)

This goes through several serialization steps.

Eventually, it calls build.project.namespace in JobEntity#path_to (https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/serializers/job_entity.rb#L71):

  def path_to(route, build, params = {})
    send("#{route}_path", build.project.namespace, build.project, build, params) # rubocop:disable GitlabSecurity/PublicSend
  end

When there are many failed_builds for the pipeline, each failed build needs to:

  • query build.project and build.project.namespace from the DB --- this results in many cached SQL queries, since all these builds belong to the same project
  • create Project and Namespace AR objects --- this wastes memory and CPU time on allocating and garbage-collecting those objects (see the sketch after this list)
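
To make the pattern concrete, here is a minimal sketch (a hypothetical console snippet, not code from this MR) of what the serializer effectively does per failed build, with the query cache enabled as it is during a request:

  # Illustration only. The repeated identical SELECTs are served from the
  # ActiveRecord query cache after the first build, but every iteration still
  # instantiates fresh Project and Namespace AR objects.
  ActiveRecord::Base.cache do
    pipeline.failed_builds.each do |build|
      build.project            # Project Load: cache hit after the first build, new Project object each time
      build.project.namespace  # Namespace Load: cache hit after the first build, new Namespace object each time
    end
  end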

The solution

From PipelineEntity (https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/serializers/pipeline_entity.rb#L84-86), we know all failed_builds belong to the same pipeline, and thus also to the same project, pipeline.project:

  expose :failed_builds, if: -> (*) { can_retry? }, using: JobEntity do |pipeline|
    pipeline.failed_builds
  end

So we can assign pipeline.project to each failed_build.project. This avoids the cached SQL queries when JobEntity needs to access project information.
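
A minimal sketch of the idea, assuming the assignment lives where PipelineEntity exposes failed_builds (the exact diff in this MR may differ):

  # Sketch: share the already-loaded pipeline.project with every failed build,
  # so JobEntity#path_to no longer loads Project/Namespace per build. Because
  # all builds now hold the same Project instance, its namespace is also
  # loaded and instantiated only once for the whole collection.
  expose :failed_builds, if: -> (*) { can_retry? }, using: JobEntity do |pipeline|
    pipeline.failed_builds.each do |build|
      build.project = pipeline.project
    end
  end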

Test result

The test shows that both execution time and memory usage are improved (a sketch of how comparable numbers can be collected follows the results).

When a pipeline has 595 failed builds:

  • Master branch
    • Total allocated: 1068788234 bytes (11577122 objects)
    • Total retained: 63519546 bytes (324322 objects)
    • duration_s: 203.8628
    • db_count: 1570
    • db_cached_count: 1533
  • Feature branch
    • Total allocated: 865909587 bytes (9588100 objects), 202MB (19%) less
    • Total retained: 53462974 bytes (276768 objects), 10MB (15.8%) less
    • duration_s: 106.43789, 97.42491 (47.8%) less
    • db_count: 48, 1522 (97%) less
    • db_cached_count: 12, 1521 (99%) less
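
One hypothetical way to collect comparable numbers in a Rails console, assuming memory_profiler (whose report format matches the "Total allocated" / "Total retained" lines above) plus an ActiveSupport notification subscriber for the query counts; this is not the script used for the results above, and project, user and build stand in for real records:

  require 'memory_profiler'
  require 'benchmark'

  db_count = 0
  db_cached_count = 0

  # Count every SQL query fired while rendering; queries served from the
  # ActiveRecord query cache are flagged with payload[:cached].
  subscriber = ActiveSupport::Notifications.subscribe('sql.active_record') do |_name, _start, _finish, _id, payload|
    next if payload[:name] == 'SCHEMA'

    db_count += 1
    db_cached_count += 1 if payload[:cached]
  end

  report = nil
  duration_s = Benchmark.realtime do
    report = MemoryProfiler.report do
      # The query cache is on during real requests; enable it explicitly here.
      ActiveRecord::Base.cache do
        BuildSerializer
          .new(project: project, current_user: user)
          .represent(build, {}, BuildDetailsEntity)
          .as_json # force serialization so the exposures actually run
      end
    end
  end

  ActiveSupport::Notifications.unsubscribe(subscriber)

  report.pretty_print # prints the "Total allocated" / "Total retained" lines
  pp(duration_s: duration_s, db_count: db_count, db_cached_count: db_cached_count)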

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #231467 (closed)

