Provide a patch for Rails 4.2 to fix `preload` with `from`

The patch could be found in https://gitlab.com/gitlab-org/gitlab-ce/issues/43710#note_61431217

We need to get this fixed in upstream Rails, providing a test and check if the same should be fixed in Rails 5+

Original title

Debug Problem with Rails preload not using defined scopes

Original description

We have a problem where the preload AR method isn't using the scope code defined in the association. It's very mysterious and we believe it's an ActiveRecord bug.

Will fill in more details here in a moment.

This is blocking MR https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17196 which is to fix https://gitlab.com/gitlab-org/gitlab-ce/issues/41057

The bulk of the investigation is here:

https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17196#note_59630600

To reproduce the problem, start with that branch. In the Rails console use the artifacts scope which calls latest.with_artifacts_not_expired. The latest method comes from... I don't know where. But the with_artifacts_not_expired comes from Ci::Build which is the newly added code in this MR. And it calls the code as expected:

[23] pry(main)> print Ci::Pipeline.first.artifacts.all.to_sql
  [1m[35mCi::Pipeline Load (1.4ms)[0m  SELECT  "ci_pipelines".* FROM "ci_pipelines"  ORDER BY "ci_pipelines"."id" ASC LIMIT 1
artifacts NOT expired
SELECT "ci_builds".* FROM (SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '' AND (artifacts_expire_at IS NULL OR artifacts_expire_at > '2018-03-01 16:34:32.153719'))
UNION ALL
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND (expire_at IS NULL OR expire_at > '2018-03-01 16:34:32.154269'))))) AS ci_builds WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."commit_id" = 1 AND "ci_builds"."type" IN ('Ci::Build') AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL)=> nil

However other places use preload to load the artifacts and I believe it's intended to call this same scope, Ci::Pipeline.artifacts however it seems to be calling something different:

[24] pry(main)> print Ci::Pipeline.preload([:artifacts]).first
  [1m[36mCi::Pipeline Load (2.0ms)[0m  [1mSELECT  "ci_pipelines".* FROM "ci_pipelines"  ORDER BY "ci_pipelines"."id" ASC LIMIT 1[0m
artifacts NOT expired
  [1m[35mCi::Build Load (1.8ms)[0m  SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL) AND "ci_builds"."commit_id" IN (1)
  [1m[36mProject Load (1.4ms)[0m  [1mSELECT "projects".* FROM "projects" WHERE "projects"."id" IN (2)[0m
#<Ci::Pipeline:0x0000555b0dc20e58>=> nil

With the scopes in Ci::Build as it is in master (and no other changes, just with_artifacts_not_expired, with_artifacts, and with_expired_artifacts restored) then this is what happens, note the second query is the query in this method:

[29] pry(main)> print Ci::Pipeline.preload([:artifacts]).first
  [1m[35mCi::Pipeline Load (1.6ms)[0m  SELECT  "ci_pipelines".* FROM "ci_pipelines"  ORDER BY "ci_pipelines"."id" ASC LIMIT 1
  [1m[36mCi::Build Load (62.5ms)[0m  [1mSELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL) AND ((artifacts_file IS NOT NULL AND artifacts_file <> '') OR EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id))) AND (artifacts_expire_at IS NULL OR artifacts_expire_at > '2018-03-01 16:42:42.411165') AND "ci_builds"."commit_id" IN (1)[0m
  [1m[35mProject Load (0.5ms)[0m  SELECT "projects".* FROM "projects" WHERE "projects"."id" IN (2)
#<Ci::Pipeline:0x0000555b08c56468>=> nil

Just to clarify, to get this trace I just commented out those three methods in app/models/ci/build.rb and added these lines which come straight from master:

    scope :with_artifacts, ->() do
      where('(artifacts_file IS NOT NULL AND artifacts_file <> ?) OR EXISTS (?)',
        '', Ci::JobArtifact.select(1).where('ci_builds.id = ci_job_artifacts.job_id'))
    end
    scope :with_artifacts_not_expired, ->() { with_artifacts.where('artifacts_expire_at IS NULL OR artifacts_expire_at > ?', Time.now) }
    scope :with_expired_artifacts, ->() { with_artifacts.where('artifacts_expire_at < ?', Time.now) }

And the problematic code in the MR is:

    scope :with_artifacts_not_expired, ->() do
      print "artifacts NOT expired\n"
      old = unscoped.where(%q[artifacts_file <> '' AND (artifacts_expire_at IS NULL OR artifacts_expire_at > ?)], Time.now)
      new = unscoped.where(%q[(artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (?)],
                           Ci::JobArtifact.select(1).where('ci_builds.id = ci_job_artifacts.job_id AND (expire_at IS NULL OR expire_at > ?)', Time.now))
      self.from("(#{Gitlab::SQL::Union.new([old, new], remove_duplicates: false).to_sql}) AS ci_builds") # rubocop:disable GitlabSecurity/SqlInjection
    end
Edited Mar 28, 2018 by Lin Jen-Shin
Assignee Loading
Time tracking Loading