Skip to content

Geo: Avoid PG statement timeout during `Geo::MetricsUpdateWorker`

Michael Kozono requested to merge mk/avoid-bad-query-plan into master

What does this MR do and why?

Avoids a bad MIN query plan during batch count.

This actually completely cuts out the MIN query logic during batch count. So batch counts will sometimes take longer overall than they otherwise should, since more batches will be counted. But query timeouts due to PG selecting a bad query plan on the MIN query will be completely avoided.

This is a two-way door decision: We can easily revert this if someone finds a better solution.

Resolves #442259 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Set up Geo if you haven't already
  2. Visit Admin > Geo > Sites
  3. The page should regularly update the status of all Geo sites

No behavior change is expected. I was unable to reproduce the bad query plan selection locally.

I confirmed in Rails console on the secondary Geo site that the MIN query is avoided with this change:

Before

[3] pry(main)> Geo::JobArtifactReplicator.verified_count
  Geo::JobArtifactRegistry Minimum (1.8ms)  SELECT MIN("job_artifact_registry"."id") FROM "job_artifact_registry" WHERE ("job_artifact_registry"."state" IN (2)) AND ("job_artifact_registry"."verification_state" IN (2)) /*application:console,db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:/lib/gitlab/database/batch_counter.rb:123:in `actual_start'*/
  Geo::JobArtifactRegistry Maximum (0.2ms)  SELECT MAX("job_artifact_registry"."id") FROM "job_artifact_registry" WHERE ("job_artifact_registry"."state" IN (2)) AND ("job_artifact_registry"."verification_state" IN (2)) /*application:console,db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:/lib/gitlab/database/batch_counter.rb:127:in `actual_finish'*/
  Geo::JobArtifactRegistry Count (382.3ms)  SELECT COUNT("job_artifact_registry"."id") FROM "job_artifact_registry" WHERE ("job_artifact_registry"."state" IN (2)) AND ("job_artifact_registry"."verification_state" IN (2)) AND "job_artifact_registry"."id" >= $1 AND "job_artifact_registry"."id" < $2 /*application:console,db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:/lib/gitlab/database/batch_counter.rb:72:in `count_with_timeout'*/  [["id", 1102], ["id", 38501]]
=> 31842

After

[5] pry(main)> Geo::JobArtifactReplicator.verified_count
  Geo::JobArtifactRegistry Maximum (0.5ms)  SELECT MAX("job_artifact_registry"."id") FROM "job_artifact_registry" WHERE ("job_artifact_registry"."state" IN (2)) AND ("job_artifact_registry"."verification_state" IN (2)) /*application:console,db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:/lib/gitlab/database/batch_counter.rb:127:in `actual_finish'*/
  Geo::JobArtifactRegistry Count (178.8ms)  SELECT COUNT("job_artifact_registry"."id") FROM "job_artifact_registry" WHERE ("job_artifact_registry"."state" IN (2)) AND ("job_artifact_registry"."verification_state" IN (2)) AND "job_artifact_registry"."id" >= $1 AND "job_artifact_registry"."id" < $2 /*application:console,db_config_name:geo,console_hostname:MikesGitLabMBP.localdomain,console_username:mkozonogitlab,line:/lib/gitlab/database/batch_counter.rb:72:in `count_with_timeout'*/  [["id", 0], ["id", 38501]]
=> 30737
Edited by Michael Kozono

Merge request reports