Skip to content

Fix Ci::Minutes::ProjectMonthlyUsage.for_namespace_monthly_usage doing cross-joins

Seems to be introduced in !66635 (merged)

Ci::Minutes::ProjectMonthlyUsage.for_namespace_monthly_usage is executing a cross-DB query.

Database::PreventCrossJoins::CrossJoinAcrossUnsupportedTablesError (Unsupported cross-join across 'ci_project_monthly_usages, projects' querying 'gitlab_ci, gitlab_main' discovered when executing query 'SELECT "ci_project_monthly_usages".* FROM "ci_project_monthly_usages" WHERE "ci_project_monthly_usages"."date" = '2021-06-01' AND "ci_project_monthly_usages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 218) ORDER BY "ci_project_monthly_usages"."id" DESC LIMIT 100 /*application:test,correlation_id:01FJB8F7DQZGMWXD5STPKVZMFA,endpoint_id:GraphqlController#execute,db_config_name:main*/'. Please refer to https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-joins-between-ci_-and-non-ci_-tables for details on how to resolve this exception.):

You can see this query in action by running the spec ee/spec/features/ci/ci_minutes_spec.rb:23

The scope in question:

 14       scope :for_namespace_monthly_usage, -> (namespace_monthly_usage) do                                                                              
 15         where(                                                                                                                                         
 16           date: namespace_monthly_usage.date,                                                                                                          
 17           project: namespace_monthly_usage.namespace.projects                                                                                          
 18         ).allow_cross_joins_across_databases(url: 'https://gitlab.com/gitlab-org/gitlab/-/issues/343301')                                              
 19       end                                                                                                                                              

Possible solutions

https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-joins-between-ci_-and-non-ci_-tables

Take a look at the solution being worked on !72033 (closed) . Probably those tables can be used to accomplish this same join in the CI database.

I believe the query should actually be

project: namespace_monthly_usage.namespace.all_projects # include projects from nested namespaces                                                                                         

Which would return even more results. To display the breakdown of usage for a root namespace we need to pull all projects under it.

I'm wondering whether we could rewrite the Types::Ci::Minutes::NamespaceMonthlyUsageType#projects the other way around:

  • from namespace list all projects in it (kind of what NamespaceType#projects does) as a type ProjectMonthlyUsageType.
  • Use Batchloader::GraphQL to get the usage data from Ci::Minutes::ProjectMonthlyUsage for each project_id and date.
  • With the default pagination we would query for 100 project_ids at the time.

side notes: I'm also noticing that:

  • both Ci::Minutes::ProjectMonthlyUsageType and Ci::Minutes::NamespaceMonthlyUsageType don't authorize the results, which we definitely should.
  • Ci::Minutes::ProjectMonthlyUsageType#name may cause N+1. I believe that the rewriting suggested above can eliminate that.
Edited by James Heimbuck