Fix Ci::Minutes::ProjectMonthlyUsage.for_namespace_monthly_usage doing cross-joins
Seems to be introduced in !66635 (merged)
Ci::Minutes::ProjectMonthlyUsage.for_namespace_monthly_usage is executing a cross-DB query.
Database::PreventCrossJoins::CrossJoinAcrossUnsupportedTablesError (Unsupported cross-join across 'ci_project_monthly_usages, projects' querying 'gitlab_ci, gitlab_main' discovered when executing query 'SELECT "ci_project_monthly_usages".* FROM "ci_project_monthly_usages" WHERE "ci_project_monthly_usages"."date" = '2021-06-01' AND "ci_project_monthly_usages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" = 218) ORDER BY "ci_project_monthly_usages"."id" DESC LIMIT 100 /*application:test,correlation_id:01FJB8F7DQZGMWXD5STPKVZMFA,endpoint_id:GraphqlController#execute,db_config_name:main*/'. Please refer to https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-joins-between-ci_-and-non-ci_-tables for details on how to resolve this exception.):
You can see this query in action by running the spec ee/spec/features/ci/ci_minutes_spec.rb:23
The scope in question:
14 scope :for_namespace_monthly_usage, -> (namespace_monthly_usage) do
15 where(
16 date: namespace_monthly_usage.date,
17 project: namespace_monthly_usage.namespace.projects
18 ).allow_cross_joins_across_databases(url: 'https://gitlab.com/gitlab-org/gitlab/-/issues/343301')
19 end
Possible solutions
Take a look at the solution being worked on !72033 (closed) . Probably those tables can be used to accomplish this same join in the CI database.
I believe the query should actually be
project: namespace_monthly_usage.namespace.all_projects # include projects from nested namespaces
Which would return even more results. To display the breakdown of usage for a root namespace we need to pull all projects under it.
I'm wondering whether we could rewrite the Types::Ci::Minutes::NamespaceMonthlyUsageType#projects
the other way around:
- from namespace list all projects in it (kind of what
NamespaceType#projects
does) as a typeProjectMonthlyUsageType
. - Use
Batchloader::GraphQL
to get the usage data fromCi::Minutes::ProjectMonthlyUsage
for eachproject_id
anddate
. - With the default pagination we would query for 100
project_ids
at the time.
side notes: I'm also noticing that:
- both
Ci::Minutes::ProjectMonthlyUsageType
andCi::Minutes::NamespaceMonthlyUsageType
don't authorize the results, which we definitely should. -
Ci::Minutes::ProjectMonthlyUsageType#name
may cause N+1. I believe that the rewriting suggested above can eliminate that.
Edited by James Heimbuck