Draft: Add Organization Isolation queries on organization-sharded models

What does this MR do and why?

This MR will modify database queries: it will scope queries to the Current.organization.

The reason for this MR: for organization isolation, we need to scope all queries to the current organization.

There are several options for this:

  • Post processing using pg_query gem: !204542 (closed) this means that the Arel AST is changed into SQL and then parsed into a PGQuery AST, this seems redundant work
  • Rails scopes
  • Use some kind of proxy for PostgreSQL, next to or in place of pg bouncer

Example

# No Current.organization defined, so no query manipulation
[4] pry(main)> puts Project.first.organization.name
Default
# SELECT <columns> FROM "projects" ORDER BY "projects"."id" ASC LIMIT 1

# Use a new organization for Current.organization
[5] pry(main)> Current.organization = Organizations::Organization.find_or_create_by!(name: 'my org', path: 'my-org')

# This is the proof: we now want to retrieve the project that belongs to organization_id = 1. 
# But we set the Current.organization to "my org". This Project does not belong to that organization
# So for the code, it seems like the Project does not exist and it will raise an error:
[6] pry(main)> puts Project.first.organization.name
NoMethodError: undefined method 'organization' for nil
# SELECT <columns> FROM "projects" WHERE "projects"."organization_id" = 1000 ORDER BY 
 "projects"."id" ASC LIMIT 1

# We can 'move' the project to the newly created organization:
[7] pry(main)> Project.where(id: 1).update_all(organization_id: Current.organization.id)

# And now, the organization is 'my-org' and that is the Current.organization so the project is found
[8] pry(main)> puts Project.first.organization.name
my-org

# Bypass organization scoping: no change in SQL
[9] pry(main)> Gitlab::Database::Organizations::Scope.without_organization_scope{ puts Project.first.organization.name }
Default

How it works

The code is implemented as an extension to the existing ActiveRecord logic. ActiveRecord builds queries using the Arel class. The Arel logic is encapsulated in the ActiveRecord::Relation class. The ActiveRecord::Relation class has a method arel. This method returns an Arel AST. We can patch this method:

If we should add an organization scope, it will return a modified Arel AST: the FROM-part of the AST will have an additional 'WHERE organization_id = XXX' clause

If we should not add a scope, the original Arel AST is returned. This will happen when:

  • No Current.organization is set
  • When the table has no sharding key that references organizations table.
  • When the table has multiple sharding keys (Snippets for example)
  • When the logic is wrapped in a Gitlab::Database::Organizations::Scope.without_organization_scope block

UseSqlFunctionForPrimaryKeyLookups Some queries use a PostgreSQL function: this function doesn't know about organization_id. I added 3 new functions that support organization_id. #429479 (closed)

Performance

Arel AST manipulation

Impact on the application logic is minimal. Retrieving a random group 10000 times:

                                        user     system      total        real
Without current_org                45.782577   1.758297  47.540874 ( 47.645114)
With current_org                   45.544311   1.011991  46.556302 ( 46.729764)

TODO

  • Add Feature flag. Would be nice if we could enable this for organization_id > 1. So existing operation won't be affected. Only Organizations that are new.
  • Current implementation is only adding WHERE organization=X to table(s) that are in the FROM clause.
    • Less changes to our queries: saver to rollout
    • Joins are not scoped
  • CTE Support
  • Performance impact:
    • Database queries

References

Related to #582272

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rutger Wessels

Merge request reports

Loading