Skip to content

Split SCIM tables

Relates to #498905 (closed)

Summary

This MR introduces separate tables for Group SCIM-related data. It creates new models GroupScimAuthAccessToken and GroupScimIdentity to replace the existing ScimOauthAccessToken and ScimIdentity models for group-level SCIM operations. This change aims to improve performance and scalability for Group SCIM operations in GitLab.com.

Changes overview

  • Created new models: GroupScimAuthAccessToken and GroupScimIdentity
  • Updated Group SCIM-related controllers and API endpoints to use the new models
  • Added migrations to create new tables and move existing data
  • Updated finders and services to work with the new models
  • Implemented feature flag separate_group_scim_table to control the rollout

Validation steps

Setup Group SAML and Group SCIM locally

Before checking out this branch

  1. Start your GDK instance
  2. Select a group and add SCIM and SAML settings
  3. Make sure the SCIM is working as expected

After checking out this branch

  1. Run database migrations
  2. Enable feature flag Feature.enable(:separate_group_scim_table, Group.find(<group-id>))
  3. Make sure that previous SCIM calls are still working with old scim token and URI
  4. Setup scim in another group
  5. confirm that everything works as expected

Script for testing scim


require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'json'
  gem 'http'
  gem 'faker'
end

require 'http'
require 'json'
require 'faker'

# Replace with your SCIM endpoint and token
SCIM_ENDPOINT = 'https://gdk.test:3000/api/scim/v2/groups/twitter/Users'
SCIM_TOKEN = 'glsoat--N82VshzyPDJ6YFhCxKi' # (GDK safe secret)

scim_user_id = Faker::Internet.uuid
# Function to generate random user data
def generate_random_user_data(scim_user_id)
  {
    "schemas" => ["urn:ietf:params:scim:schemas:core:2.0.0"],
    "externalId" => scim_user_id,
    "userName" => Faker::Internet.user_name,
    "name" => {
      "givenName" => Faker::Name.first_name,
      "familyName" => Faker::Name.last_name
    },
    "emails" => [
      {
        "value" => Faker::Internet.email,
        "type" => "work"
      }
    ],
    "active" => true
  }
end

# Create a new user
def create_user(user_data)
  http = HTTP.auth("Bearer #{SCIM_TOKEN}")
  response = http.post(SCIM_ENDPOINT, json: user_data, headers: { 'Content-Type' => 'application/scim+json' })
  handle_response(response)
end

# Edit an existing user
def edit_user(user_id, updated_data)
  user_uri = "#{SCIM_ENDPOINT}/#{user_id}"
  http = HTTP.auth("Bearer #{SCIM_TOKEN}")
  response = http.patch(user_uri, json: updated_data, headers: { 'Content-Type' => 'application/scim+json' })
  handle_response(response)
end

# Delete a user
def delete_user(user_id)
  user_uri = "#{SCIM_ENDPOINT}/#{user_id}"
  http = HTTP.auth("Bearer #{SCIM_TOKEN}")
  response = http.delete(user_uri)
  handle_response(response)
end

# Handle the response
def handle_response(response)
  if response.code == 201 || response.code == 200 || response.code == 204
    puts "Operation successful!"
    puts response.body
  else
    puts "Error:"
    puts response.body
    puts response.code
  end
end

# Example usage:
# Create a new user with random data
new_user_data = generate_random_user_data(scim_user_id)
create_user(new_user_data)

# Edit an existing user (assuming you have the user ID)
user_id = scim_user_id  
updated_data = {
  "emails" => [
    {
      "value" => "updated_email@example.com",
      "type" => "work"
    }
  ]
}
edit_user(user_id, updated_data)

# Delete a user
delete_user(user_id)

Queries and SQL plans

  1. user.groups.top_level.with_scim_identity

Query

SELECT "namespaces"."id", "namespaces"."name", "namespaces"."path", "namespaces"."owner_id", "namespaces"."created_at", "namespaces"."updated_at", "namespaces"."type", "namespaces"."description", "namespaces"."avatar", "namespaces"."membership_lock", "namespaces"."share_with_group_lock", "namespaces"."visibility_level", "namespaces"."request_access_enabled", "namespaces"."ldap_sync_status", "namespaces"."ldap_sync_error", "namespaces"."ldap_sync_last_update_at", "namespaces"."ldap_sync_last_successful_update_at", "namespaces"."ldap_sync_last_sync_at", "namespaces"."description_html", "namespaces"."lfs_enabled", "namespaces"."parent_id", "namespaces"."shared_runners_minutes_limit", "namespaces"."repository_size_limit", "namespaces"."require_two_factor_authentication", "namespaces"."two_factor_grace_period", "namespaces"."cached_markdown_version", "namespaces"."project_creation_level", "namespaces"."runners_token", "namespaces"."file_template_project_id", "namespaces"."saml_discovery_token", "namespaces"."runners_token_encrypted", "namespaces"."custom_project_templates_group_id", "namespaces"."auto_devops_enabled", "namespaces"."extra_shared_runners_minutes_limit", "namespaces"."last_ci_minutes_notification_at", "namespaces"."last_ci_minutes_usage_notification_level", "namespaces"."subgroup_creation_level", "namespaces"."max_pages_size", "namespaces"."max_artifacts_size", "namespaces"."mentions_disabled", "namespaces"."default_branch_protection", "namespaces"."max_personal_access_token_lifetime", "namespaces"."push_rule_id", "namespaces"."shared_runners_enabled", "namespaces"."allow_descendants_override_disabled_shared_runners", "namespaces"."traversal_ids", "namespaces"."organization_id" FROM "namespaces" INNER JOIN "members" ON "namespaces"."id" = "members"."source_id" LEFT OUTER JOIN "scim_identities" ON "scim_identities"."group_id" = "namespaces"."id" LEFT OUTER JOIN "group_scim_identities" ON "group_scim_identities"."group_id" = "namespaces"."id" WHERE "members"."type" = 'GroupMember' AND "members"."source_type" = 'Namespace' AND "namespaces"."type" = 'Group' AND "members"."user_id" = 1 AND "members"."requested_at" IS NULL AND (access_level >= 10) AND "namespaces"."parent_id" IS NULL AND (scim_identities.id IS NOT NULL OR group_scim_identities.id IS NOT NULL) GROUP BY "namespaces"."id"

Plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/34979/commands/108361

Summary:  Time: 5.269 ms  
  - planning: 4.763 ms  
  - execution: 0.506 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 132 (~1.00 MiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  
  1. GroupScimIdentity.for_group
SELECT "scim_identities".* FROM "scim_identities" WHERE "scim_identities"."group_id" = 100081788;
Time: 1.040 ms
  - planning: 0.962 ms
  - execution: 0.078 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 6 (~48.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
  1. GroupScimIdentity.with_extern_uid
SELECT "group_scim_identities".* FROM "group_scim_identities" WHERE (LOWER("group_scim_identities"."extern_uid") = LOWER('abcs'))

plan: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/34979/commands/108363

Time: 0.611 ms
  - planning: 0.585 ms
  - execution: 0.026 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
  1. user.with_group_scim_identities_by_extern_uid
SELECT "users"."id", "users"."email", "users"."encrypted_password", "users"."reset_password_token", "users"."reset_password_sent_at", "users"."remember_created_at", "users"."sign_in_count", "users"."current_sign_in_at", "users"."last_sign_in_at", "users"."current_sign_in_ip", "users"."last_sign_in_ip", "users"."created_at", "users"."updated_at", "users"."name", "users"."admin", "users"."projects_limit", "users"."failed_attempts", "users"."locked_at", "users"."username", "users"."can_create_group", "users"."can_create_team", "users"."state", "users"."color_scheme_id", "users"."password_expires_at", "users"."created_by_id", "users"."last_credential_check_at", "users"."avatar", "users"."confirmation_token", "users"."confirmed_at", "users"."confirmation_sent_at", "users"."unconfirmed_email", "users"."hide_no_ssh_key", "users"."admin_email_unsubscribed_at", "users"."notification_email", "users"."hide_no_password", "users"."password_automatically_set", "users"."encrypted_otp_secret", "users"."encrypted_otp_secret_iv", "users"."encrypted_otp_secret_salt", "users"."otp_required_for_login", "users"."otp_backup_codes", "users"."public_email", "users"."dashboard", "users"."project_view", "users"."consumed_timestep", "users"."layout", "users"."hide_project_limit", "users"."note", "users"."unlock_token", "users"."otp_grace_period_started_at", "users"."external", "users"."incoming_email_token", "users"."auditor", "users"."require_two_factor_authentication_from_group", "users"."two_factor_grace_period", "users"."last_activity_on", "users"."notified_of_own_activity", "users"."preferred_language", "users"."theme_id", "users"."accepted_term_id", "users"."feed_token", "users"."private_profile", "users"."roadmap_layout", "users"."include_private_contributions", "users"."commit_email", "users"."group_view", "users"."managing_group_id", "users"."first_name", "users"."last_name", "users"."static_object_token", "users"."role", "users"."user_type", "users"."static_object_token_encrypted", "users"."otp_secret_expires_at", "users"."onboarding_in_progress", "users"."color_mode_id", "users"."composite_identity_enforced" FROM "users" INNER JOIN "group_scim_identities" ON "group_scim_identities"."user_id" = "users"."id" WHERE (LOWER("group_scim_identities"."extern_uid") = LOWER('abcd'))

Plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/34979/commands/108368 Summary:

Time: 5.133 ms  
  - planning: 5.053 ms  
  - execution: 0.080 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 0 from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0 

Fixes #495503 (closed), #498906 (closed) and #463930 (closed)

Edited by Aboobacker MK

Merge request reports

Loading