Skip to content

Draft: Claim records

What does this MR do and why?

This tries to implement gitlab-com/gl-infra/tenant-scale/cells-infrastructure/team#365

Part of: https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/cells/topology_service_transactional_behavior/

References

How to set up and validate locally

  • Run Topology service with ./env.sh spanner bin/topology-service serve from trans-behavior-final-form branch: gitlab-org/cells/topology-service!308
  • Update testing Spanner config so that it matches with GDK:
    diff --git a/db/migrations/claims/spanner/20250929142300_create_claims_table.sql b/db/migrations/claims/spanner/20250929142300_create_claims_table.sql
    index abe2d58..830e256 100644
    --- a/db/migrations/claims/spanner/20250929142300_create_claims_table.sql
    +++ b/db/migrations/claims/spanner/20250929142300_create_claims_table.sql
    @@ -10,7 +10,7 @@ CREATE TABLE IF NOT EXISTS claims (
       source_type INT64 NOT NULL,
       source_record_id INT64 NOT NULL,
       user_data STRING(MAX),
    -  status INT64 NOT NULL DEFAULT 0,
    +  status INT64 NOT NULL,
       lease_uuid STRING(36),
       creator_id STRING(128) NOT NULL,
       created_at TIMESTAMP NOT NULL DEFAULT (CURRENT_TIMESTAMP())
    diff --git a/env/test/default-cells.toml b/env/test/default-cells.toml
    index 02df048..b249a02 100644
    --- a/env/test/default-cells.toml
    +++ b/env/test/default-cells.toml
    @@ -1,6 +1,6 @@
     [[cells]]
     id = 1
    -address = "my.cell-1.example.com"
    +address = "127.0.0.1:3333"
     session_prefix = "cell-1"
     [[cells.sequence_ranges]]
     minval = 100000000000
    diff --git a/env/test/default-serve.toml b/env/test/default-serve.toml
    index 2ce60a1..29abc65 100644
    --- a/env/test/default-serve.toml
    +++ b/env/test/default-serve.toml
    @@ -1,7 +1,7 @@
     [[serve]]
    -address = ":9097"
    +address = ":9095"
     features = ["*_grpc"]
     
     [[serve]]
    -address = ":9098"
    +address = ":9096"
     features = ["*_rest"]
    diff --git a/internal/mtls/interceptor.go b/internal/mtls/interceptor.go
    index fcd7449..384fa2a 100644
    --- a/internal/mtls/interceptor.go
    +++ b/internal/mtls/interceptor.go
    @@ -33,6 +33,8 @@ func (h *AuthHandler) Interceptor(ctx context.Context, req any, info *grpc.Unary
     		return handler(ctx, req)
     	}
     
    +	return handler(ctx, req)
    +
     	role, err := decodeRoleFromHeaders(h.Env, ctx)
     	if err != nil {
     		log.WithError(err).WithFields(log.Fields{
  • Run GDK without Topology service (GDK doesn't support the new way to run Topology service yet)
  • Update your config/puma.rb so that it doesn't fork: workers 0. Otherwise I have this error: grpc cannot be used before and after forking unless the GRPC_ENABLE_FORK_SUPPORT env var is set to "1" and the platform supports it (linux only)
  • Run migration as usual
  • Run everything else as usual
  • Checking data and log with:
    • gdk tail rails-web
    • tail -f gitlab/log/application.log
    • (Or you can also test with PostgreSQL and use psql instead) gcloud --project topology-service spanner databases execute-sql topology-service --instance=topology-service --sql="select * from claims"

Tasks and checklist for this merge request

  • Creating a user from console worked
    • u = User.new; u.assign_personal_namespace Organizations::Organization.default_organization; u.update(username: 'test', email: 'test@example.com', password: 'this is a super long password', name: 'name')
  • Updating a user from console worked
    • User.find_by(username: 'test').update(username: 'test-update')
  • Deleting a user from console worked
    • User.find_by(username: 'test-update').destroy
  • Raise random errors during the process and check if it's calling RollbackUpdate properly
    • Uncomment the random errors in the code to simulate random errors, and repeat the tests
  • Test the following scenarios manually (and check logs and data we saved)
    • Create a user
    • Create a project under the user
    • Create a group
    • Create a project under the group
    • Add an ssh key
    • Add a gpg key
    • Delete the user with all those resources
  • Move attribute types from the map in claim service to where we're defining the attributes. That is: cells_unique_attribute :email, type: Gitlab::Cells::TopologyService::ClaimBucket::BucketType::UNSPECIFIED (we don't have email type yet!)
  • Figure out how to make sure Rails will call before_commit for all the records that we're deleting. Some are not called like gpg_key_subkeys, and possibly also emails. Not sure why, but I suspect that it's possibly deleted by PostgreSQL with DELETE CASCADE therefore it's bypassing Rails. That does make sense, but we need to make sure we're releasing the claims on Topology service.
    • We can consider trying to look into those associations from the parent records, because if we know we're deleting the parent records and we know the children will get deleted in cascade, there's no need to really call before_commit for those records and we can just queue all data we want to delete anyway. Or we can still try to trigger those, depending on which is easier to do.
    • Check if the preloader to load deleting data won't load again for the loaded data
    • Use dependent: :destroy instead
    • [-] A separate trigger + worker approach: !204871
      • We'll handle this later, not in this merge request
  • [-] Add timeout for transaction, because we're claiming inside the transaction, which can be slow.
    • Decided to not do because we don't want to timeout currently slow transactions, and transaction timeout only works in newer PostgreSQL
  • Add timeout for gRPC requests within the transaction !206893 (merged)
  • Figure out how to set the owner properly !207135 (merged)
  • Wrap this around a feature flag !206729 (merged)
  • Make sure we resolve the concerns and comments in !207629 (merged)
  • Write more tests!
  • Refine the names, filenames, module names, etc

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Lin Jen-Shin

Merge request reports

Loading