Backfills service account id on duo_workflows_workflows table

What does this MR do and why?

This MR is part 2 (of 4) of the solution that addresses #584271 (closed).

This MR implements a batched background migration to backfill the service_account_id column in the duo_workflows_workflows table.

The service_account_id column was recently added to the duo_workflows_workflows table in !220828 (merged).

For more context, see comment.

Backfill logic:

For each workflow where service_account_id is null, the backfill resolves the correct value by:

  1. Finding the AI catalog item:
    • first via ai_catalog_item_versions if the workflow has a version_id set,
    • otherwise by matching foundational_flow_reference against workflow_definition (for foundational flows).
  2. Determining the top-level group of the workflow:
    • from the project's namespace if project_id exists (project-level workflows),
    • otherwise from namespace_id (group-level workflows),
    • then resolving the root group using traversal_ids[1].
  3. Selecting a matching service_account_id from ai_catalog_item_consumers where:
    • service_account_id is not NULL
    • ai_catalog_item_id matches the resolved item
    • group_id matches the root group
  4. Updating the workflow with the first matching service_account_id.
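
The resolution steps above can be sketched in plain Ruby. This is only an illustrative in-memory model, not the migration's actual code: the constants, helper names, and sample ids are all hypothetical, and the real backfill works on database rows. It assumes the version link is stored in ai_catalog_item_version_id, as in the console snippet in the verification section.

```ruby
# Hypothetical in-memory stand-ins for the tables named above; the real
# migration operates on database rows, not Ruby hashes.
ITEM_VERSIONS = { 10 => { ai_catalog_item_id: 1 } }.freeze
ITEMS = [
  { id: 1, foundational_flow_reference: nil },
  { id: 2, foundational_flow_reference: 'code_review/v1' }
].freeze
PROJECTS = { 7 => { namespace_id: 300 } }.freeze
# Ruby arrays are 0-indexed, so traversal_ids.first here corresponds to
# traversal_ids[1] in PostgreSQL, whose arrays are 1-indexed by default.
NAMESPACES = {
  100 => { traversal_ids: [100] },
  300 => { traversal_ids: [100, 200, 300] }
}.freeze
CONSUMERS = [
  { ai_catalog_item_id: 1, group_id: 100, service_account_id: nil },
  { ai_catalog_item_id: 1, group_id: 100, service_account_id: 42 },
  { ai_catalog_item_id: 2, group_id: 100, service_account_id: 43 }
].freeze

# Step 1: prefer the version link, then fall back to the foundational flow reference.
def resolve_item_id(workflow)
  version = ITEM_VERSIONS[workflow[:ai_catalog_item_version_id]]
  return version[:ai_catalog_item_id] if version

  definition = workflow[:workflow_definition]
  return nil if definition.nil?

  item = ITEMS.find { |i| i[:foundational_flow_reference] == definition }
  item && item[:id]
end

# Step 2: project namespace first, otherwise the workflow's own namespace.
def resolve_root_group_id(workflow)
  project = PROJECTS[workflow[:project_id]]
  namespace_id = project ? project[:namespace_id] : workflow[:namespace_id]
  namespace = NAMESPACES[namespace_id]
  namespace && namespace[:traversal_ids].first
end

# Steps 3 and 4: first consumer with a service account for that item and root group.
def resolve_service_account_id(workflow)
  item_id = resolve_item_id(workflow)
  root_group_id = resolve_root_group_id(workflow)
  return nil if item_id.nil? || root_group_id.nil?

  consumer = CONSUMERS.find do |c|
    !c[:service_account_id].nil? &&
      c[:ai_catalog_item_id] == item_id &&
      c[:group_id] == root_group_id
  end
  consumer && consumer[:service_account_id]
end
```

In this model a workflow with neither a resolvable item nor a matching consumer yields nil, which corresponds to a row the backfill leaves untouched.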

Note: Some workflows can't be backfilled (for example, existing custom flow rows without the links described above; see here for more context).

Changelog: changed

  1. Part 1 MR: Adds Service Account ID column to duo_workflows... (!220828 - merged)
  2. Part 2 MR: <--- this MR
  3. Part 3 MR: Updates codepaths so new rows in duo_workflows_... (!222090 - merged)
  4. Part 4 MR (feature implementation draft): Update flows agent tracking to consumer service... (!220837 - merged)

Local setup and verification using Rails console

Step 1: Set up test data

# Find or create a root group
root_group = Group.find_by(parent_id: nil) || FactoryBot.create(:group)

# Find or create a project under that group
project = root_group.projects.first || FactoryBot.create(:project, group: root_group)

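# Use an existing catalog item and version that has a consumer with a service account.
# The item id (1) is specific to the local environment; adjust it to an item that exists locally.
consumer = Ai::Catalog::ItemConsumer.where.not(service_account_id: nil).find_by!(
  group_id: root_group.id,
  ai_catalog_item_id: 1
)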
item_version = Ai::Catalog::ItemVersion.find_by!(ai_catalog_item_id: consumer.ai_catalog_item_id)
expected_sa_id = consumer.service_account_id

# Create workflows without service_account_id
# Project-level workflow
wf_project = Ai::DuoWorkflows::Workflow.create!(
  user: User.first,
  project: project,
  ai_catalog_item_version_id: item_version.id,
  goal: "project-level backfill test"
)

# Namespace-level workflow
wf_namespace = Ai::DuoWorkflows::Workflow.create!(
  user: User.first,
  namespace: root_group,
  ai_catalog_item_version_id: item_version.id,
  goal: "namespace-level backfill test"
)

puts "wf_project: #{wf_project.id} (sa=#{wf_project.service_account_id.inspect})"
puts "wf_namespace: #{wf_namespace.id} (sa=#{wf_namespace.service_account_id.inspect})"

Step 2: Run the migration

# Run the migration
migration = Gitlab::BackgroundMigration::BackfillServiceAccountIdOnDuoWorkflowsWorkflows.new(
  start_id: Ai::DuoWorkflows::Workflow.minimum(:id),
  end_id: Ai::DuoWorkflows::Workflow.maximum(:id),
  batch_table: :duo_workflows_workflows,
  batch_column: :id,
  sub_batch_size: 100,
  pause_ms: 0,
  connection: ApplicationRecord.connection
)

migration.perform
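
To make the start_id, end_id and sub_batch_size arguments concrete, here is a simplified model of how an id range divides into sub-batches. The sub_batch_ranges helper is hypothetical and not part of GitLab; the real job may batch over rows that actually exist rather than doing plain id arithmetic, so treat this only as an approximation.

```ruby
# Hypothetical helper, not part of GitLab: split an inclusive id range
# into consecutive slices of at most sub_batch_size ids.
def sub_batch_ranges(start_id, end_id, sub_batch_size)
  start_id.step(end_id, sub_batch_size).map do |from|
    from..[from + sub_batch_size - 1, end_id].min
  end
end

# With ids 1 through 250 and sub_batch_size 100, this yields three slices.
sub_batch_ranges(1, 250, 100).each { |range| puts range }
```

With pause_ms: 0, as in the console call above, no pause is requested between sub-batches, which is convenient for a local run.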

Step 3: Verify results

wf_project.reload
wf_namespace.reload

puts "  Project workflow:   #{wf_project.service_account_id == expected_sa_id ? '✅' : '❌'} got=#{wf_project.service_account_id} expected=#{expected_sa_id}"
puts "  Namespace workflow: #{wf_namespace.service_account_id == expected_sa_id ? '✅' : '❌'} got=#{wf_namespace.service_account_id} expected=#{expected_sa_id}"

# Verify no workflows with a version but without service_account were missed (0 count = all workflows are correctly backfilled)
missed_wf = Ai::DuoWorkflows::Workflow
  .where.not(ai_catalog_item_version_id: nil)
  .where(service_account_id: nil)

puts "Workflows still missing service_account_id: #{missed_wf.count}"


MR acceptance checklist

Evaluate this MR against the MR acceptance checklist.

Edited by Shola Quadri
