Add offset optimization to Audit Events API
What does this MR do and why?
MR adds the offset optimization to the audit events finder, which is used in controllers/APIs
MR also adds the feature flag: audit_events_api_offset_optimization The optimization is set behind a feature flag AND if the page requested is >100
Sort needs to be deterministic for the optimization to uniquely identify the row (created_at, id in this case)
References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
- Internal issue: #460264 (closed)
- Follow up MR to use keyset pagination in a breaking change: #382338
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Tested against the
/api/v4/audit_eventsongdk - Check out the branch and run a script to generate a large amount of audit events, the original issue had
per_page: 100andpage: 2245 - Open rails console and enable feature flag:
Feature.enable(:audit_events_api_offset_optimization) - Instantiate the Finder object:
-
finder = AuditEventFinder.new(level: Gitlab::Audit::Levels::Instance.new, params: { page: 1000, per_page: 100 })or finder = AuditEventFinder.new(level: Gitlab::Audit::Levels::Instance.new, params: { page: 1000, per_page: 100, created_after: "2024-01-01" })
-
- Run the finder:
finder.execute.load - Disable the feature flag:
Feature.disable(:audit_events_api_offset_optimization) - And re-run the finder
Note the query that it outputs:
SELECT "audit_events".* FROM "audit_events" LIMIT 100 OFFSET 99900 & SELECT "audit_events".* FROM "audit_events" WHERE "audit_events"."created_at" >= '2024-01-01 00:00:00' LIMIT 100 OFFSET 99900
and the time it takes to load with/without the feature flag
When the feature flag is off, you'll notice we have to load all the records into memory (example from my terminal):
Query fetched 1521080 AuditEvent records: SELECT "audit_events".* FROM "audit_events" WHERE "audit_events"."created_at" >= '2024-01-01 00:00:00'
Local consoles script to generate a large amount of audit events:
count = 1_500_000 # number of events to create
days_back = 365 # spread over last 365 days
user = User.admins.first || User.last
puts "Generating #{count} instance-level audit events..."
count.times do |i|
random_date = rand(days_back).days.ago
event_type = %w[
user_created
user_deleted
group_created
project_created
project_destroyed
].sample
AuditEvent.create!(
author_id: user.id,
entity_type: 'Gitlab::Audit::InstanceScope',
entity_id: 1,
author_name: user.name,
details: {
author_name: user.name,
target_type: event_type,
target_id: rand(1..1000),
target_details: "Fake instance event: #{event_type}"
},
created_at: random_date
)
print '.' if i % 100 == 0 # progress indicator
end
puts "\nDone!"