Add script to identify SQL queries unique to MR
What does this MR do and why?
This MR introduces a new feature to identify SQL queries that are added in merge requests, building on our recently improved fingerprinting system. This implements the final piece of Automated Query Analysis for Merge Requests (&8440) by adding an automated SQL query differ for merge requests.
When developers add new database queries in their code changes, these can potentially impact database performance. Currently, there's no automated way to identify new query patterns introduced in an MR, making it difficult to catch performance issues early in the development cycle.
With this MR we are adding a new merge_request_query_differ
script that:
- Extracts SQL fingerprints from an MR's auto-explain logs
- Downloads the consolidated master fingerprints package Improve SQL fingerprint extraction to track uni... (!188275 - merged)
- Compares the MR fingerprints against master to identify truly new query patterns
- Generates a detailed report of new SQL queries with their execution plans: Stored in artifact on
rspec:merge-auto-explain-logs
job
The CI configuration has been updated to:
- For master branch: continue extracting and consolidating fingerprints as before
- For MRs: automatically run the query differ and generate a report of new queries
This should help us by improving
- Early detection of potentially problematic SQL queries
- Increased visibility into database impact of code changes
- More efficient code reviews by highlighting specific database changes
- Improved database performance by catching inefficient queries before they reach production
References
- Depends on Improve SQL fingerprint extraction to track uni... (!188275 - merged)
- Automated Query Analysis for Merge Requests (&8440)
How to set up and validate locally
-
For CI, check logs and artifacts of
rspec:merge-auto-explain-logs
jobs -
To run the script locally,
# download artifacts from either the `rspec:merge-auto-explain-logs` job, or from individual `rspec:` jobs and place in a directory named `auto_explain`
gitlab git:(jmd/optomize-extract-upload-fingerprints) ls -al auto_explain
total 17008
drwxr-xr-x 4 john staff 128 Apr 18 11:41 .
drwxr-xr-x@ 130 john staff 4160 Apr 17 03:38 ..
-rw-r--r--@ 1 john staff 8254650 Apr 16 20:50 auto_explain.ndjson.gz
-rw-r--r--@ 1 john staff 446715 Apr 16 20:16 rspec-unit-pg16-33-44.346.main.ndjson.gz
➜ gitlab git:(jmd/optomize-extract-upload-fingerprints) ./scripts/merge_request_query_differ.rb ./auto_explain/auto_explain.ndjson.gz
I, [2025-04-18T11:42:21.950982 #32998] INFO -- : MR Query Diff: Analyzing new queries in MR compared to master
I, [2025-04-18T11:42:21.951018 #32998] INFO -- : Extracting queries from MR file: ./auto_explain/auto_explain.ndjson.gz
I, [2025-04-18T11:42:22.393775 #32998] INFO -- : Extracted 25092 queries from MR file
I, [2025-04-18T11:42:22.399682 #32998] INFO -- : Found 25092 total queries in MR
I, [2025-04-18T11:42:22.399709 #32998] INFO -- : Fetching master fingerprints from consolidated package...
I, [2025-04-18T11:42:22.402036 #32998] INFO -- : Downloading from: https://gitlab.com/api/v4/projects/278964/packages/generic/auto-explain-logs/master/query-fingerprints.tar.gz
I, [2025-04-18T11:42:23.944731 #32998] INFO -- : Loaded 34227 master fingerprints from consolidated package
I, [2025-04-18T11:42:23.944840 #32998] INFO -- : Filtering 25092 queries against master fingerprints...
I, [2025-04-18T11:42:24.038906 #32998] INFO -- : Filtered out 25073 existing queries, 19 new queries found
I, [2025-04-18T11:42:24.038953 #32998] INFO -- :
Final result: 19 new queries compared to all master packages
I, [2025-04-18T11:42:24.040760 #32998] INFO -- : Report saved to ./auto_explain/new_sql_queries.md
A sample artifact manually added for review in: !188201 (comment 2458275985)
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.