Skip to content

Add script to identify SQL queries unique to MR

What does this MR do and why?

This MR introduces a new feature to identify SQL queries that are added in merge requests, building on our recently improved fingerprinting system. This implements the final piece of Automated Query Analysis for Merge Requests (&8440) by adding an automated SQL query differ for merge requests.

When developers add new database queries in their code changes, these can potentially impact database performance. Currently, there's no automated way to identify new query patterns introduced in an MR, making it difficult to catch performance issues early in the development cycle.

With this MR we are adding a new merge_request_query_differ script that:

  • Extracts SQL fingerprints from an MR's auto-explain logs
  • Downloads the consolidated master fingerprints package Improve SQL fingerprint extraction to track uni... (!188275 - merged)
  • Compares the MR fingerprints against master to identify truly new query patterns
  • Generates a detailed report of new SQL queries with their execution plans: Stored in artifact on rspec:merge-auto-explain-logs job

The CI configuration has been updated to:

  • For master branch: continue extracting and consolidating fingerprints as before
  • For MRs: automatically run the query differ and generate a report of new queries

This should help us by improving

  • Early detection of potentially problematic SQL queries
  • Increased visibility into database impact of code changes
  • More efficient code reviews by highlighting specific database changes
  • Improved database performance by catching inefficient queries before they reach production

References

How to set up and validate locally

  • For CI, check logs and artifacts of rspec:merge-auto-explain-logs jobs

  • To run the script locally,

# download artifacts from either the `rspec:merge-auto-explain-logs` job, or from individual `rspec:` jobs and place in a directory named `auto_explain` 

gitlab git:(jmd/optomize-extract-upload-fingerprints) ls -al auto_explain 
total 17008
drwxr-xr-x    4 john  staff      128 Apr 18 11:41 .
drwxr-xr-x@ 130 john  staff     4160 Apr 17 03:38 ..
-rw-r--r--@   1 john  staff  8254650 Apr 16 20:50 auto_explain.ndjson.gz
-rw-r--r--@   1 john  staff   446715 Apr 16 20:16 rspec-unit-pg16-33-44.346.main.ndjson.gz


➜  gitlab git:(jmd/optomize-extract-upload-fingerprints) ./scripts/merge_request_query_differ.rb ./auto_explain/auto_explain.ndjson.gz 
I, [2025-04-18T11:42:21.950982 #32998]  INFO -- : MR Query Diff: Analyzing new queries in MR compared to master
I, [2025-04-18T11:42:21.951018 #32998]  INFO -- : Extracting queries from MR file: ./auto_explain/auto_explain.ndjson.gz
I, [2025-04-18T11:42:22.393775 #32998]  INFO -- : Extracted 25092 queries from MR file
I, [2025-04-18T11:42:22.399682 #32998]  INFO -- : Found 25092 total queries in MR
I, [2025-04-18T11:42:22.399709 #32998]  INFO -- : Fetching master fingerprints from consolidated package...
I, [2025-04-18T11:42:22.402036 #32998]  INFO -- : Downloading from: https://gitlab.com/api/v4/projects/278964/packages/generic/auto-explain-logs/master/query-fingerprints.tar.gz
I, [2025-04-18T11:42:23.944731 #32998]  INFO -- : Loaded 34227 master fingerprints from consolidated package
I, [2025-04-18T11:42:23.944840 #32998]  INFO -- : Filtering 25092 queries against master fingerprints...
I, [2025-04-18T11:42:24.038906 #32998]  INFO -- : Filtered out 25073 existing queries, 19 new queries found
I, [2025-04-18T11:42:24.038953 #32998]  INFO -- : 
Final result: 19 new queries compared to all master packages
I, [2025-04-18T11:42:24.040760 #32998]  INFO -- : Report saved to ./auto_explain/new_sql_queries.md

A sample artifact manually added for review in: !188201 (comment 2458275985)

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by John McDonnell

Merge request reports

Loading