Define the safe/correct way to write N+1 query specs

The following discussion from !106241 (merged) should be addressed:

@mcelicalderonG started a discussion: (+6 comments)

Very interesting finding @mcelicalderonG , thanks for looping me in. I can definitely see how counting cached queries and comparing the count of queries for request 1 vs request 2 could mask N+1 issues due to certain queries that only run on a first request. @stanhu since you are the one who added the exceed_all_query_limit matcher, I am wondering if you have any thoughts on this issue. I previous opened a docs MR that suggested we count cached queries in N+1 specs but now it seems like we should go the opposite direction 🤔 @NikolayS you wrote these very handy/persuasive docs on use_sql_query_cache and why we should ignore cached queries in specs so I also wanted to ping you here in case you have thoughts. Edited by Jessie Young 20 hours ago

@NikolayS

cc @nmilojevic1

@mcelicalderonG

While working on !106251 (merged) (diffs) I discovered another instance where this is happening. We actually had a false positive there since control has way more queries than the second request. I think that something changed recently. I don't think this was an issue before? But I'm not sure

@stanhu

@jessieay I think it depends on what these cached queries are. If it's a case where we are querying the same Project (for example) over and over, cached queries are usually not desirable. Isn't that what aa1d1848 is saying? But if we're doing warm-up to get rid of some setup queries, such as loading the EE license, I think it makes sense to ignore those cached queries.

@mcelicalderonG

@stanhu I think the problem here is not around cached queries. We can work with those by doing what is described in https://docs.gitlab.com/ee/development/database/query_recorder.html#cached-queries The problem is with queries that will be executed only once in a given time. Like when a user sends the first request in an hour. That's going to track some data. What I'm thinking is that maybe the safest way to avoid these false positives is always doing a warm up request? What we do on the first request can change over time and break (specs won't fail but will stop catching n+1) all these specs that don't do warm up request

@eduardobonet

Jumping in from !106256 (merged) (comment 1200432133) Note that on that MR I do make a warm up request, and still the new feature adds requests that were not account for in the first place. If I increase the number of experiments to 10, the count will still be off by 6 (1 over the threshold), which shows it's not an n+1. What would be the best way to fix that test without ignoring the cached queries?

Edited Dec 08, 2022 by Mario Celi

Admin message

Define the safe/correct way to write N+1 query specs