Database: Consider removing the surrounding SELECT when using FromUnion.from_union
In !49518 (merged) we noticed a big difference between these queries (example is illustative and does not fully resemble the use-case of the mentioned MR):
Using FromUnion.from_union
:
users = User.from_union([User.where(id: 1), User.where(id: 2)])
Resulting in this SQL:
SELECT *
FROM (
SELECT *
FROM users
WHERE id = 1
UNION
SELECT *
FROM users
WHERE id = 2
) users;
Compared to:
users = Users.find_by_sql(::Gitlab::SQL::Union.new([User.where(id: 1), User.where(id: 2)]).to_sql)
Running this SQL:
(
SELECT *
FROM users
WHERE id = 1
UNION
SELECT *
FROM users
WHERE id = 2
)
This made a huge difference, see !49518 (diffs, comment 465148060).
Proposal
We might want to look into other use-cases of FromUnion.from_union
and compare them to plain use of Gitlab::SQL::Union
. If the result for more cases are better, we have to modify FromUnion.from_union
so it builds a more optimal SQL query.
Challenge
When you use FromUnion.from_union
you get an AR relation object, so you can add additional scopes to that. .find_by_sql
instead returns an array of objects. To still make it possible to pass scopes to FromUnion.from_union
we might need to forward these scopes to the subqueries of the UNION
. As we have seen in the aforementioned MR, this has an added performance benefit, for example in this case:
Before:
SELECT *
FROM (
SELECT *
FROM users
WHERE admin = true
UNION
SELECT *
FROM users
WHERE admin = true
) users
WHERE
After:
(
SELECT *
FROM users
WHERE admin = true
AND id >= 100 AND id < 2100
UNION
SELECT *
FROM users
WHERE admin = true
AND id >= 100 AND id < 2100
)