Incorrect results in case HAVING clause of sub query uses GROUP BY column of outer query

Final Release Note

Description

This is an issue that was noticed by @zylog1O1. Below is a simple example demonstrating the issue. The 2 queries should produce the exact same output since the only difference is a GROUP BY customer_id which should not matter since customer_id is a primary key column in the customers table (tests/fixtures/customers.sql). But the second query produces 2 extra rows of output in Octo which is incorrect (the rows with column values of 1 and 2).

OCTO> SELECT ALL c.customer_id FROM customers c WHERE EXISTS (SELECT 1 FROM orders GROUP BY order_id having c.customer_id > 2);
CUSTOMER_ID
3
4
5
(3 rows)
OCTO> SELECT ALL c.customer_id FROM customers c WHERE EXISTS (SELECT 1 FROM orders GROUP BY order_id having c.customer_id > 2) GROUP BY customer_id;
CUSTOMER_ID
1
2
3
4
5
(5 rows)

The expected output is 3 rows in both cases as can be seen in the Postgres output below.

customers=> SELECT ALL c.customer_id FROM customers c WHERE EXISTS (SELECT 1 FROM orders GROUP BY order_id having c.customer_id > 2);
customer_id
3
4
5
(3 rows)
customers=> SELECT ALL c.customer_id FROM customers c WHERE EXISTS (SELECT 1 FROM orders GROUP BY order_id having c.customer_id > 2) GROUP BY customer_id;
customer_id
3
5
4
(3 rows)

It is not yet clear exactly what conditions other than a GROUP BY in the outer query and sub-query are necessary for this. The draft release note will need to be updated once this is clearer.

Draft Release Note

Octo outputs correct result when outer query GROUP BY columns are referenced in inner query HAVING clause

Edited Oct 27, 2021 by Ganesh Mahesh