Journal Journal: SQL: * expansion inside of EXISTS()
[Used gemini for formatting. It seems to have edited the text somewhere, and the table on bottom is atrocious. I ought to come back to this later. It's too late to continue with it now.]
When I was working with Oracle, I thought I had heard that EXISTS(SELECT *
It makes sense to use SELECT *. * says something. In COUNT(), * means count every record, even if every column is null. Indeed, the only way to directly count a record where every column is null is to use count(*). * means check existence. So, in EXISTS, where the only thing we care about is the existence, * makes perfect sense. (Of course, it would be better to just remove the entire SELECT clause and the FROM keyword, because it is redundant.)
Some people use SELECT 1, others SELECT NULL. Does it really matter? Even if it did do a dictionary lookup, how long would it take anyway?
Well, I found my answer. Recently, I was running a query using HAVING without a GROUP BY. Now, some people might object and say you can't do that, or it makes no sense. Those people do not understand GROUP BY. Whenever you use an aggregate, the entire table is the group. You only have to use GROUP BY if you want to segment the table further. Now, in most cases, using HAVING with no GROUP BY will make no sense. But, say you wanted a correlated subquery to check if a parent has multiple children, you would want a HAVING on the subtable with no need for a GROUP BY:
select * from parent where exists(select * from child where child.parent = parent.id having count(*) > 1);
Okay, but does this work? Let's use OneCompiler to test.
mysql:
with a as (select 1) select 1 from a having count(*) = 1;
works
with a as (select 1) select * from a having count(*) = 1;
ERROR 1140 (42000) at line 1: In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'a.1'; this is incompatible with sql_mode=only_full_group_by
oracle:
with a as (select 1 from dual) select 1 from a having count(*) = 1;
works
with a as (select 1 from dual) select * from a having count(*) = 1;
ERROR at line 1: ORA-00937: not a single-group group function
postgresql:
with a as (select 1) select 1 from a having count(*) = 1;
works
with a as (select 1) select * from a having count(*) = 1;
psql:commands.sql:1: ERROR: column "a.?column?" must appear in the GROUP BY clause or be used in an aggregate function
sqlite:
with a as (select 1) select 1 from a having count(*) = 1;
Parse error near line 1: HAVING clause on a non-aggregate query
with a as (select 1) select * from a having count(*) = 1;
Parse error near line 1: HAVING clause on a non-aggregate query
mariadb:
with a as (select 1) select 1 from a having count(*) = 1;
works
with a as (select 1) select * from a having count(*) = 1;
works
sql server:
with a as (select 1 a) select 1 from a having count(*) = 1;
works
with a as (select 1 a) select * from a having count(*) = 1;
Msg 8120, Level 16, State 1, Server b2de965b1bf1, Line 2: Column 'a.a' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
duckdb:
with a as (select 1) select 1 from a having count(*) = 1;
works
with a as (select 1) select * from a having count(*) = 1;
Binder Error: column "1" must appear in the GROUP BY clause or must be part of an aggregate function. Either add it to the GROUP BY list, or use "ANY_VALUE(1)" if the exact value of "1" is not important.
So, results are mixed. Most of the tested databases allow HAVING without a GROUP BY as long as you do not specify any columns, and instead use a literal. This makes sense, because the * in the select clause needs to be expanded so the db knows what to return. That in turn will check the HAVING clause. The only exceptions are sqlite that does not even allow the literal, and mariadb that allows the *.
But what about in an exists clause where the select clause is of no consequence?
mysql:
with a as (select 1) select 1 where exists(select 1 from a having count(*) = 1)
works
with a as (select 1) select 1 where exists(select * from a having count(*) = 1)
ERROR 1140 (42000) at line 1: In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'a.1'; this is incompatible with sql_mode=only_full_group_by
oracle:
with a as(select 1 from dual) select 1 where exists(select * from a having count(*) = 1);
works
with a as(select 1 from dual) select 1 where exists(select 1 from a having count(*) = 1);
works
postgresql:
with a as (select 1) select 1 where exists(select 1 from a having count(*) = 1)
works
with a as (select 1) select 1 where exists(select * from a having count(*) = 1)
psql:commands.sql:1: ERROR: column "a.?column?" must appear in the GROUP BY clause or be used in an aggregate function
sqlite:
with a as (select 1) select 1 where exists(select 1 from a having count(*) = 1)
Parse error near line 1: HAVING clause on a non-aggregate query
with a as (select 1) select 1 where exists(select * from a having count(*) = 1)
Parse error near line 1: HAVING clause on a non-aggregate query
mariadb:
with a as (select 1) select 1 where exists(select 1 from a having count(*) = 1)
works
with a as (select 1) select 1 where exists(select * from a having count(*) = 1)
works
sql server:
with a as (select 1 a) select 1 where exists(select 1 from a having count(*) = 1)
works
with a as (select 1 a) select 1 where exists(select * from a having count(*) = 1)
works
duckdb:
with a as (select 1) select 1 where exists(select 1 from a having count(*) = 1)
works
with a as (select 1) select 1 where exists(select * from a having count(*) = 1)
Binder Error: column "1" must appear in the GROUP BY clause or must be part of an aggregate function.
So, the same as before, except, oracle and sql server allow it (as well as mariadb). This is understandable, because there is no reason to expand the * in an exists.
So, maybe those databases actually run the query? Let's check. We'll put a function in the select clause that causes an error if run.
mysql:
CREATE FUNCTION f()RETURNS INT RETURN(SELECT 1,2); select f()
ERROR 1064 (42000) at line 2: You have an error in your SQL syntax...
CREATE FUNCTION f()RETURNS INT RETURN(SELECT 1,2); SELECT 1 WHERE exists(select f());
works
oracle:
CREATE FUNCTION f RETURN INT IS BEGIN RETURN 1/0;END; / SELECT f;
ORA-01476: divisor is equal to zero
CREATE FUNCTION f RETURN INT IS BEGIN RETURN 1/0;END; / SELECT 1 where exists(select f);
works
postgresql:
CREATE FUNCTION f()RETURNS INT LANGUAGE SQL RETURN 1/0; SELECT f();
ERROR: division by zero
CREATE FUNCTION f()RETURNS INT LANGUAGE SQL RETURN 1/0; SELECT 1 WHERE EXISTS(SELECT f());
works
sqlite:
SELECT json('x');
Runtime error near line 1: malformed JSON
SELECT 1 WHERE EXISTS(SELECT json('x'));
works
mariadb:
CREATE FUNCTION f()RETURNS INT RETURN(SELECT 1,2); SELECT f();
ERROR 4078 (HY000) at line 2: Cannot cast 'row' as 'int' in assignment of `f()`
CREATE FUNCTION f()RETURNS INT RETURN(SELECT 1,2); SELECT 1 WHERE EXISTS(SELECT f());
works
sql server:
select 1/0;
Divide by zero error encountered.
select 1 where exists(select 1/0);
works
duckdb:
CREATE MACRO f()AS CAST('x'AS INT); SELECT f();
Conversion Error: Could not convert string 'x' to INT32
CREATE MACRO f()AS CAST('x'AS INT); SELECT 1 WHERE EXISTS(SELECT f());
works
So, every single one of these did not evaluate the exists projection list. There's more ways to test this. I learned a few while going along, until I realized I could have just used 1/0 for all of them! That's what I get for trying too hard.
Here is the final behavior matrix comparing these database systems (where "y" indicates successful execution and "n" indicates a raised error or parse failure):
+------------+----------+----------+----------+----------+--------------+--------------+
| Database | 1/having | */having | exists 1 | exists * | select error | exists error |
+------------+----------+----------+----------+----------+--------------+--------------+
| mysql | y | n | y | n | n | y |
| oracle | y | n | y | y | n | y |
| postgresql | y | n | y | n | n | y |
| sqlite | n | n | n | n | n | y |
| mariadb | y | y | y | y | n | y |
| sql server | y | n | y | y | n | y |
| duckdb | y | n | y | n | n | y |
+------------+----------+----------+----------+----------+--------------+--------------+