SQLite: Correlated "EXISTS" vs uncorrelated "IN"
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
Just going by your analysis, I would say the NOT EXISTS would be better. As you say if the table is indexed it should be pretty quick.
The difficult we do right away... ...the impossible takes slightly longer.
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
I cannot speak for SQLite however I can for SQL Server(which perhaps does not help very much here). It is best to never use a
NOT IN
statement, where the results returned could contain many rows, as it will need to evaluate for every single row in the inner select. Whereas with aNOT EXISTS
or aLEFT JOIN
with aIS NOT NULL
condition, which is another option, it will evaluate as false as soon as it hits the first row where there is a match. So basically I agree with your evaluation - for what it's worth I always useNOT EXISTS
nowadays.“That which can be asserted without evidence, can be dismissed without evidence.”
― Christopher Hitchens
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
I agree with the two previous comments - NOT EXISTS will give better performance. I found this article some time ago - again it's for SQL rather than SQLite but the principles still apply ... NOT IN vs. NOT EXISTS vs. OUTER APPLY vs. OUTER JOIN[^]
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
NOT IN
andNOT EXISTS
are not directly comparable since they treat nulls differently.EXISTS
returnsTRUE
orFALSE
only,IN
is trivalent and returnsTRUE
,FALSE
orNULL
. Use the one that gives you the result you expect. (Most probablyNOT EXISTS
)Wrong is evil and must be defeated. - Jeff Ello
-
Just after opinions here: which of these would be the best practice for an SQLite database:
DELETE FROM Messages WHERE Account = @acct AND NOT EXISTS (SELECT * FROM temp.IdList WHERE MailId=Id)
or
DELETE FROM Messages WHERE Account = @acct AND Id NOT IN (SELECT MailId FROM temp.IdList)Indexes exist for both Messages(Account, Id) and temp.IdList(MailId) Each table may hold in excess of 20000 rows From what I understand, the correlated exists will need to be evaluated for each row of the Messages table, but will stop evaluating as soon as it finds a match (and being indexed, should be pretty quick). However, with the IN expression, the full result set is evaluated only once, but the comparison will be against that full set. Cheers, Mick
Ok guys - I guess the verdict's in. It's what I thought initially thought would be the case too. Its just that the doco on Correlated vs Uncorrelated sub-queries threw me a little, where correlated sub-queries are re-evaluated for each row of the main query. I do agree, however, that re-evaluating the sub-query will be significantly better than searching the full result set. Cheers people.