INTERSECT vs SELF-JOINS
-
For the sake of this question, I've massively reduced everything. I know there are alternate solutions, but I want to understand SQL better and solve this specific problem. (So please refrain from suggesting alternate schemas--I already have those; I really want to understand INTERSECT vs SELF INNER JOIN.) Given two tables with two columns each:
Table: Words
Id Name
1 One
2 Two
3 ThreeTable: Letters
Id Letter
1 A
1 B
1 C
2 A
2 D
3 A
3 CThe task is to find all the Ids in Words which match two letters from Letters. For example, I want to find all the rows in Words which match the letters A AND C from letters. Again, massive oversimplification. I can do the following (in SQLite):
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='A'
INTERSECT
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='C'
ORDER BY name COLLATE NOCASEThis returns One and Three. I'm happy. Except, it bugs me. I should be able to do the same thing with a series of INNER JOINS, but I've yet to come up with a solution which actually runs. Even with an intermediate table, I'm still not sure how to solve it and thus want to first solve it with this schema. Is there a SQL wizard in the house? EDIT: Just after posting this, I came up with the following. It works, but is it optimal? Is there a better way? (It seems that if Words is really big, we end up with two huge joins which are then reduced or with the optimizer figure this out?)
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id
INNER JOIN Letters AS L2 ON Words.Id=L2.Id
WHERE L1.Letter='A' AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEI aslso got the following suggestion:
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id AND L1.Letter='A'
INNER JOIN Letters AS L2 ON Words.Id=L2.Id AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASE -
For the sake of this question, I've massively reduced everything. I know there are alternate solutions, but I want to understand SQL better and solve this specific problem. (So please refrain from suggesting alternate schemas--I already have those; I really want to understand INTERSECT vs SELF INNER JOIN.) Given two tables with two columns each:
Table: Words
Id Name
1 One
2 Two
3 ThreeTable: Letters
Id Letter
1 A
1 B
1 C
2 A
2 D
3 A
3 CThe task is to find all the Ids in Words which match two letters from Letters. For example, I want to find all the rows in Words which match the letters A AND C from letters. Again, massive oversimplification. I can do the following (in SQLite):
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='A'
INTERSECT
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='C'
ORDER BY name COLLATE NOCASEThis returns One and Three. I'm happy. Except, it bugs me. I should be able to do the same thing with a series of INNER JOINS, but I've yet to come up with a solution which actually runs. Even with an intermediate table, I'm still not sure how to solve it and thus want to first solve it with this schema. Is there a SQL wizard in the house? EDIT: Just after posting this, I came up with the following. It works, but is it optimal? Is there a better way? (It seems that if Words is really big, we end up with two huge joins which are then reduced or with the optimizer figure this out?)
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id
INNER JOIN Letters AS L2 ON Words.Id=L2.Id
WHERE L1.Letter='A' AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEI aslso got the following suggestion:
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id AND L1.Letter='A'
INNER JOIN Letters AS L2 ON Words.Id=L2.Id AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEPersonally I use the self join, mainly because the intersect is a new fangled concept that has not penetrated my inertia. Some of the new stuff is just repackaging the the old stuff to be more palatable, linq to sql comes to mind. My TSQL seems to have ossified about a decade ago as that gets 95% of my work done. I'd hate to have to set up a distributed database these days.
Never underestimate the power of human stupidity RAH
-
For the sake of this question, I've massively reduced everything. I know there are alternate solutions, but I want to understand SQL better and solve this specific problem. (So please refrain from suggesting alternate schemas--I already have those; I really want to understand INTERSECT vs SELF INNER JOIN.) Given two tables with two columns each:
Table: Words
Id Name
1 One
2 Two
3 ThreeTable: Letters
Id Letter
1 A
1 B
1 C
2 A
2 D
3 A
3 CThe task is to find all the Ids in Words which match two letters from Letters. For example, I want to find all the rows in Words which match the letters A AND C from letters. Again, massive oversimplification. I can do the following (in SQLite):
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='A'
INTERSECT
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='C'
ORDER BY name COLLATE NOCASEThis returns One and Three. I'm happy. Except, it bugs me. I should be able to do the same thing with a series of INNER JOINS, but I've yet to come up with a solution which actually runs. Even with an intermediate table, I'm still not sure how to solve it and thus want to first solve it with this schema. Is there a SQL wizard in the house? EDIT: Just after posting this, I came up with the following. It works, but is it optimal? Is there a better way? (It seems that if Words is really big, we end up with two huge joins which are then reduced or with the optimizer figure this out?)
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id
INNER JOIN Letters AS L2 ON Words.Id=L2.Id
WHERE L1.Letter='A' AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEI aslso got the following suggestion:
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id AND L1.Letter='A'
INNER JOIN Letters AS L2 ON Words.Id=L2.Id AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEUse joins whenever it's possible. The DB will then be able to use indexes for the joining. Intersect is meant to be used when you have two resultsets with the same columns but different origins and no common indexing. Instead of a self join you can try this query:
SELECT W.name
FROM Words w JOIN Letters l ON w.ID = l.id
WHERE l.letter IN ('A','C')
GROUP BY W.name
HAVING Count(w.name) = 2A group by is generally faster than a distinct and it will never need more than one index seek, your self join might need two depending on what the optimizer comes up with. But it's a bit quirky as you need to have the letter count available. But you should consider it if you have large resultsets because it can be pipelined and therefore need much less memory.
Politicians are always realistically manoeuvering for the next election. They are obsolete as fundamental problem-solvers. Buckminster Fuller
-
For the sake of this question, I've massively reduced everything. I know there are alternate solutions, but I want to understand SQL better and solve this specific problem. (So please refrain from suggesting alternate schemas--I already have those; I really want to understand INTERSECT vs SELF INNER JOIN.) Given two tables with two columns each:
Table: Words
Id Name
1 One
2 Two
3 ThreeTable: Letters
Id Letter
1 A
1 B
1 C
2 A
2 D
3 A
3 CThe task is to find all the Ids in Words which match two letters from Letters. For example, I want to find all the rows in Words which match the letters A AND C from letters. Again, massive oversimplification. I can do the following (in SQLite):
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='A'
INTERSECT
SELECT DISTINCT Words.Name FROM Words JOIN Letters ON Words.Id=Letters.Id WHERE Letters.Letter='C'
ORDER BY name COLLATE NOCASEThis returns One and Three. I'm happy. Except, it bugs me. I should be able to do the same thing with a series of INNER JOINS, but I've yet to come up with a solution which actually runs. Even with an intermediate table, I'm still not sure how to solve it and thus want to first solve it with this schema. Is there a SQL wizard in the house? EDIT: Just after posting this, I came up with the following. It works, but is it optimal? Is there a better way? (It seems that if Words is really big, we end up with two huge joins which are then reduced or with the optimizer figure this out?)
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id
INNER JOIN Letters AS L2 ON Words.Id=L2.Id
WHERE L1.Letter='A' AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEI aslso got the following suggestion:
SELECT DISTINCT Words.Name FROM Words
INNER JOIN Letters AS L1 ON Words.Id=L1.Id AND L1.Letter='A'
INNER JOIN Letters AS L2 ON Words.Id=L2.Id AND L2.Letter='C'
ORDER BY Words.Name COLLATE NOCASEHow about using
Exists
?SELECT
Id,
Name
FROM
Words As W
WHERE
Exists
(
SELECT 1
FROM Letters As L
WHERE L.Id = W.Id
And L.Letter = 'A'
)
And
Exists
(
SELECT 1
FROM Letters As L
WHERE L.Id = W.Id
And L.Letter = 'C'
)The query optimizer will probably give you the same plan for this as the
INNER JOIN
option, but this makes your intention slightly clearer IMO.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer