Prioritized joining - Updated
-
I'm looking for a way of joining two tables using a function returning a ranking value where the rows joined would be decided on the returning value of a function. Assume I'm joining table X and Table Y using a function f(X,Y) If the values returned by the function from a cartesian join would be:
X1
X2
X3
Y1
1
5
9
Y2
7
3
8
Y3
4
2
6
I would want this "excluding functional join" to return: Y1,X3 Y2,X1 Y3,X2 Is this at all possible without using procedural code? <edit>What I want is something similar to the Gale-Shapley[^] algorithm but using a special text search instead of simple preferences. It's easy enough to do in procedural code, but I want to avoid an RBAR solution</edit>
Wrong is evil and must be defeated. - Jeff Ello
-
I'm looking for a way of joining two tables using a function returning a ranking value where the rows joined would be decided on the returning value of a function. Assume I'm joining table X and Table Y using a function f(X,Y) If the values returned by the function from a cartesian join would be:
X1
X2
X3
Y1
1
5
9
Y2
7
3
8
Y3
4
2
6
I would want this "excluding functional join" to return: Y1,X3 Y2,X1 Y3,X2 Is this at all possible without using procedural code? <edit>What I want is something similar to the Gale-Shapley[^] algorithm but using a special text search instead of simple preferences. It's easy enough to do in procedural code, but I want to avoid an RBAR solution</edit>
Wrong is evil and must be defeated. - Jeff Ello
Jörgen Andersson wrote:
without using procedural code
I do doubt it. One way might be to create a separate view that generates the ranking (can't see the PK field) and use the view in the join.
Never underestimate the power of human stupidity RAH
-
Jörgen Andersson wrote:
without using procedural code
I do doubt it. One way might be to create a separate view that generates the ranking (can't see the PK field) and use the view in the join.
Never underestimate the power of human stupidity RAH
Mycroft Holmes wrote:
can't see the PK field
That's because they're of no use in this case. I'm trying to map two tables in two different databases on their description fields, and some descriptions are basically the same but with added information for some rows, and those are the big problem. I'm going to try to do it using a recursive CTE.
Wrong is evil and must be defeated. - Jeff Ello
-
I'm looking for a way of joining two tables using a function returning a ranking value where the rows joined would be decided on the returning value of a function. Assume I'm joining table X and Table Y using a function f(X,Y) If the values returned by the function from a cartesian join would be:
X1
X2
X3
Y1
1
5
9
Y2
7
3
8
Y3
4
2
6
I would want this "excluding functional join" to return: Y1,X3 Y2,X1 Y3,X2 Is this at all possible without using procedural code? <edit>What I want is something similar to the Gale-Shapley[^] algorithm but using a special text search instead of simple preferences. It's easy enough to do in procedural code, but I want to avoid an RBAR solution</edit>
Wrong is evil and must be defeated. - Jeff Ello
How about something like this:
- Cross-join the two tables;
- Generate the ranking value for each pair;
- Generate a
ROW_NUMBER
, partitioned by the primary key of Y, and ordered by the ranking value (descending); - Select the rows where the row number is
1
;
WITH cteRanked As
(
SELECT
X.PK As XPK,
X.OtherColumnsFromX,
Y.PK As YPK,
Y.OtherColumnsFromY,
RankingFunction(X.Value, Y.Value) As R
FROM
X CROSS JOIN Y
),
cteOrdered As
(
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY,
ROW_NUMBER() OVER (PARTITION BY YPK ORDER BY R DESC) As RN
FROM
cteRanked
)
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY
FROM
cteOrdered
WHERE
RN = 1
;
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
Mycroft Holmes wrote:
can't see the PK field
That's because they're of no use in this case. I'm trying to map two tables in two different databases on their description fields, and some descriptions are basically the same but with added information for some rows, and those are the big problem. I'm going to try to do it using a recursive CTE.
Wrong is evil and must be defeated. - Jeff Ello
Jörgen Andersson wrote:
m going to try to do it using a recursive CTE.
why am I not surprised by that. Your cte skills seem to be quite adequate. Let us know if you achieve a solution
Never underestimate the power of human stupidity RAH
-
How about something like this:
- Cross-join the two tables;
- Generate the ranking value for each pair;
- Generate a
ROW_NUMBER
, partitioned by the primary key of Y, and ordered by the ranking value (descending); - Select the rows where the row number is
1
;
WITH cteRanked As
(
SELECT
X.PK As XPK,
X.OtherColumnsFromX,
Y.PK As YPK,
Y.OtherColumnsFromY,
RankingFunction(X.Value, Y.Value) As R
FROM
X CROSS JOIN Y
),
cteOrdered As
(
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY,
ROW_NUMBER() OVER (PARTITION BY YPK ORDER BY R DESC) As RN
FROM
cteRanked
)
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY
FROM
cteOrdered
WHERE
RN = 1
;
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
Not bad, but it's not going all the way. I need to partition it by both X and Y.
Wrong is evil and must be defeated. - Jeff Ello
-
Jörgen Andersson wrote:
m going to try to do it using a recursive CTE.
why am I not surprised by that. Your cte skills seem to be quite adequate. Let us know if you achieve a solution
Never underestimate the power of human stupidity RAH
I believe I have found one, the proper test will be on Monday. But as it is done using a recursive CTE, it's not the most effective one. I'm going to have another look at Richards version tonight.
Wrong is evil and must be defeated. - Jeff Ello
-
Not bad, but it's not going all the way. I need to partition it by both X and Y.
Wrong is evil and must be defeated. - Jeff Ello
I must be missing something - surely, if you partitioned by both X and Y, you'd end up with the Cartesian product of the two tables? :confused:
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
I must be missing something - surely, if you partitioned by both X and Y, you'd end up with the Cartesian product of the two tables? :confused:
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
Well, that, is exactly my problem! :)
Wrong is evil and must be defeated. - Jeff Ello
-
Jörgen Andersson wrote:
m going to try to do it using a recursive CTE.
why am I not surprised by that. Your cte skills seem to be quite adequate. Let us know if you achieve a solution
Never underestimate the power of human stupidity RAH
In the end I ended up using a variant of Richards suggestion.
Wrong is evil and must be defeated. - Jeff Ello
-
How about something like this:
- Cross-join the two tables;
- Generate the ranking value for each pair;
- Generate a
ROW_NUMBER
, partitioned by the primary key of Y, and ordered by the ranking value (descending); - Select the rows where the row number is
1
;
WITH cteRanked As
(
SELECT
X.PK As XPK,
X.OtherColumnsFromX,
Y.PK As YPK,
Y.OtherColumnsFromY,
RankingFunction(X.Value, Y.Value) As R
FROM
X CROSS JOIN Y
),
cteOrdered As
(
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY,
ROW_NUMBER() OVER (PARTITION BY YPK ORDER BY R DESC) As RN
FROM
cteRanked
)
SELECT
XPK,
OtherColumnsFromX,
YPK,
OtherColumnsFromY
FROM
cteOrdered
WHERE
RN = 1
;
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
The data turned out to be a lot less ordered or rather more asymmetrical than expected (just like I should've expected :rolleyes:) And the mapping between the databases also turned out to be important in only one direction. So I ended up using a variant of your solution, but using RANK instead of ROW_NUMBER. Now I'm working on refining the ranking function for whenever I get duplicate matches
Wrong is evil and must be defeated. - Jeff Ello