LEFT JOIN - GET ONLY ONE RESULT FROM RIGHT TABLE

polishprogrammer

I have a need to execute a query that pulls all data from one table regardless of whether matching data (matched by a foreign key) is present in a second table. This would be a classic LEFT JOIN query except that if a matching record is found in the RIGHT table (table2), then I need to pull only the latest such record from that table. In particular, I need to pull one field, called VE (maybe more in the future) from the RIGHT table, but only from the latest record in the RIGHT table that matches the data in the LEFT table. The basic scenario I have is an object described in the LEFT table and the results of a process performed on that object in the RIGHT table. There may be more than one process result for that object in the right table, but only the latest one matters to me for the purposes of this query. To solve this problem I used the following query based on the advice of someone from Code Project, but it didn't quite work out as planned. The ID in this query is an autonumber field that I'm using to retrieve the latest record. SELECT Details.*, Results.VE FROM Details LEFT JOIN [SELECT Results.VE, Results.SerialNumber, Max(Results.ID) From Results Group By Results.VE, Results.SerialNumber]. AS Results ON Details.SN = Results.SerialNumber); It worked for the most part, but there were 7 more records in this result table from this query than are present in the 'LEFT' table. It appears that the duplications were occuring in the subquery, such that if there was an object that had two (or more) process results with different VE values, then the duplication occurred. For other duplications, this query correctly retrieved only the last record. I appreciate any help. Thanks.

Wendelius

Most likely the reason is in your group by section. You group on two columns so you'll get as many rows as there are valid combinations present. But then on your join you use only one column (SerialNumber). If you want the latest record based on maximum ID and the maximum of id should be fetched based on serialnumber, it may be simpler to use correlated scalar at select portion (there may be typos):

SELECT Details.*,
(SELECT Results.VE
From Results r1
WHERE Details.SN = r1.SerialNumber
AND ID = (SELECT MAX(ID)
FROM Results r2
WHERE Details.SN = r1.SerialNumber)
FROM Details;

Hope this helps, Mika

The need to optimize rises from a bad design. My articles[^]

polishprogrammer

Thanks. Your explanation about the GROUP BY helped me to understand why I was getting some duplicates, but the solution does not work; the result set was so huge that I had to cancel the attempt to execute the query.

Wendelius

Few questions: - do you want to get as many rows in the output as there are rows in Details-table (no where conditions)? - how many rows there are in details table? - how many rows there are in Results table? - is SerialNumber indexed in Results-table? - is ID indexed in results table?

The need to optimize rises from a bad design. My articles[^]

polishprogrammer

Mika, 1. Ultimately, no, I want to retrieve rows from the Details table based on certain conditions, say date range or whether one of the fields in the table is equal to (OR LIKE) a certain parameter. But, I thought that retrieving all rows from the Details table (and no more than the total number of rows) was the simplest condition. 2. The sample database I have-and perhaps a small database at that-has 10080 rows in the Details table. 3. 10500 rows. Nearly all objects in the Details table have undergone the process whose results are recorded in the results table (sometimes more than once). 4. No 5. Yes, ID is indexed in the results table. Thanks.

Wendelius

Okay, So the performance is affected because you fetch so many records (not a real world situation, but although a good test case). Anyway, since the scalar is correlated, it's executed as many times as there are rows in the result (in this case 10080 times). If the result is limited, it will greately affect performance. Another thing. Try adding a new index with for column SerialNumber (or SerialNumber, ID). Leave the existing ID index as it is. I would guess that you'll see quite different performance. Also you could have an index on Details.SN if not already indexed. Mika

The need to optimize rises from a bad design. My articles[^]

polishprogrammer

Thanks again, Mika, for your help. For your information, this is the solution-suggested by someone at another site - that worked: SELECT Details.*, r1.VE FROM Details LEFT JOIN [SELECT r1.* FROM Results r1 INNER JOIN (SELECT SerialNumber, Max(ID) As maxID FROM Results GROUP BY SerialNumber) AS r2 ON r1.SerialNumber = r2.SerialNumber AND r1.ID = r2.maxID]. AS r1 ON (Details.SN = r1.SerialNumber); The actual query I'm using is a bit more complex than this as it involves several more conditions in the LEFT JOIN statement and some more or less simple WHERE queries to limit the information returned from the Details table, but the basic idea of this query does solve the problem I had described. Thanks again.