LINQ to SQL DistinctBy Question
-
I hope this is clear. If not, I can explain further. I have a table that [looks like this](https://1drv.ms/u/s!AlkRTpT49yCMmgO6gF-1gD-8UnlR). I will be running this LINQ to SQL query repeatedly:
var maxRev = 2;
var jobSequenceSheets = (from jss in dc.JobSequenceSheets
                         where jss.JobId == jobId &&
                               jss.RevisionNumber == maxRev
                         select jss).OrderBy(x => x.Plan)
                                    .ThenBy(x => x.Elevation)
                                    .DistinctBy(c => new { c.Plan, c.Elevation })
                                    .ToList();
So, if you look at the image, given RevisionNumber 2, I should get back only two records, with Plan & Elevations of 1A and 2A. There are two rows for 2A because of the Lot. I don't care about the Lot. All I care about for a new feature I'm working on is that I get back a distinct set of Plan/Elevations with the SAME IDs EACH TIME IT'S RUN. Since there are two rows for 2A, given this query, I should get back only 2 rows (not 3). Can I expect to get back the same IDs each time? Does DistinctBy use the FIRST matching row it finds? I would expect this query to give me back rows 22607 and 22608 each run. Thanks
If it's not broken, fix it until it is. Everything makes sense in someone's mind. Ya can't fix stupid.
-
Since you are creating a new instance for each item you check in the comparer, it will return every instance as a new, distinct element, since it will compare instance references and by definition the new operator returns different references each time it is called. Have a look here: C# – DistinctBy extension. It explains the extension method quite well.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!
-
Not quite. An anonymous type uses value equality, not reference equality:
Anonymous Types | Microsoft Docs:
"Because the Equals and GetHashCode methods on anonymous types are defined in terms of the Equals and GetHashCode methods of the properties, two instances of the same anonymous type are equal only if all their properties are equal."
Also, the DistinctBy operator which was added in .NET 6 uses a different approach from the blog you linked to: runtime/Distinct.cs at ebba1d4acb7abea5ba15e1f7f69d1d1311465d16 · dotnet/runtime · GitHub
And the answer will also depend on whether the DistinctBy method gets translated to SQL, or evaluated on the client.
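The value-equality point can be checked on the client with a small sketch like the one below. It assumes .NET 6 or later (for Enumerable.DistinctBy) and runs against LINQ to Objects only, so it says nothing about how an ORM would translate the call. The rows and their Id, Plan, Elevation and Lot values are made up for illustration and are not taken from the table in the question.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for the real table.
var rows = new[]
{
    new { Id = 1, Plan = "1", Elevation = "A", Lot = "10" },
    new { Id = 2, Plan = "2", Elevation = "A", Lot = "11" },
    new { Id = 3, Plan = "2", Elevation = "A", Lot = "12" },
};

// Two separately created anonymous instances with equal property values.
var k1 = new { Plan = "2", Elevation = "A" };
var k2 = new { Plan = "2", Elevation = "A" };
bool keysEqual = k1.Equals(k2);               // value equality on the properties
bool sameReference = ReferenceEquals(k1, k2); // they are still two different objects

// Client-side DistinctBy with an anonymous-type key.
var distinct = rows.DistinctBy(r => new { r.Plan, r.Elevation }).ToList();

Console.WriteLine($"keysEqual={keysEqual}, sameReference={sameReference}, count={distinct.Count}");
// prints "keysEqual=True, sameReference=False, count=2"
```

So with an anonymous-type key the two 2A rows collapse into one, even though a new key object is created for every row.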
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
It depends. Are you using the DistinctBy method added in .NET 6, or a different implementation? And does the ORM you're using translate DistinctBy to SQL, or does it evaluate it on the client?
If it evaluates it on the client, and you're using the .NET 6 method or something equivalent, then technically it will return the first item it encounters within each group. However, this is an implementation detail, and you cannot rely on it. And since you don't order by the ID, you can't guarantee that the database will return the records in any ID-related order.
If you always want the lowest ID, then you need to be explicit:
var jobSequenceSheets = dc.JobSequenceSheets
    .Where(jss => jss.JobId == jobId)
    .Where(jss => jss.RevisionNumber == maxRev)
    .GroupBy(jss => new { jss.Plan, jss.Elevation }, (_, items) => items.OrderBy(jss => jss.ID).First())
    .OrderBy(jss => jss.Plan).ThenBy(jss => jss.Elevation)
    .ToList();
NB: Depending on your ORM, you might need to stick an .AsEnumerable() between the second .Where(...) and the .GroupBy(...) to force client evaluation of the grouping.
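To see why the explicit ordering matters, here is a minimal client-side sketch of the same grouping, assuming .NET 6 or later. The rows are hypothetical and deliberately listed out of ID order to mimic a database returning records in an arbitrary order; none of the values come from the real table.

```csharp
using System;
using System.Linq;

// Hypothetical rows, deliberately not in ID order.
var rows = new[]
{
    new { ID = 103, Plan = "2", Elevation = "A", Lot = "12" },
    new { ID = 101, Plan = "1", Elevation = "A", Lot = "10" },
    new { ID = 102, Plan = "2", Elevation = "A", Lot = "11" },
};

// Group by Plan/Elevation, then explicitly pick the lowest ID in each group.
var picked = rows
    .GroupBy(r => new { r.Plan, r.Elevation },
             (_, items) => items.OrderBy(r => r.ID).First())
    .OrderBy(r => r.Plan).ThenBy(r => r.Elevation)
    .ToList();

var ids = string.Join(",", picked.Select(r => r.ID));
Console.WriteLine(ids); // prints "101,102"

// For comparison: DistinctBy keeps whichever row of each group it sees first,
// so its result follows the input order rather than the ID.
var firstSeen = string.Join(",", rows.DistinctBy(r => new { r.Plan, r.Elevation }).Select(r => r.ID));
Console.WriteLine(firstSeen);
```

The grouped query returns the same IDs however the input is shuffled, because the choice within each group no longer depends on the order in which rows arrive.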