Linq and "select new"

Ian Shlasko

Ok, this one was driving me crazy for two hours... I'm only just getting into linq (Not linq-to-sql, just the query syntax), and let's just say I learned something new about how it works... Here's a trivialized example:

using System.Collections.Generic;
using System.Linq;

public class TestItem
{
    public int Value;
}

public void Run()
{
    List<int> items = new List<int>();
    for (int idx = 0; idx < 100; idx++)
        items.Add(idx);

    var linqed = from x in items
                 select new TestItem() { Value = x };

    TestItem a = linqed.First();
    TestItem b = linqed.First();

    // == on a class with no operator overload compares references
    bool isSameObject = (a == b);
}

So TestItem is just a trivial object (it needs to be an actual class for this to work). Here's what we're doing:

1) Create a list of numbers from 0 to 99 (just for something to generate from).
2) Using a linq select, get an IEnumerable of TestItems with those values.

The syntax is pretty easy to read at this level: for each item in the list, we create a TestItem for it. Now, the First() just gives us the first element. Nothing special.

So what's the value of isSameObject? The obvious answer is TRUE! Of course it's true! TestItem is a reference type, so if we grab it twice, we have two references to the same object, and it's the same!

As I'm sure you were expecting, the obvious answer is not the right one. APPARENTLY a linq query with a "select new" is not the same as just running a foreach loop. It doesn't go through and create 100 TestItems. No, it only runs the "select new" clause when we access the item. So basically, the entire linq statement is a dynamic function. Every time we call First(), it's going in, running the "select new" and returning a NEW INSTANCE of TestItem. Pop a constructor and a breakpoint in TestItem, and you'll see it run only twice.

The solution, since I wanted the "select new" to run immediately (I operate on the resulting objects), was to wrap the entire linq statement in parentheses and call ToList() on it, which gives me something consistent to work with.

Learn something new every day...
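For anyone who wants to see this without a debugger, here is a minimal sketch of the same experiment. CountedItem, its static Created counter, and DeferredDemo are names made up for this illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TestItem that counts constructor calls.
public class CountedItem
{
    public static int Created;
    public int Value;
    public CountedItem() { Created++; }
}

public static class DeferredDemo
{
    public static (bool deferredSame, int deferredCreated, bool listSame, int totalCreated) Run()
    {
        List<int> items = Enumerable.Range(0, 100).ToList();
        CountedItem.Created = 0;

        // Building the query constructs nothing yet.
        IEnumerable<CountedItem> query = from x in items
                                         select new CountedItem { Value = x };

        // Each First() re-runs the select for the first element only.
        CountedItem a = query.First();
        CountedItem b = query.First();
        bool deferredSame = ReferenceEquals(a, b);
        int deferredCreated = CountedItem.Created;

        // ToList() runs the select once per element and keeps the results.
        List<CountedItem> list = query.ToList();
        bool listSame = ReferenceEquals(list.First(), list.First());

        return (deferredSame, deferredCreated, listSame, CountedItem.Created);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.deferredSame);    // False
        Console.WriteLine(r.deferredCreated); // 2
        Console.WriteLine(r.listSame);        // True
        Console.WriteLine(r.totalCreated);    // 102
    }
}
```

The two deferred First() calls account for two constructor calls, and the ToList() for another hundred. After that, the list hands back the same instances every time.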

Proud to have finally moved to the A-Ark. Which one are you in? Developer, Author (Guardians of Xen)

Jorgen Sigvardsson

Despite the easy to read and write syntax of Linq, it's very very complex. Linq is ideal in functional languages, where side effects are uncommon or very hard to produce. In languages that depend on side effects (such as C#), these things can really slap you in the face. Nevertheless, when used properly, Linq is an awesome tool!

-- Kein Mitleid Für Die Mehrheit

Super Lloyd

The easy answer is: "of course it's false" ;P Linq query objects are evaluated on demand. And they are re-evaluated every time you demand them. But they are only evaluated as far as you demand them. If you want to avoid continuous re-evaluation, how about this nice extension method?

public static class Enumerable2
{
	public static IEnumerable<T> Cache<T>(this IEnumerable<T> e) { return new CachedEnumerable<T>(e); }
}

public class CachedEnumerable<T> : IEnumerable<T>
{
	List<T> cache;
	IEnumerable<T> source;

	public CachedEnumerable(IEnumerable<T> e)
	{
		source = e;
	}

	public IEnumerator<T> GetEnumerator()
	{
		if (cache == null)
			cache = new List<T>(source);
		return cache.GetEnumerator();
	}
	System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
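
A possible usage sketch, assuming the TestItem class from the first post. The extension method and CachedEnumerable are repeated here in compact form only so the snippet compiles on its own; CacheDemo is a name invented for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class Enumerable2
{
    public static IEnumerable<T> Cache<T>(this IEnumerable<T> e) { return new CachedEnumerable<T>(e); }
}

public class CachedEnumerable<T> : IEnumerable<T>
{
    List<T> cache;
    IEnumerable<T> source;

    public CachedEnumerable(IEnumerable<T> e) { source = e; }

    public IEnumerator<T> GetEnumerator()
    {
        // The first enumeration materializes the whole source; later ones reuse the list.
        if (cache == null)
            cache = new List<T>(source);
        return cache.GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public class TestItem
{
    public int Value;
}

// Hypothetical demo class, not part of the original post.
public static class CacheDemo
{
    public static bool SameInstanceAfterCache()
    {
        IEnumerable<TestItem> cached =
            (from x in Enumerable.Range(0, 100)
             select new TestItem { Value = x }).Cache();

        TestItem a = cached.First();
        TestItem b = cached.First();
        return ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(SameInstanceAfterCache()); // True
    }
}
```

Note that this version evaluates the whole source the first time anything is requested, so it behaves like a ToList() that is postponed until first use. It is also not written with thread safety in mind.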

A train station is where the train stops. A bus station is where the bus stops. On my desk, I have a work station.... _________________________________________________________ My programs never have bugs, they just develop random features.

Ian Shlasko

Well, yeah, it's obvious NOW, but I don't remember reading anywhere in the documentation, when I was teaching myself the syntax, that it behaved that way. Thanks for the code, but the way I'm using it, as part of a calculation tree in a reporting tool, it's just as easy to do a ToArray() or ToList(), both of which process the whole thing immediately. On the other hand, there are a few situations when an on-demand load might be useful... Gotta love how easy it is to write extension methods.

S Senthil Kumar

Well, if you knew that under the hood the select clause is actually a call to Enumerable.Select, which returns an IEnumerable<T>, you could have figured out the problem earlier. IEnumerable<T>, introduced in .NET 2.0, was primarily targeted at lazy loading scenarios.
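
To illustrate the point: a query like "from x in items select f(x)" is compiled into items.Select(x => f(x)). The sketch below uses a hand-written MySelect with an iterator block to show why nothing runs until someone enumerates. MySelect and the Calls counter are hypothetical and only approximate the idea; this is not the actual Enumerable.Select implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySelectSketch
{
    public static int Calls; // how many times the selector has been invoked

    // Hypothetical stand-in for Enumerable.Select, written as an iterator.
    // The body does not start running until the result is enumerated.
    public static IEnumerable<TResult> MySelect<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (TSource item in source)
        {
            Calls++;
            yield return selector(item);
        }
    }

    public static int[] Run()
    {
        Calls = 0;
        IEnumerable<int> query = new[] { 1, 2, 3 }.MySelect(x => x * 10);
        int afterBuild = Calls;    // query built, selector not called yet

        int first = query.First(); // pulls exactly one element
        int afterFirst = Calls;

        int sum = query.Sum();     // starts over and walks all three elements
        int afterSum = Calls;

        return new[] { afterBuild, first, afterFirst, sum, afterSum };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 0, 10, 1, 60, 4
    }
}
```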

Ian Shlasko wrote:

as part of a calculation tree in a reporting tool, it's just as easy to do a ToArray() or ToList(), both of which process the whole thing immediately.

What I've found is that with some tweaking, the code consuming the result of the select query can usually (but not always) be modified to work with it directly, instead of transforming it to a list or an array - and it usually makes the code more readable.

Regards, Senthil

Paulo Zemek

Even with re-evaluation, if your object implements Equals, you will not have any problem.
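
A small sketch of what overriding Equals does and does not change here. EquatableItem and EqualsDemo are invented for the example. Note that the original snippet compares with ==, which on a class without an operator overload is still a reference comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of TestItem with value-based Equals.
public class EquatableItem
{
    public int Value;

    public override bool Equals(object obj)
    {
        EquatableItem other = obj as EquatableItem;
        return other != null && other.Value == Value;
    }

    public override int GetHashCode() { return Value.GetHashCode(); }
}

public static class EqualsDemo
{
    public static (bool equalsResult, bool operatorResult) Run()
    {
        IEnumerable<EquatableItem> query = from x in Enumerable.Range(0, 100)
                                           select new EquatableItem { Value = x };

        EquatableItem a = query.First();
        EquatableItem b = query.First();

        bool equalsResult = a.Equals(b); // value comparison via the override
        bool operatorResult = (a == b);  // reference comparison: two distinct instances

        return (equalsResult, operatorResult);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.equalsResult);   // True
        Console.WriteLine(r.operatorResult); // False
    }
}
```

So Equals makes the two results compare as equal, but they are still separate objects underneath.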

Ian Shlasko

Actually the problem was that I'm using the resulting objects, running them through other routines to MODIFY them, and continuing to work with the results. So it went something like this:

1) Run the linq query, generating a list of portfolio position objects
2) Post-process them in place, adding some data from other sources
3) Run the same objects through another routine, verifying the numbers and adding-- Wait a second... What happened to the data from step two?!?!

So, in the context of the program, it was pretty mysterious. Obviously, what was happening was that step two was looping through the returned IEnumerable and making changes, and then step three was going back to the IEnumerable and reading entirely new objects instead of the same ones from step two. Actually it's a good thing I DIDN'T implement Equals(), or it would have taken longer to track down the problem :)
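
The three steps above can be reproduced in a few lines. This is only a sketch: Position, AddPrices, and CountPriced are hypothetical stand-ins for the real portfolio classes and routines, which are not shown in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical portfolio position; the real class is not shown in the thread.
public class Position
{
    public string Symbol;
    public decimal Price;
}

public static class LostUpdateDemo
{
    // Step 2: post-process in place.
    static void AddPrices(IEnumerable<Position> positions)
    {
        foreach (Position p in positions)
            p.Price = 100m;
    }

    // Step 3: look at the objects again.
    static int CountPriced(IEnumerable<Position> positions)
    {
        return positions.Count(p => p.Price != 0m);
    }

    public static (int deferred, int materialized) Run()
    {
        string[] symbols = { "AAA", "BBB", "CCC" };

        // Step 1: the query itself creates nothing.
        IEnumerable<Position> query = from s in symbols
                                      select new Position { Symbol = s };

        AddPrices(query);                  // modifies objects that are then thrown away
        int deferred = CountPriced(query); // enumerates again: brand new, unpriced objects

        List<Position> list = query.ToList(); // create the objects once and keep them
        AddPrices(list);
        int materialized = CountPriced(list);

        return (deferred, materialized);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.deferred);     // 0
        Console.WriteLine(r.materialized); // 3
    }
}
```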

GibbleCH

It's called deferred execution :)