Code Project / General Programming / C#
Generic collections

27 Posts, 7 Posters
  • L Lost User

That doesn't work. Or rather, it may work, accidentally. But see http://msdn.microsoft.com/en-us/library/yt2fy5zk.aspx :

"The order of the keys in the Dictionary<TKey, TValue>.KeyCollection is unspecified."

It is not (necessarily) the order in which you added the items. It often works, but it isn't guaranteed to. Edit: it turns out that, in the current implementation (which may change!), it will indeed work as long as you never delete anything. That's just an implementation detail, and it would be foolish to rely on it; it could change in any update. Also, of course, you're doing a linear search through a list, which is not ideal.
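To make the point concrete, here is a small sketch (mine, not from the thread) of the usual workaround: if you need keys in insertion order, track that order yourself in a `List<TKey>` alongside the dictionary, instead of relying on the dictionary's unspecified enumeration order. The names `ages`, `insertionOrder`, and `Add` are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Sketch: keep insertion order explicitly instead of relying on
// Dictionary enumeration order, which the docs leave unspecified.
var ages = new Dictionary<string, int>();
var insertionOrder = new List<string>();  // the order we actually control

void Add(string key, int value)
{
    ages[key] = value;
    if (!insertionOrder.Contains(key)) insertionOrder.Add(key);
}

Add("carol", 31);
Add("alice", 29);
Add("bob", 47);

// Enumerating insertionOrder is guaranteed; enumerating ages.Keys is not.
foreach (var key in insertionOrder)
    Console.WriteLine($"{key}: {ages[key]}");
```

(The `Contains` check is itself linear; for large collections you would track membership with the dictionary itself.)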

PozzaVecia
#21

ok, thanks

• L Lost User
BillWoodruff
#22

      harold aptroot wrote:

      you're doing a linear search through a list.

Hi Harold, I am curious: do you know for a fact that Linq's ElementAt, or the (non-Linq) IndexOf on an Array or List, internally performs a linear search? In the case of ElementAt on a Dictionary, I assume there's some cost from an internal conversion of some kind, since Dictionary doesn't inherently support "sequential" indexing by integer. Thanks, Bill

~ "This isn't right; this isn't even wrong." Wolfgang Pauli, commenting on a physics paper submitted to a journal

• B BillWoodruff
Lost User
#23

The IndexOf of a List uses the IndexOf of Array, and that one's a linear search (I checked, just to be sure, but really there's no other choice anyway). What ElementAt does depends on the thing you're applying it to: it will use IList<T>'s indexer if that interface is implemented (Dictionary does not implement it, so ElementAt would scan through the enumerator). So in conclusion, that code was pretty inefficient.
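As a sketch of what that linear search amounts to (my illustration, not the actual BCL source): a straight scan with the default equality comparer, returning the first match or -1. The helper name `LinearIndexOf` is invented.

```csharp
using System;
using System.Collections.Generic;

// A minimal sketch of a linear IndexOf, essentially what
// Array.IndexOf / List<T>.IndexOf boil down to: O(n) in the worst case.
static int LinearIndexOf<T>(IReadOnlyList<T> items, T value)
{
    var cmp = EqualityComparer<T>.Default;
    for (int i = 0; i < items.Count; i++)
        if (cmp.Equals(items[i], value))
            return i;   // first match wins
    return -1;          // not found: we had to look at every element
}

var data = new List<int> { 3, 1, 4, 1, 5 };
Console.WriteLine(LinearIndexOf(data, 4));   // 2
Console.WriteLine(LinearIndexOf(data, 9));   // -1
```

Note the not-found case is the true worst case: every element gets inspected before the scan can give up.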

• L Lost User
BillWoodruff
#24

Thanks for your reply, Harold, that is very interesting to know. It makes me wonder: if you were dealing with a really big generic List, or a huge Dictionary, could you implement a much more efficient solution than the current operators, and, if so, why has MS not done that? I would have thought that the Linq operators, particularly, were highly optimized with the latest algorithms; it reminds me that I should never assume anything about software tools. yours, Bill

~ "This isn't right; this isn't even wrong." Wolfgang Pauli, commenting on a physics paper submitted to a journal

• B BillWoodruff
Lost User
#25

There aren't many options there (there are some, discussed below). I mean, say you have an array and you want to know the index of some value, and you have an algorithm that does not potentially look at all elements: how can it know the thing it's trying to find isn't in the part it hasn't looked at? Clearly it can't, so in the worst case you're always looking at all elements anyway.

Leaving the "standard complexity model" behind for a moment, you can do a lot better than O(n): divide the array into m parts, search each part in parallel, then do a min-reduction phase on all the results. That's O(log m) in PRAM (where the assumption m = n is allowed), but PRAM is not a realistic model for present-day CPU computing. In the CPU world that same algorithm is still O(n) (because it's really O(n/m + log m) and m is a constant now), which doesn't necessarily have to be a bad thing; not all O(n)'s are created equal, after all. But this isn't a particularly awesome algorithm:

1) It spends a lot of time just fetching things from memory, and doesn't do a whole lot with them. (Of course the normal algorithm does that too, but it will be a worse bottleneck when multiple threads are doing it.)
2) Even if you abort the threads working on the higher ranges (or, if you want any index rather than just the lowest: all other threads) when you find the item, you will have wasted work, so even when it's faster it takes more resources.
3) It has a lot of overhead from all the thread business.

In a quick test, this algorithm starts pulling ahead on average for an array of a million integers, and then only when the item is not actually in the array (the worst case for the simple linear search). It almost reaches a speedup of 4x on my machine (which isn't that bad considering it's a quad core + hyperthreading), but each time an item actually is in the array, the multithreaded algorithm loses or almost loses, no matter how big I make the array (tested up to an array of 2^28 ints).

So it's not really suitable as a default algorithm for libraries: it's terrible in the common case that the item is actually in the array.
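The partition-and-min-reduce idea described above can be sketched as follows (my sketch, not Harold's test code; the name `ParallelIndexOf` and the use of `Parallel.For` with one chunk per processor are assumptions):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the partitioned search described above: split the array into
// one chunk per worker, scan each chunk in parallel, min-reduce the hits.
static int ParallelIndexOf(int[] array, int value)
{
    int parts = Environment.ProcessorCount;
    int chunk = (array.Length + parts - 1) / parts;
    int best = int.MaxValue;   // min-reduction accumulator

    Parallel.For(0, parts, p =>
    {
        int start = p * chunk;
        int end = Math.Min(start + chunk, array.Length);
        for (int i = start; i < end; i++)
        {
            if (array[i] == value)
            {
                // Keep the smallest index seen by any thread (lock-free CAS loop).
                int seen;
                while (i < (seen = Volatile.Read(ref best)))
                    Interlocked.CompareExchange(ref best, i, seen);
                break;  // this chunk can't produce a smaller index
            }
        }
    });

    return best == int.MaxValue ? -1 : best;
}

var data = new int[1_000_000];
data[123_456] = 7;
data[900_000] = 7;
Console.WriteLine(ParallelIndexOf(data, 7));   // 123456
```

Note this sketch doesn't abort the other chunks on a hit, which is one of the wasted-work costs mentioned in point 2 above.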

• L Lost User

BillWoodruff
#26

Once again, thanks for that very illuminating reply! I'd say if you tested up to 2^28, that would cover most non-astronomical scenarios! Of more interest, and something I actually intend to explore, is what happens when you have compound structures with KeyValue pairs, where both the Key and Value types are complex classes, not just ValueTypes. ... edit ... on second thought: if K and V are complex classes, I think it's just a case of comparing pointers in any IndexOf- or ElementAt-type operation, so: no different than if K and V were any other .NET ReferenceTypes (?) ... Please come to northern Thailand (late October to early January is best), and let me take you, and your family, elephant riding in the jungle :) yours, Bill

~ "This isn't right; this isn't even wrong." Wolfgang Pauli, commenting on a physics paper submitted to a journal

• B BillWoodruff

Lost User
#27

                BillWoodruff wrote:

Of more interest, and something I actually intend to explore, is what happens when you have compound structures with KeyValue pairs, where both the Key and Value types are complex classes, not just ValueTypes.

It depends. It calls Equals(T other) when IEquatable<T> is implemented (in that type or a base class), and regular Equals otherwise, which might default to Object.Equals; in that case it uses reference equality for reference types and bitwise equality for value types (that's how the official documentation puts it), which could be restated as "always bitwise equality, so if you give it a reference it's bitwise equality on the reference itself" (that might help a person who thinks in pointers; it would probably confuse someone who thinks in objects). The code where it tests to see which case it's in and then makes an EqualityComparer uses a lot of reflection, by the way. Thanks for inviting me, by the way, but I've found that jungles aren't my thing, so I don't think I'll do anything with that.
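A small sketch (mine, with invented type names) of the dispatch described above: a type implementing IEquatable<T> gets its Equals(T) called by EqualityComparer<T>.Default, while a plain class falls back to Object.Equals, i.e. reference equality.

```csharp
using System;
using System.Collections.Generic;

var a = new EquatablePoint { X = 1, Y = 2 };
var b = new EquatablePoint { X = 1, Y = 2 };
var p = new PlainPoint { X = 1, Y = 2 };
var q = new PlainPoint { X = 1, Y = 2 };

// IEquatable<T> implemented: Default comparer uses Equals(T) -> value comparison.
Console.WriteLine(EqualityComparer<EquatablePoint>.Default.Equals(a, b)); // True

// No IEquatable<T>: Default comparer falls back to Object.Equals -> reference equality.
Console.WriteLine(EqualityComparer<PlainPoint>.Default.Equals(p, q));     // False

sealed class EquatablePoint : IEquatable<EquatablePoint>
{
    public int X, Y;
    public bool Equals(EquatablePoint other) =>
        other is not null && X == other.X && Y == other.Y;
    public override bool Equals(object obj) => Equals(obj as EquatablePoint);
    public override int GetHashCode() => X * 31 + Y;
}

sealed class PlainPoint
{
    public int X, Y;
}
```

This is exactly why IndexOf on a list of "plain" class instances compares references: two structurally identical objects are not found equal unless the type opts in via IEquatable<T> or an Equals override.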
