ConcurrentQueue & FirstOrDefault() and search speed.
-
It is seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop...
Skipper: We'll fix it. Alex: Fix it? How you gonna fix this? Skipper: Grit, spit and a whole lotta duct tape.
Kornfeld Eliyahu Peter wrote:
seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop
Yes, a good observation. I had initially solved this with a ConcurrentDictionary. Someone else suggested the Queue to me. 1. I have a memory collection which continues to get re-loaded from the database on an interval. 2. I only want to load new items into the collection 3. However, when I get an item, I just want to pop off the oldest item from the collection. This lead me to two different ways to do this. 1. Queue - load up the queue on interval - insure an item is never added more than once. -- later, simply pop off the oldest item (First-In). The getter code is a bit more simple (obvious / self-documenting) with the Queue. 2. Dictionary - load up Dictionary on interval - TryAdd() handles this for me. As long as I load the dictionary from the database in reverse order (Stored Proc returns oldest first) all I have to do is call .First() on Dictionary when I want the oldest item. The Queue obviously gives you the First added item automatically. I've now done detailed testing and using the ConcurrentDictionary is much less CPU intensive, but might be slightly more RAM hungry.
-
Not particularly surprising, given the amount of work it's doing. :) Reference Source: ConcurrentQueue.cs[^] It looks like it's implemented as a singly linked list of small arrays (32 items each). Searching a linked list for the last item, or for an item that doesn't exist, is always going to take longer than finding an item close to the start. It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1st.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
Very good confirming information on that. Thanks a lot.
Richard Deeming wrote:
It also seems that the order of searches has some effect. Try searching for 1st followed by 31st; then try 31st followed by 1s
I actually noticed that too as I tested, but my original post became so long I decided not to mention more. :)
-
Kornfeld Eliyahu Peter wrote:
seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop
Yes, a good observation. I had initially solved this with a ConcurrentDictionary. Someone else suggested the Queue to me. 1. I have a memory collection which continues to get re-loaded from the database on an interval. 2. I only want to load new items into the collection 3. However, when I get an item, I just want to pop off the oldest item from the collection. This lead me to two different ways to do this. 1. Queue - load up the queue on interval - insure an item is never added more than once. -- later, simply pop off the oldest item (First-In). The getter code is a bit more simple (obvious / self-documenting) with the Queue. 2. Dictionary - load up Dictionary on interval - TryAdd() handles this for me. As long as I load the dictionary from the database in reverse order (Stored Proc returns oldest first) all I have to do is call .First() on Dictionary when I want the oldest item. The Queue obviously gives you the First added item automatically. I've now done detailed testing and using the ConcurrentDictionary is much less CPU intensive, but might be slightly more RAM hungry.
raddevus wrote:
all I have to do is call .First() on Dictionary when I want the oldest item.
That's an assumption that's going to bite you in the future. The dictionary classes don't guarantee that the enumerator will return items in any specific order. The order could change with a framework update; or it could change when your dictionary exceeds a certain size.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
raddevus wrote:
all I have to do is call .First() on Dictionary when I want the oldest item.
That's an assumption that's going to bite you in the future. The dictionary classes don't guarantee that the enumerator will return items in any specific order. The order could change with a framework update; or it could change when your dictionary exceeds a certain size.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
Ah, interesting. Very good info. So back to the Queue, it is. There are always trade-offs. I was sorting the Dictionary first via waitingFiles.OrderByDescending(of => of.Value.FileCreated).First(); But that is really intensive because from what I can tell those LINQ objects are instantiated to search for the one that matches. So, I was hoping (and it is really fast and nice) that since I load the data into the Dictionary in ascending order (oldest files first) that I could count on just calling First(). I mean it does work now. I was actually testing the order that the files are then processed. However, I understand that it is not guaranteed. Interesting and why I'm trying not to make assumptions about which is better ( Dictionary or Queue ) in this instance.
-
Kornfeld Eliyahu Peter wrote:
seems to me a very bad practice to use a FIFO/LIFO oriented collection to anything else than push/pop
Yes, a good observation. I had initially solved this with a ConcurrentDictionary. Someone else suggested the Queue to me. 1. I have a memory collection which continues to get re-loaded from the database on an interval. 2. I only want to load new items into the collection 3. However, when I get an item, I just want to pop off the oldest item from the collection. This lead me to two different ways to do this. 1. Queue - load up the queue on interval - insure an item is never added more than once. -- later, simply pop off the oldest item (First-In). The getter code is a bit more simple (obvious / self-documenting) with the Queue. 2. Dictionary - load up Dictionary on interval - TryAdd() handles this for me. As long as I load the dictionary from the database in reverse order (Stored Proc returns oldest first) all I have to do is call .First() on Dictionary when I want the oldest item. The Queue obviously gives you the First added item automatically. I've now done detailed testing and using the ConcurrentDictionary is much less CPU intensive, but might be slightly more RAM hungry.
Question, do you need to check all or any of the elements in the list, or is it enough to check the last one? If so you would do better off with a linked list. There are no concurrent linked lists though. Apparently it was around in the beta versions of dotNet 4.0 but it was left out from the release version as there were no advantages over using a normal linked list with locking. But a simple lock would still be faster since you don't need to enumerate the collection.
Wrong is evil and must be defeated. - Jeff Ello
-
Question, do you need to check all or any of the elements in the list, or is it enough to check the last one? If so you would do better off with a linked list. There are no concurrent linked lists though. Apparently it was around in the beta versions of dotNet 4.0 but it was left out from the release version as there were no advantages over using a normal linked list with locking. But a simple lock would still be faster since you don't need to enumerate the collection.
Wrong is evil and must be defeated. - Jeff Ello
Thanks for chiming in on the conversation. The challenge I have here is that I need to insure the item is not currently anywhere in the collection -- an item may have been loaded at any location. There are probably other ways around this but just attempting to do it the easiest, yet most correct way which uses the least resources. The collection is used in two ways : 1. caller will request the oldest unique item 2. collection will periodically load newer items into itself The interesting thing about this is that the Queue makes on of those requirements easy and the Dictionary makes the other requirement easy. The first one is easy with the Queue since you just get the first in item. The second one is difficult with a Queue (there is no key) so you have to determine that the object is not already contained in the Queue for every item, each time the items are retrieved from db. The second one is easy with a Dictionary since you can do TryAdd() with your unique key and it fails fast if the item has already been added). However, the first one is more difficult with a Dictionary because the items are not (necessarily) ordered in a significant way and we need to get the oldest item. It's an interesting problem.
-
Thanks for chiming in on the conversation. The challenge I have here is that I need to insure the item is not currently anywhere in the collection -- an item may have been loaded at any location. There are probably other ways around this but just attempting to do it the easiest, yet most correct way which uses the least resources. The collection is used in two ways : 1. caller will request the oldest unique item 2. collection will periodically load newer items into itself The interesting thing about this is that the Queue makes on of those requirements easy and the Dictionary makes the other requirement easy. The first one is easy with the Queue since you just get the first in item. The second one is difficult with a Queue (there is no key) so you have to determine that the object is not already contained in the Queue for every item, each time the items are retrieved from db. The second one is easy with a Dictionary since you can do TryAdd() with your unique key and it fails fast if the item has already been added). However, the first one is more difficult with a Dictionary because the items are not (necessarily) ordered in a significant way and we need to get the oldest item. It's an interesting problem.
If you can afford the extra memory usage, the simplest option would probably be a custom class to store the data twice - once in a
HashSet<T>
, and once in aQueue<T>
. You'd obviously need to add your own locking code to ensure it was thread-safe. Off the top of my head, something like this might work:public sealed class UniqueQueue<T> : IEnumerable<T>, IDisposable
{
private readonly Queue<T> _queue = new Queue<T>();
private readonly HashSet<T> _set = new HashSet<T>();
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();public int Count { get { bool lockTaken = false; try { lockTaken = \_lock.TryEnterReadLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); return \_queue.Count; } finally { if (lockTaken) { \_lock.ExitReadLock(); } } } } public void Dispose() { \_lock.Dispose(); } public T Dequeue() { bool lockTaken = false; try { lockTaken = \_lock.TryEnterWriteLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); T oldestItem = \_queue.Dequeue(); \_set.Remove(oldestItem); return oldestItem; } finally { if (lockTaken) { \_lock.ExitWriteLock(); } } } public void AddRange(IEnumerable<T> itemsToAdd) { if (itemsToAdd == null) throw new ArgumentNullException(nameof(itemsToAdd)); bool lockTaken = false; try { lockTaken = \_lock.TryEnterWriteLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); foreach (T item in itemsToAdd) { if (\_set.Add(item)) { \_queue.Enqueue(item); } } } finally { if (lockTaken) { \_lock.ExitWriteLock(); } } } public UniqueQueueEnumerator GetEnumerator() {
-
Thanks for chiming in on the conversation. The challenge I have here is that I need to insure the item is not currently anywhere in the collection -- an item may have been loaded at any location. There are probably other ways around this but just attempting to do it the easiest, yet most correct way which uses the least resources. The collection is used in two ways : 1. caller will request the oldest unique item 2. collection will periodically load newer items into itself The interesting thing about this is that the Queue makes on of those requirements easy and the Dictionary makes the other requirement easy. The first one is easy with the Queue since you just get the first in item. The second one is difficult with a Queue (there is no key) so you have to determine that the object is not already contained in the Queue for every item, each time the items are retrieved from db. The second one is easy with a Dictionary since you can do TryAdd() with your unique key and it fails fast if the item has already been added). However, the first one is more difficult with a Dictionary because the items are not (necessarily) ordered in a significant way and we need to get the oldest item. It's an interesting problem.
I had a feeling it wouldn't be that simple. Richards suggestion would've been my second one, minus the code sample though. (impressive that!) Second question though, you wrote in another post that you load the objects from a database. Are the objects in the database not unique? How do you ensure the uniqueness if you have popped an object from the list? Or is the uniqueness coming from the identity column in the database? If so you could still use the linked list. Using a paged and sorted query.
Wrong is evil and must be defeated. - Jeff Ello
-
If you can afford the extra memory usage, the simplest option would probably be a custom class to store the data twice - once in a
HashSet<T>
, and once in aQueue<T>
. You'd obviously need to add your own locking code to ensure it was thread-safe. Off the top of my head, something like this might work:public sealed class UniqueQueue<T> : IEnumerable<T>, IDisposable
{
private readonly Queue<T> _queue = new Queue<T>();
private readonly HashSet<T> _set = new HashSet<T>();
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();public int Count { get { bool lockTaken = false; try { lockTaken = \_lock.TryEnterReadLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); return \_queue.Count; } finally { if (lockTaken) { \_lock.ExitReadLock(); } } } } public void Dispose() { \_lock.Dispose(); } public T Dequeue() { bool lockTaken = false; try { lockTaken = \_lock.TryEnterWriteLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); T oldestItem = \_queue.Dequeue(); \_set.Remove(oldestItem); return oldestItem; } finally { if (lockTaken) { \_lock.ExitWriteLock(); } } } public void AddRange(IEnumerable<T> itemsToAdd) { if (itemsToAdd == null) throw new ArgumentNullException(nameof(itemsToAdd)); bool lockTaken = false; try { lockTaken = \_lock.TryEnterWriteLock(Timeout.Infinite); if (!lockTaken) throw new InvalidOperationException(); foreach (T item in itemsToAdd) { if (\_set.Add(item)) { \_queue.Enqueue(item); } } } finally { if (lockTaken) { \_lock.ExitWriteLock(); } } } public UniqueQueueEnumerator GetEnumerator() {
This is a very good suggestion and I saw something very similar at StackOverflow. But, as you mentioned I was attempting to keep the memory size low too and it is interesting that this does store the objects in two collections. Also, I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically so that would solve that part of the problem. Thanks for the input and discussion.
-
This is a very good suggestion and I saw something very similar at StackOverflow. But, as you mentioned I was attempting to keep the memory size low too and it is interesting that this does store the objects in two collections. Also, I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically so that would solve that part of the problem. Thanks for the input and discussion.
raddevus wrote:
I am actually using the ConcurrentQueue and ConcurrentDictionary which are thread-protected automatically
Ish. :) Modifications to one of those collections will automatically be thread-safe. But if you're trying to read or update the two in a single atomic operation, you'll still need your own locking mechanism.
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
You probably all knew this, but I needed a definitive answer. I'm using a ConcurrentQueue which is a FIFO collection. I needed to know that when I went to determine if the queue already had the item, if it would return faster when the item was found at the top (next item out) versus found at the bottom (last out position). I assumed, but wasn't sure. This code gives a good definitive answer.
ConcurrentQueue mainList = new ConcurrentQueue(); Console.WriteLine("Adding items..."); for (int x = 1; x < 10000000;x++){ mainList.Enqueue(x); } Console.WriteLine("DONE adding items..."); Stopwatch sw = new Stopwatch(); Console.WriteLine("Searching for FIRST item..."); sw.Start(); mainList.FirstOrDefault(item => item == 1); sw.Stop(); Console.WriteLine ("{0}ms",sw.ElapsedMilliseconds); sw = new StopWatch(); Console.WriteLine("Searching for LAST item..."); sw.Start(); mainList.FirstOrDefault(item => item == 9999999); sw.Stop(); Console.WriteLine ("{0}ms",sw.ElapsedMilliseconds);
Output Adding items... DONE adding items... Searching for FIRST item... 0ms Searching for LAST item... 99ms The consistent result is around 100ms on my machine. EDIT This addition seems to bear expected results too:
for (int z = 9999999; z >= 1;z--){
if (z % 1000000 == 0) {
sw = new Stopwatch();
sw.Start();
mainList.FirstOrDefault(item => item == z);
sw.Stop();
Console.WriteLine ("{0} : {1}ms",z, sw.ElapsedMilliseconds);
}
}Output for this section is: 9000000 : 99ms 8000000 : 89ms 7000000 : 78ms 6000000 : 71ms 5000000 : 55ms 4000000 : 44ms 3000000 : 33ms 2000000 : 21ms 1000000 : 10ms Interesting that on my machine it is about 10ms per 1,000,000 items. EDIT 2 But, wait...there's more! What about for objects? I'm creating a queue of anonymous objects via dynamic -- basically cuz I'm lazy. Then I test to see how long it takes to find the matching value. Answer: Longer Right, we all assumed longer. The amount of time could just be unboxing of object to get value or whatever. This answers the question for me though, because I'm going to be putting a bunch of items in a ConcurrentQueue and I want to add more items but don't want to add an item if the collection already contains the item.
void Main()
{ConcurrentQueue mainList = new ConcurrentQueue(); Console.WriteLine("Adding items..."); for (int x = 1; x
raddevus wrote:
I needed to know that when I went to determine if the queue already had the item, if it would return faster when the item was found at the top (next item out) versus found at the bottom (last out position). I assumed, but wasn't sure. This code gives a good definitive answer
You forgot the important qualifier through -- this code gives a good definitive answer for build XXX of the library. In the next build, the implementer may find a bug with the way they're doing things and have to change it, altering the performance. If you're writing code that depends that much on the performance of some class, you need to write the class yourself.
I live in Oregon, and I'm an engineer.