sorting information from several threads - is my algorithm correct?
-
John Torjo wrote:
b. If there is one or more threads that doesn't have at least one element, return an empty list
You probably want: If there are no threads that have at least one element, return an empty list It is empty only if there is no data, otherwise the concept of "latest" implies that the returned list isn't empty.
"Fairy tales do not tell children the dragons exist. Children already know that dragons exist. Fairy tales tell children the dragons can be killed." - G.K. Chesterton
Yeah, I basically explained the algorithm wrong. So long story short - consider all I said "WHILE THERE ARE IS MORE DATA TO COME ON ALL THREADS". When for a thread, everything, has been read, that is a different scenario, which I handle correctly. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Yeah, I basically explained the algorithm wrong. So long story short - consider all I said "WHILE THERE ARE IS MORE DATA TO COME ON ALL THREADS". When for a thread, everything, has been read, that is a different scenario, which I handle correctly. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
What happens if after responding to a request for all of the "latest", one of the threads acquires/reports an event that is time stamped earlier than the "latest" already reported (which could have come from a different thread)?
"Fairy tales do not tell children the dragons exist. Children already know that dragons exist. Fairy tales tell children the dragons can be killed." - G.K. Chesterton
-
What happens if after responding to a request for all of the "latest", one of the threads acquires/reports an event that is time stamped earlier than the "latest" already reported (which could have come from a different thread)?
"Fairy tales do not tell children the dragons exist. Children already know that dragons exist. Fairy tales tell children the dragons can be killed." - G.K. Chesterton
As long as there are more items to come, I'm always keeping at least the last element (which I don't return as "lastest") - see step d. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Hi guys, I want to sort in real time (by datetime) information coming from several threads. There is a thread the constantly polls for latest items. The polling thread appends whatever I return to his existing list of items. I need to returns items in such a way that polling thread's list is sorted at all times. Here are my requirements: 1. I have several threads, each reading a different event log. 2. Each thread reads all items (events) from this log, and pushes them in a list (specific to that thread) 3. There is a client asking periodically for the "latest" events (sorted by datetime). Once I put an item into this latest events list, it will disappear from my internal list. 4. The client appends this list to his existing items, and the new resulting list has to be sorted (the client starts with an empty list). My current algorithm is: Each time the client asks for "latest" events: a. Look at all thread's lists b. If there is one or more threads that doesn't have at least one element, return an empty list c. Look through all lists, and take the "earliest" element. d. If the list contains the "earliest" element, contains at least one more element, add this to the "latest" events and go back to step a. e. Otherwise, return existing list My take and current tests say that the algorithm is correct. Can you find a counterexample? Is my algorithm wrong? Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
While your "algorithm" may be perfect for what you want, reading it, it does not make "complete sense" to me. 1. assuming you are notifying the user with the "latest" event: 1.a. wouldn't you be selecting from each thread either some fixed maximum number of events to report, or some events filtered by a cut-off time: like, "in the last six hours" ? or ... see 1.b. below: 1.b. are you maintaining a pointer of some type so you do not report to the user events which you have already reported ?
"c. Look through all lists, and take the "earliest" element."
I get confused on this one: aren't you, at all times, dealing with the latest events. Isn't it possible that the information that there are no recent events in one thread/log would be meaningful to the user ?
«I want to stay as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center» Kurt Vonnegut.
-
While your "algorithm" may be perfect for what you want, reading it, it does not make "complete sense" to me. 1. assuming you are notifying the user with the "latest" event: 1.a. wouldn't you be selecting from each thread either some fixed maximum number of events to report, or some events filtered by a cut-off time: like, "in the last six hours" ? or ... see 1.b. below: 1.b. are you maintaining a pointer of some type so you do not report to the user events which you have already reported ?
"c. Look through all lists, and take the "earliest" element."
I get confused on this one: aren't you, at all times, dealing with the latest events. Isn't it possible that the information that there are no recent events in one thread/log would be meaningful to the user ?
«I want to stay as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center» Kurt Vonnegut.
It is a weird problem for sure :) 1a. I don't know that - I can only read what is coming. And I also don't know the timing it takes for each log entry to arrive. It's up to the OS. 1b. I go for the easy way - once a user event is reported, I remove it from the list
BillWoodruff wrote:
I get confused on this one: aren't you, at all times, dealing with the latest events.
Each thread reads as much as it can - continuously. The main thread (on a refresh timer of lets say, 100ms), constantly queries for "latest" entries. At which point I will go through the lists of each thread, and see what I can return. Hope this makes a bit of sense :) Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Hi guys, I want to sort in real time (by datetime) information coming from several threads. There is a thread the constantly polls for latest items. The polling thread appends whatever I return to his existing list of items. I need to returns items in such a way that polling thread's list is sorted at all times. Here are my requirements: 1. I have several threads, each reading a different event log. 2. Each thread reads all items (events) from this log, and pushes them in a list (specific to that thread) 3. There is a client asking periodically for the "latest" events (sorted by datetime). Once I put an item into this latest events list, it will disappear from my internal list. 4. The client appends this list to his existing items, and the new resulting list has to be sorted (the client starts with an empty list). My current algorithm is: Each time the client asks for "latest" events: a. Look at all thread's lists b. If there is one or more threads that doesn't have at least one element, return an empty list c. Look through all lists, and take the "earliest" element. d. If the list contains the "earliest" element, contains at least one more element, add this to the "latest" events and go back to step a. e. Otherwise, return existing list My take and current tests say that the algorithm is correct. Can you find a counterexample? Is my algorithm wrong? Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Do you have any control over the "log readers"? In which case, can they not "push" notifications and new events to the client (via say, a ConcurrentDictionary and an event)?
Depends what you mean by control :) Not sure if having a Concurrent dictionary would help. The problem is that what I return as "latest" should be sorted correctly at all times. The client should not have to re-sort his existing list after each call to get the "latest" - that would defeat its purpose. To give you an example with numbers (instead of dates), and 3 threads:
A: 1,7,20
B: 2,3,15
C: 4,8At this point, I can safely return
{1,2,3,4,7}
, and we'll haveA: 20
B: 15
C: 8At this point, say I get two more numbers:
A: 20,25
B: 15,17
C: 8Here, I can't return anything, because I'm not sure what will come after 8. Say I wait a bit and then have:
A: 20,25
B: 15,17
C: 8,16,21Right now I can return
{8,15,16}
, and will still haveA: 20,26
B: 17
C: 21(note: I can't return 20, since I'm not sure what will come next on B) I wait a bit and have
A: 20,26
B: 17,18
C: 21,23Right now I can only return
{17}
. I can't return nor 20, nor 21, because I'm not sure what will come next on B. Best, John-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
It is a weird problem for sure :) 1a. I don't know that - I can only read what is coming. And I also don't know the timing it takes for each log entry to arrive. It's up to the OS. 1b. I go for the easy way - once a user event is reported, I remove it from the list
BillWoodruff wrote:
I get confused on this one: aren't you, at all times, dealing with the latest events.
Each thread reads as much as it can - continuously. The main thread (on a refresh timer of lets say, 100ms), constantly queries for "latest" entries. At which point I will go through the lists of each thread, and see what I can return. Hope this makes a bit of sense :) Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
This is an "aside," since I don't really grok what the constraints here are, but are you using Stacks here to get LIFO order ?
«I want to stay as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center» Kurt Vonnegut.
-
This is an "aside," since I don't really grok what the constraints here are, but are you using Stacks here to get LIFO order ?
«I want to stay as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center» Kurt Vonnegut.
Nope. Simple lists - each list (on each thread) is sorted in itself. EDIT: see my answer to Gerry for an example of what I need. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Depends what you mean by control :) Not sure if having a Concurrent dictionary would help. The problem is that what I return as "latest" should be sorted correctly at all times. The client should not have to re-sort his existing list after each call to get the "latest" - that would defeat its purpose. To give you an example with numbers (instead of dates), and 3 threads:
A: 1,7,20
B: 2,3,15
C: 4,8At this point, I can safely return
{1,2,3,4,7}
, and we'll haveA: 20
B: 15
C: 8At this point, say I get two more numbers:
A: 20,25
B: 15,17
C: 8Here, I can't return anything, because I'm not sure what will come after 8. Say I wait a bit and then have:
A: 20,25
B: 15,17
C: 8,16,21Right now I can return
{8,15,16}
, and will still haveA: 20,26
B: 17
C: 21(note: I can't return 20, since I'm not sure what will come next on B) I wait a bit and have
A: 20,26
B: 17,18
C: 21,23Right now I can only return
{17}
. I can't return nor 20, nor 21, because I'm not sure what will come next on B. Best, John-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
You can get more numbers down the pipe sooner by pushing. By “control”, I meant change the code of the providers. Yes; don’t need a dictionary; just a concurrent queue on the client side and observable collections on the providers: If the client can see the providers; and all the providers can see the client; and all the providers can see the other providers, then it is possible for all the providers to “push” their numbers, in order, to the client; or the client can be notified any time a new number arrives and can determine whether to “pull” the numbers that were added to the providers’ observable collections. No redundant polling. In the case of “pushing”: A pushes 1; B pushes 2,3; C 4; A 7; C 8; B 15; C 16; etc. (since everyone sees everyone else). Cheers
-
You can get more numbers down the pipe sooner by pushing. By “control”, I meant change the code of the providers. Yes; don’t need a dictionary; just a concurrent queue on the client side and observable collections on the providers: If the client can see the providers; and all the providers can see the client; and all the providers can see the other providers, then it is possible for all the providers to “push” their numbers, in order, to the client; or the client can be notified any time a new number arrives and can determine whether to “pull” the numbers that were added to the providers’ observable collections. No redundant polling. In the case of “pushing”: A pushes 1; B pushes 2,3; C 4; A 7; C 8; B 15; C 16; etc. (since everyone sees everyone else). Cheers
Not sure I follow. 1. The client does not know the providers. 2. The providers - lets say they can know about each other. The thing is - it's not a matter of push or poll, because the algorithm should still be the same, right? EDIT: In other words, how does a provider know when to "push" a number? Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Not sure I follow. 1. The client does not know the providers. 2. The providers - lets say they can know about each other. The thing is - it's not a matter of push or poll, because the algorithm should still be the same, right? EDIT: In other words, how does a provider know when to "push" a number? Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
I think there are any number of ways; all "correct"; but not the "same". If all the providers "know" what the other is doing, then they have all the info they need to push: they know what's behind them and beside them. In the sample you posted, there was no "stalling". One can "subscribe" to events posted by observable collections. If each provider subscribes to each other provider, everyone will always know when anyone adds or removes entries from their collections; this is their cue to decide whether to push (much in the same way the client decides whether to pull from a single queue and how much). I just don't like polling if I don't have to (as when there is no data). Just more "staying alive".
-
I think there are any number of ways; all "correct"; but not the "same". If all the providers "know" what the other is doing, then they have all the info they need to push: they know what's behind them and beside them. In the sample you posted, there was no "stalling". One can "subscribe" to events posted by observable collections. If each provider subscribes to each other provider, everyone will always know when anyone adds or removes entries from their collections; this is their cue to decide whether to push (much in the same way the client decides whether to pull from a single queue and how much). I just don't like polling if I don't have to (as when there is no data). Just more "staying alive".
Gerry Schmitz wrote:
I just don't like polling if I don't have to (as when there is no data). Just more "staying alive".
I get your point. But in my case, it's pretty much a requirement :) Sorry, should have mentioned that upfront. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
Hi guys, I want to sort in real time (by datetime) information coming from several threads. There is a thread the constantly polls for latest items. The polling thread appends whatever I return to his existing list of items. I need to returns items in such a way that polling thread's list is sorted at all times. Here are my requirements: 1. I have several threads, each reading a different event log. 2. Each thread reads all items (events) from this log, and pushes them in a list (specific to that thread) 3. There is a client asking periodically for the "latest" events (sorted by datetime). Once I put an item into this latest events list, it will disappear from my internal list. 4. The client appends this list to his existing items, and the new resulting list has to be sorted (the client starts with an empty list). My current algorithm is: Each time the client asks for "latest" events: a. Look at all thread's lists b. If there is one or more threads that doesn't have at least one element, return an empty list c. Look through all lists, and take the "earliest" element. d. If the list contains the "earliest" element, contains at least one more element, add this to the "latest" events and go back to step a. e. Otherwise, return existing list My take and current tests say that the algorithm is correct. Can you find a counterexample? Is my algorithm wrong? Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
What's wrong with using a priority queue, sorted by the event's timestamp? All "writer" threads write their events to the queue, and the reader thread just has to read the events in order. Note: 1. Both reading and writing from the Priority Queue require synchronization. 2. An insert/remove from a priority queue is more expensive than insert/remove from a regular queue - O(log(n)) as opposed to O(n). If your application is not hard real-time, this might be the way to go.
If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill
-
What's wrong with using a priority queue, sorted by the event's timestamp? All "writer" threads write their events to the queue, and the reader thread just has to read the events in order. Note: 1. Both reading and writing from the Priority Queue require synchronization. 2. An insert/remove from a priority queue is more expensive than insert/remove from a regular queue - O(log(n)) as opposed to O(n). If your application is not hard real-time, this might be the way to go.
If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill
The idea is that I can't return ALL items - assuming I'd use a priority queue - because I could end up receiving an item that's "earlier" than all existing ones. Please see my answer here: http://www.codeproject.com/Messages/5162354/Re-sorting-information-from-several-threads-is-my.aspx[^] LATER EDIT: My initial post wasn't clear on what I want. I'll try to fix that. Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!
-
This is an "aside," since I don't really grok what the constraints here are, but are you using Stacks here to get LIFO order ?
«I want to stay as close to the edge as I can without going over. Out on the edge you see all kinds of things you can't see from the center» Kurt Vonnegut.
I owe you guys an apology. I wasn't clear enough with my initial requirements. Long story short, I edited the initial post - hope it makes more sense now. It is a bit difficult to put the problem into words :D Best, John
-- LogWizard Meet the Log Viewer that makes monitoring log files a joy!