Code Project

How to get threads in a custom threadpool to not require locks when enabling/disabling threads?

C / C++ / MFC · Tags: data-structures, help, tutorial, question · 6 Posts, 3 Posters
#1 · Cyrilix

The problem with my custom threadpool (a fun activity, not commercial) is this: I have, say, 5 threads running, a global counter of how many pieces of work there are, and a work event which should be unsignaled (reset) when there is no work and signaled (set) when there is work, because when there is no work, the threads simply wait on this event. I have two methods, PushWork() and PopWork(). PushWork() increments the global counter by 1 using an interlocked operation and checks: "if I just incremented the global counter to 1, SetEvent()". PopWork() decrements the global counter by 1 using an interlocked operation and checks: "if I just decremented the global counter to 0, ResetEvent() and wait on the event".

The problem is a race condition. Imagine the work queue is currently empty. PushWork() gets called and increments the counter to 1. Then PopWork() gets called and decrements the counter to 0. PopWork(), running faster than PushWork() (for whatever reason), resets the event. PushWork(), chugging along, then sets the event. Now we have no more work in the queue, but the event is set, so the threads keep polling the queue for work.

There is a similar problem the other way around. Imagine the work queue currently has one item. PopWork() gets called and decrements the counter to 0. Then PushWork() gets called and increments the counter to 1. PushWork() sets the event. PopWork() resets the event. Now there are work items in the queue, but the threads will stop running as soon as they hit the event.

The reason this can occur is that modifying the counter and setting/resetting the event are not atomic. I could simply wrap both with a critical section, but I think the overhead is too big, since it means that for every piece of work there are two EnterCriticalSection() calls and two LeaveCriticalSection() calls, one for each of push and pop. I've been trying to find a way to do this without locks. Does anyone have any ideas?

INFO: I use a combination of global work queues and local work queues (one per thread), as per modern threadpool implementations with work-stealing queues. I do, however, have one global counter for all the work. I've entertained the idea of using different counters but don't see how that would solve my problem.
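To make the race concrete, here is a minimal portable sketch of the pattern described above. The names and the condition-variable stand-in for the Win32 manual-reset event are illustrative, not the poster's actual code:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Portable stand-in for a Win32 manual-reset event.
struct WorkSignal {
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;
    void set() {
        { std::lock_guard<std::mutex> g(m); signaled = true; }
        cv.notify_all();
    }
    void reset() {
        std::lock_guard<std::mutex> g(m);
        signaled = false;
    }
    void wait() {
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [this] { return signaled; });
    }
};

std::atomic<long> g_workCount{0};
WorkSignal g_workEvent;

void PushWork() {
    // ...enqueue the item elsewhere...
    if (g_workCount.fetch_add(1) == 0)  // count was 0, so I took it to 1
        g_workEvent.set();              // RACE: PopWork() can run to completion
                                        // between the two lines above
}

void PopWork() {
    // ...dequeue the item elsewhere...
    if (g_workCount.fetch_sub(1) == 1)  // count was 1, so I took it to 0
        g_workEvent.reset();            // RACE: may undo a concurrent set()
                                        // (the worker then waits on the event)
}
```

The comments mark the window: each interlocked operation and its matching event call are two separate steps, so the other function can interleave between them and leave the event in the wrong state.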


#2 · Stuart Dootson

      Cyrilix wrote:

      The reason why this can occur is because modifying the counter and setting/resetting the event are not atomic. I could simply wrap both of these with a critical section, but I think the overhead is too big, since it means that for every piece of work, there are two EnterCriticalSection() calls and two LeaveCriticalSection() calls, one for each of push and pop. I've been trying to find a way to do this without locks. Does anyone have any ideas?

1. Lock-free programming is *very* difficult, and I don't think it can be done for your specific case of multiple, unconnected actions.
2. If the overhead of a critical section is significant compared to the amount of work being done in the thread, your work items are much too small.

You need to make your two actions atomic, and the only way to do that is to serialize access to the combination of work queue and event, using a mutex or critical section.

A different way might be for the thing that's pushing the work to select the thread that's to do the work and, by having an event per thread, signal the selected thread to pick up a work item. If there are no waiting threads, do nothing. And when a thread is about to enter a waiting state, it can look at the work queue first and pick up the next work item.
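The "serialize both actions" suggestion can be sketched with one lock guarding both the queue and the wake-up. A minimal illustration using standard C++ primitives rather than the Win32 ones the thread discusses (WorkItem is a placeholder for whatever the pool actually queues):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

struct WorkItem { int payload; };  // placeholder

class WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<WorkItem> items;
public:
    void Push(WorkItem w) {
        {
            std::lock_guard<std::mutex> g(m);
            items.push_back(w);       // queue change happens under the lock...
        }
        cv.notify_one();              // ...so no pop can interleave between
                                      // "became non-empty" and the wake-up
    }
    WorkItem Pop() {                  // blocks until work is available
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [this] { return !items.empty(); });
        WorkItem w = items.front();
        items.pop_front();
        return w;
    }
};
```

Because the emptiness check and the queue mutation happen under the same lock, the "set then reset out of order" interleaving from the original post cannot occur; a spurious wake-up simply re-checks the predicate and goes back to sleep.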

      Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p


#3 · Cyrilix

Stuart Dootson wrote:

You need to make your two actions atomic - the only way to do that is to serialize access to a combination of work queue and event, using a mutex or critical section.

I sort of figured that might be the case.

Stuart Dootson wrote:

If the overhead of calling critical section is significant compared to amount of work being done in the thread, your work items are much too small.

What I use for work items is on the order of a hundred multiplications/divisions. For instance, say I want to multiply two square matrices of equal size using the naive method; I split up the work so that each entry of the resulting matrix is a single work item. If I multiply a 100x100 matrix by a 100x100 matrix, that gives me 100 multiplications and 100 additions for each of 100x100 (10,000) work items. This really stresses my threadpool implementation, since the single-threaded version of this naive matrix multiplication is a lot faster, and that's definitely due to the overhead of my threadpool. I was thinking that a more efficient use of synchronization (if possible) might alleviate this problem; however, I haven't actually profiled my program extensively. What I do know is that the vast majority of the time is being spent on queueing the work rather than doing the actual work.

Stuart Dootson wrote:

A different way might be for the thing that's pushing the work to select the thread that's to do the work and, by having an event per thread, signal the selected thread to pick up a work item. If there are no waiting threads, do nothing. And when a thread enters a waiting state, it can look at the work queue before entering the waiting state and pick the next work item.

OK, that's some food for thought. I'll think about this and see if I can implement something accordingly. Thanks for your comments.
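On the granularity point: one hypothetical way to coarsen these work items is to make each item compute one full row of the result instead of one element, so each enqueue buys N times more arithmetic. A sketch (MultiplyRow and the fixed N are illustrative, not the poster's code):

```cpp
#include <vector>

static const int N = 100;  // square matrix dimension, row-major storage

// Compute one full row of C = A * B with the naive method. As a work item
// this does N*N multiply-adds per enqueue instead of N, so there are only
// N items (100) rather than N*N items (10,000) for the 100x100 case.
void MultiplyRow(const std::vector<double>& A, const std::vector<double>& B,
                 std::vector<double>& C, int row) {
    for (int col = 0; col < N; ++col) {
        double sum = 0.0;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}
```

The per-item queueing cost is unchanged, but it is now amortized over a hundred times more useful work, which is usually the first thing to try before redesigning the synchronization itself.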


#4 · Stuart Dootson

I was thinking, with my suggested alternate solution, that since you're using an event per thread, you could use a lock-free queue; there are plenty of implementations on the web if you Google for "lock free queue c++". It's a case of finding a trustworthy one! This[^] may be useful, while this[^] may be of interest, as might this[^]. In fact, as all of those are by Herb Sutter, Herb Sutter's blog[^] is probably of interest!
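As a flavor of what such queues look like, here is a minimal bounded single-producer/single-consumer ring buffer using C++11 atomics. This is a sketch in that spirit, not code from the linked articles, and it is only safe with exactly one producer thread and one consumer thread:

```cpp
#include <atomic>
#include <cstddef>

// Bounded SPSC ring buffer: the producer owns `tail`, the consumer owns
// `head`, and each reads the other's index with acquire/release ordering,
// so no locks are needed.
template <typename T, size_t Capacity>
class SpscQueue {
    T buf[Capacity];
    std::atomic<size_t> head{0};  // next slot to read  (consumer-owned)
    std::atomic<size_t> tail{0};  // next slot to write (producer-owned)
public:
    bool TryPush(const T& v) {
        size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == Capacity)
            return false;                              // queue is full
        buf[t % Capacity] = v;
        tail.store(t + 1, std::memory_order_release);  // publish the write
        return true;
    }
    bool TryPop(T& out) {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                              // queue is empty
        out = buf[h % Capacity];
        head.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }
};
```

With one such queue per worker thread (matching the event-per-thread suggestion above), the pusher and the single consuming worker never contend on a lock at all.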



#5 · Lost User

            Cyrilix wrote:

            The reason why this can occur is because modifying the counter and setting/resetting the event are not atomic.

Have you considered using Interlocked Singly Linked Lists[^] for your work queue? That would make adding, removing, and querying all atomic operations. Using Singly Linked Lists[^]

Best Wishes,
-David Delaune
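For readers without the links: the Win32 interlocked SList is essentially a lock-free LIFO. A portable analogue is the Treiber stack, sketched below with C++11 atomics. Note that the real InterlockedPushEntrySList/InterlockedPopEntrySList also guard against the ABA problem, which this single-CAS sketch omits, so it is illustrative rather than production-ready:

```cpp
#include <atomic>

struct Node {
    Node* next;
    int value;  // placeholder payload
};

// Treiber stack: push and pop are each a single compare-and-swap loop,
// mirroring the SList's atomic push/pop semantics (minus ABA protection).
class LockFreeStack {
    std::atomic<Node*> head{nullptr};
public:
    void Push(Node* n) {
        n->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
            ;  // on failure, n->next is refreshed to the current head; retry
    }
    Node* Pop() {
        Node* n = head.load(std::memory_order_acquire);
        while (n && !head.compare_exchange_weak(n, n->next,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed))
            ;  // on failure, n is refreshed to the current head; retry
        return n;  // nullptr if the stack was empty
    }
};
```

Note the caller, not the stack, owns node memory, just as with the SList entries; that sidesteps the allocation question but not the ABA one.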


#6 · Cyrilix

I've seen that series of documents, although I haven't considered it for my implementation; I don't think it really solves my problem of making "push/pop and signal" atomic. It does, however, seem like it would have better performance than a standard locked queue. Thanks for bringing it up.
