How to best pull together data from multiple sources (hardware sensors) in Win32?
-
Hi all, this is my first question in this forum, so let me begin by saying hello to everyone.

I am writing a Win32 application that uses multiple sensors, such as the Kinect V2, and a couple of other sensors that complement the Kinect's capabilities. These other sensors are interfaced with the PC via the serial port. Both the Kinect and the other sensors produce data samples periodically at similar time intervals. The challenge I face is how best to pull all of these data samples together into my application, that is, how to structure my program so that the data gets in efficiently. I think it's a Producer-Consumer type of problem.

The idea I have is to read the data samples from each of the external sources in a separate Reader Thread. Each of these threads would fire an event when a new sample has been received. The Main Thread would pick up these events using the WaitForMultipleObjects() function, as described [here]. This should provide enough synchronization. The Main Thread, which I call the Producer Thread, would copy all of the arriving data into a custom frame class. This frame would then be pushed into a FIFO queue. A Consumer Thread would pop and process these custom frames from the FIFO at full speed.

Yes, I realise that this application is resource-hungry. The Consumer Thread pops frames at a lower rate than the frames are pushed onto the FIFO. The PC I develop it on should be able to handle it with 16GB of RAM and the latest i7 CPU. Real-time operation is not required. Also, the program should complete before RAM fills up.

I wonder if my approach and, more importantly, my thought process are correct? Would there be a better approach for this type of application? I don't seem to see a better way.

Thanks, MW
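The FIFO hand-off between the Producer Thread and the Consumer Thread could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses standard C++ primitives (`std::mutex`, `std::condition_variable`) rather than Win32 events, and `Frame`, `FrameQueue`, and their members are hypothetical names that do not come from the post.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical combined frame: one Kinect sample plus the serial-sensor samples.
struct Frame {
    std::uint64_t sequence = 0;
    std::vector<std::uint8_t> kinectData;
    std::vector<std::uint8_t> serialData;
};

// Unbounded thread-safe FIFO (an assumed helper, not a Win32 API).
class FrameQueue {
public:
    // Called by the producer for every assembled frame.
    void push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            queue_.push(std::move(frame));
        }
        ready_.notify_one();
    }

    // Blocks until a frame is available.
    // Returns false once the queue has been closed and fully drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

    // Called by the producer when acquisition has finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Frame> queue_;
    bool closed_ = false;
};
```

The consumer would simply loop on `pop()` until it returns false. Because the queue is unbounded, it only works if the run ends before memory is exhausted, which matches the assumption stated in the post.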
-
Quote:
The Consumer Thread pops frames at a lower rate than the frames are pushed onto the FIFO
Why?
Quote:
Also, the program should complete before RAM fills up
It all depends on what the program should do with the collected data. If, as you stated, the consumer is slower than the producer, then memory will eventually fill up.
-
The reason I say that the consumer pops frames at a lower rate than the producer pushes them into the FIFO is that the consumer must process each frame, and this operation is time-consuming. In fact, the producer pushes approximately 3 frames into the FIFO during the time it takes the consumer to process 1 popped frame. I've written a couple of test programs to measure some of the most time-consuming tasks, as well as how many frames would actually be needed to complete the task. This is why I made the above statements. That's not the main concern though. I am more curious about the general approach to designing the application. I've got the individual bits and pieces working. Next, they need to be integrated. How can I best achieve that?
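The 3-to-1 ratio above allows a rough estimate of how large the FIFO gets by the time capture stops. The sketch below is only back-of-the-envelope arithmetic. The function names are hypothetical, and the frame size is an assumed parameter because the post does not state it.

```cpp
#include <cassert>
#include <cstdint>

// Frames still waiting in the FIFO at the moment capture stops.
// producedPerConsumed is the ratio from the post (about 3 pushed per 1 popped).
std::uint64_t peakBacklogFrames(std::uint64_t framesCaptured,
                                std::uint64_t producedPerConsumed) {
    assert(producedPerConsumed >= 1);
    // While framesCaptured frames are pushed, the consumer pops
    // framesCaptured / producedPerConsumed of them (rounded down).
    return framesCaptured - framesCaptured / producedPerConsumed;
}

// Memory held by the backlog, for an assumed fixed frame size in bytes.
std::uint64_t peakBacklogBytes(std::uint64_t framesCaptured,
                               std::uint64_t producedPerConsumed,
                               std::uint64_t bytesPerFrame) {
    return peakBacklogFrames(framesCaptured, producedPerConsumed) * bytesPerFrame;
}
```

For example, with a purely hypothetical frame of 1 MiB and 3000 captured frames, `peakBacklogFrames(3000, 3)` gives 2000 frames, so the queue would hold roughly 2 GB at its peak. Plugging in the real frame size and frame count shows whether a given amount of RAM is enough.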
-
It looks like a fairly good design to me. If you cannot speed up the frame processing, then you might consider dropping some of the frames (if that is a viable option).
If he produces three for every one he processes, I'd imagine he's going to have to figure out a way to process them faster, or drops will have to occur.
-
The only thing that isn't clear is whether you need a single "reader thread", which implies you'll read all sensors in series, one after the other. This can be optimized if the read operations can be done in parallel. It's really application specific, so I couldn't tell you whether you can or can't do that. On the processing side, if each frame is identical and is processed similarly, that can be done using a thread pool scheme, where you have a set of worker threads that process data when it is available. That works really well in cases where the processing required on the data is identical (i.e. the work function is the same, but you can have multiple independent threads working in parallel on independent data). Again, parallel processing here is application dependent.
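A thread pool scheme of the kind described above might be sketched as follows. This is a minimal illustration in standard C++, not a Win32 thread pool API. `WorkerPool` and its members are hypothetical names, and it assumes the tasks are independent of each other.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: every worker runs the same loop on independent tasks.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t workerCount) {
        assert(workerCount > 0);
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    // Queues one unit of work, e.g. "process this frame".
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs outside the lock so workers proceed in parallel
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

Note that a pool like this completes tasks in no particular order, so it only fits processing steps that do not depend on the order of the frames.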
-
Assuming it's not a process that goes on forever. Usually with sensors, it's an ongoing process, they're always producing data, so if you're not using it all you have to do some sort of smart data reduction (i.e. drop if it makes sense, decimate if it makes sense).
-
The idea with thread pooling sounds very interesting. I'm not sure how applicable it is though. I should have provided a little more specific information. Apologies.

The other sensors communicate with the PC via Bluetooth asynchronously. Each of them sends a data packet a couple of bytes long. All work at roughly the same speed. The packets arrive in random order. That's not a problem as long as all the most recent packets are received. In terms of the Kinect, I use almost all streams except for sound and color. The idea is that once all of the samples have arrived, including the multi-source frame from the Kinect, their respective readers would fire an event. Once the WaitForMultipleObjects() function sees that all the expected events have fired, it unblocks and the data is copied into a custom frame class before being pushed onto the FIFO.

On the consumer side, things look a little more interesting. I can't afford to drop any frames from the Kinect. One of the heaviest tasks that needs to be carried out is running the Kinect Fusion algorithm. It runs best on the GPU. I am not sure if this task can be parallelized on a standard PC. Fusion runs much more slowly on the CPU. Maybe it would be possible to run two instances of Fusion, one on the GPU and the other on the CPU, but I don't know how much sense that would make. Obviously, one of the bottlenecks is the throughput of the given GPU. I'm trying to develop this program in such a way that its performance would vary depending on the PC's specifications, in particular the GPU and RAM. Poorer machines would process slowly, whereas better ones would give up to real-time performance. Some of the top gaming PCs can run Fusion at the Kinect's fps.

From what I can see, the consumer side would seem to work out best as a straight serial operation. Basically, it would be something like:

1. Pop a frame from the FIFO.
2. Preprocess it (including other processing that is not time-consuming).
3. Pass the frame to Fusion.
4. Loop back to 1 if not complete.

I hope it's a bit clearer now what kind of application it is and what sort of requirements it would have. I am not an experienced professional coder :). I just use common sense. The best structure for this program that I was able to come up with was the one described in previous posts. I don't seem to see a better way of structuring it. I greatly appreciate your input, guys. I look forward to seeing more opinions, suggestions etc.

Thanks, MW
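The "wait until every source has delivered, then build one frame" step could be sketched as below. In the Win32 design this check is what WaitForMultipleObjects() performs when its `bWaitAll` argument is TRUE. The sketch only shows the bookkeeping in portable C++, and `FrameAssembler` and its members are hypothetical names. It assumes a newer packet from the same sensor simply replaces the older one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical assembler: keeps the most recent packet per source and
// reports when every source has delivered since the last emitted frame.
class FrameAssembler {
public:
    explicit FrameAssembler(std::size_t sourceCount)
        : latest_(sourceCount), fresh_(sourceCount, false) {
        assert(sourceCount > 0);
    }

    // Stores a packet; arrival order across sources does not matter.
    // Returns true once all sources have a fresh packet.
    bool deliver(std::size_t source, std::vector<std::uint8_t> packet) {
        assert(source < latest_.size());
        latest_[source] = std::move(packet);
        if (!fresh_[source]) {
            fresh_[source] = true;
            ++freshCount_;
        }
        return freshCount_ == latest_.size();
    }

    // Returns one packet per source and starts collecting the next frame.
    std::vector<std::vector<std::uint8_t>> takeFrame() {
        assert(freshCount_ == latest_.size());
        fresh_.assign(fresh_.size(), false);
        freshCount_ = 0;
        return latest_;  // copy; stored packets are overwritten by later deliveries
    }

private:
    std::vector<std::vector<std::uint8_t>> latest_;
    std::vector<bool> fresh_;
    std::size_t freshCount_ = 0;
};
```

The class is not thread-safe on its own. It is meant to be driven from the single thread that collects the samples, and the frame it returns would then be pushed onto the FIFO for the serial consumer loop listed above.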
-
Member 11703498 wrote:
I can't afford to drop any frames from the Kinect
I don't know how you can say this and also say that you're producing data faster than you can process it. You HAVE to decimate if you're not keeping up. You'll be dropping data even if you don't want to once your queue is full. Best to deal with that some way or another so the results are predictable.
Member 11703498 wrote:
It runs best on the GPU. I am not sure if this task can be parallelized on a standard PC. Fusion runs way slowlier on the CPU. Maybe it would be possible to run two instances of the Fusion, one on GPU and the other on CPU, but I don't know how much sense it would make.
GPUs are powerful because they can run in real time and use parallelism well; make sure you're taking advantage of that.
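One way to make a full queue predictable is to give the FIFO a fixed capacity and have it report a rejected push instead of growing without limit. The sketch below is an assumption-laden illustration in standard C++: `BoundedFifo` and its members are hypothetical names, and what to do on rejection (drop, decimate, or stop and repeat the run) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical fixed-capacity FIFO that never grows past its limit.
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false when the queue is full; the item is not stored
    // and the rejection is counted so it can be reported later.
    bool tryPush(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) {
            ++rejected_;
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    // Returns false when the queue is empty.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Number of pushes refused so far.
    std::size_t rejected() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rejected_;
    }

private:
    mutable std::mutex mutex_;
    std::deque<T> items_;
    std::size_t capacity_;
    std::size_t rejected_ = 0;
};
```

Choosing the capacity from the available RAM and checking `rejected()` at the end of a run tells the caller whether any frame was lost, rather than leaving that to chance.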
-
Albert Holguin wrote:
Member 11703498 wrote:
I can't afford to drop any frames from the Kinect
I don't know how you can say this and also say that you're producing data faster than you can process it. You HAVE to decimate if you're not keeping up. You'll be dropping data even if you don't want to once your queue is full. Best to deal with that some way or another so the results are predictable.
Not necessarily. Dropping frames is not a good idea when they are used by the Kinect Fusion algorithm. This algorithm simply fails if consecutive frames supply data that differs too much from frame to frame. This typically happens when frames are dropped or when the Color stream causes the Kinect's frame rate to drop to approximately 15 fps, i.e. half of 30 fps (it's just a feature, or rather a downside, of this particular sensor). The queue won't fill up: the system is required to have enough RAM available to the program, and the task should complete with a certain number of frames stored in the queue. The application does a one-off job, not continuous heavy lifting. If the task fails for whatever reason, say the number of acquired frames was insufficient, or something along those lines, then the task can be repeated.

I am not that proficient at parallelizing stuff on GPUs :). From what I've seen, the Fusion algorithm utilizes the GPU's resources to the max. A good GPU would actually make the queue and the high RAM requirement redundant. For the time being though, the best approach I can see is to stick with the queue and a large amount of RAM.