Code Project
  1. Home
  2. General Programming
  3. C / C++ / MFC
  4. One general question on dual buffers design with multi-threading

One general question on dual buffers design with multi-threading

Tags: question, design, sysadmin, performance, help
20 Posts 7 Posters 0 Views 1 Watching
  • SAMZC wrote (#1):

    I intend to capture a huge volume of data from the network, using multi-threading:

    1. Producer: a capture thread that captures data in a loop and puts incoming data into a buffer.
    2. Consumer: a worker thread that gets data from the producer and does further processing.

    The producer shares buffers with the consumer. I want to use two buffers to get high performance, with the two buffers working in ping-pong mode, like this:

    // the two buffers for sharing data
    BUFFER bufferA;
    BUFFER bufferB;

    In one scenario, the producer puts data into bufferA while the consumer reads data from bufferB. The producer writes in the sequence bufferA, bufferB, bufferA, bufferB, ... and the consumer reads in the sequence bufferB, bufferA, bufferB, bufferA, ... in ping-pong mode.

    Of course, synchronization is the biggest problem here, and that's my question too: how do I implement the synchronization, using CRITICAL_SECTION/EVENT or other kernel objects, to get the highest performance? Appreciate any input here. Sam/BR.

    The world is fine.

  • In reply to SAMZC (#1):

      CPallini wrote (#2):

      I don't see any benefit in using two buffers instead of one. As for synchronization, see "Synchronization Objects" at MSDN (I would use a mutex). :)

      If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
      This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
      [My articles]

      In testa che avete, signor di Ceprano?

  • In reply to SAMZC (#1):

        Cedric Moonen wrote (#3):

        I don't really see why you want to use two buffers for that. The two threads will work at different "speeds", so you won't have any control over which buffer is read or written. Why not simply use one buffer here? The producer puts the data into the buffer (entering a critical section) and then signals an event that data is available. The consumer waits for the event to be signaled and then reads the data from the buffer (entering the critical section).
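The single-buffer scheme described above can be sketched in portable C++, using std::mutex and std::condition_variable as stand-ins for the Win32 CRITICAL_SECTION and EVENT objects the thread discusses (the class and member names below are illustrative, not from any post):

```cpp
// Sketch of the one-buffer producer/consumer scheme: the producer copies data
// in under the lock and signals "data available"; the consumer waits for the
// signal, then takes the data under the same lock.
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class SingleBuffer {
public:
    // Producer side: wait until the consumer has drained the buffer,
    // copy the data in, then signal the consumer (the EVENT equivalent).
    void Write(const std::vector<uint8_t>& data) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return buf_.empty(); });
        buf_ = data;
        notEmpty_.notify_one();
    }

    // Consumer side: wait until data is available, take it, and tell the
    // producer the buffer is free again.
    std::vector<uint8_t> Read() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return !buf_.empty(); });
        std::vector<uint8_t> out = std::move(buf_);
        buf_.clear();
        notFull_.notify_one();
        return out;
    }

private:
    std::mutex m_;                        // plays the role of the CRITICAL_SECTION
    std::condition_variable notEmpty_;    // plays the role of the "data ready" EVENT
    std::condition_variable notFull_;     // lets the producer wait for a drained buffer
    std::vector<uint8_t> buf_;
};
```

Note the consumer copies (moves) the data out and releases the lock quickly, which is exactly the point made later in the thread about releasing the shared buffer before doing the heavy processing.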

        Cédric Moonen Software developer
        Charting control [v3.0] OpenGL game tutorial in C++

  • In reply to CPallini (#2):

          Rajesh R Subramanian wrote (#4):

          CPallini wrote:

          I would use a mutex

          Of all the things why a mutex? It's heavier than a critical section, and we're not even operating across process boundaries here!

          There are some really weird people on this planet - MIM.

  • In reply to SAMZC (#1):

            Rajesh R Subramanian wrote (#5):

            Samuel Zhao wrote:

            How to implement the synchronization by using CRITICAL_SECTION/EVENT or other kernal objects to get a highest performance?

            You have a bigger problem than choosing between the available synchronisation mechanisms. As I see it, you're using a synchronous socket and are trying to use threads to manipulate the data that comes in. :omg: Please read up on sockets first, and try to understand what an asynchronous socket is and how it works. It may be of great help to you. Here's something to keep things in perspective: http://www.flounder.com/kb192570.htm[^] The link uses MFC (shut up, Carlo), but it throws some light on the subject. Also, here's some very good stuff: http://beej.us/guide/bgnet/[^] (thanks to that bloke named Moak).

  • In reply to SAMZC (#1):

              federico strati wrote (#6):

              You're speaking of a technique called flip-flop double buffering: while one thread writes data to one buffer, the other thread reads from the second buffer; then you swap buffers and start again. This is in fact a circular buffer with only two entries. I attach two classes here, one for the double buffer and the other for an n-buffer circular ring buffer; either of them may be used to solve the communication problem between the two threads. Note that the circular buffer works with pointers to buffers that you allocate and free outside the buffer itself, while the double buffer copies data from outside buffers into buffers preallocated inside it. You may also wish to read the article "Lock-Free Single-Producer - Single Consumer Circular Queue" for the circular ring buffer.

              --- double buffer include ---

              #ifndef DOUBLE_BUFFER_H
              #define DOUBLE_BUFFER_H

              #include <afx.h>
              #include <afxwin.h>

              class CDoubleBuffer
              {
              public:
                  CDoubleBuffer(unsigned int unAlloc = 0x000FFFFF);
                  ~CDoubleBuffer(void);

                  void Write(void* pBuf, unsigned int unBytesTo);
                  void Read(void* pBuf, unsigned int unBytesFrom);

              private:
                  void** pAlloc;          // pAlloc[0] and pAlloc[1]: the two data buffers
                  unsigned int unRead;
                  unsigned int unWrite;
                  unsigned int unSize;
              };

              #endif // ! defined (DOUBLE_BUFFER_H)

              --- double buffer include end --- --- double buffer body ---

              #include "stdafx.h"
              #include "DoubleBuffer.h"
              #include <new>    // std::bad_alloc

              CDoubleBuffer::CDoubleBuffer(unsigned int unAlloc /* = 0x000FFFFF */)
              {
                  // Array of two buffer pointers, then the two data buffers themselves.
                  pAlloc = (void**) ::HeapAlloc(::GetProcessHeap(), 0, 2 * sizeof(void*));
                  if (!pAlloc) throw std::bad_alloc();
                  pAlloc[0] = ::HeapAlloc(::GetProcessHeap(), 0, unAlloc * sizeof(BYTE));
                  if (!pAlloc[0]) throw std::bad_alloc();
                  pAlloc[1] = ::HeapAlloc(::GetProcessHeap(), 0, unAlloc * sizeof(BYTE));
                  if (!pAlloc[1]) throw std::bad_alloc();
                  ::FillMemory(pAlloc[0], unAlloc, 0);
                  ::FillMemory(pAlloc[1], unAlloc, 0);

                  unRead = 0;
                  unWrite = 0;
                  unSize = unAlloc;
              }

              CDoubleBuffer::~CDoubleBuffer(void)
              {
                  ::HeapFree(::GetProcessHeap(), 0, pAlloc[0]);
                  ::HeapFree(::GetProcessHeap(), 0, pAlloc[1]);
                  ::HeapFree(::GetProcessHeap(), 0, pAlloc);
              }

              void CDoubleBuffer::Write(void* pBuf, unsigned int unBytesTo)
              {
                  unsigned int unTryWrite = (unWrite++) % 2;
                  // Spin until the reader has moved off this slot.
                  while (unRead == unTryWrite) { ::Sleep(10); }
                  ::MoveMemory(pAlloc[unTryWrite], pBuf, __min(unBytesTo, unSize));

              • In reply to Rajesh R Subramanian (#4):

                CPallini wrote (#7):

                Because, for instance

                Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization

                doesn't look to me like a good counterweight to the scary

                Starting with Windows Server 2003 with Service Pack 1 (SP1), threads waiting on a critical section do not acquire the critical section on a first-come, first-serve basis. This change increases performance significantly for most code. However, some applications depend on first-in, first-out (FIFO) ordering and may perform poorly or not at all on current versions of Windows

                ("Critical Section Objects" at MSDN). :)


                • In reply to Cedric Moonen (#3):

                  SAMZC wrote (#8):

                  Thanks. I am using WinPcap to capture raw data from the network, not Winsock. Usually I used one buffer with the WinPcap driver to capture data from an Ethernet switch, but the performance was not satisfactory. With one buffer, the capture thread puts data into the buffer after it arrives and signals the consumer to read it from the buffer. I suspect the wait overhead decreases the performance, so I intend to introduce another buffer to increase it: the capture thread and the processing thread can then work in parallel to some degree.

                  With one buffer, the capture thread: enters the critical section, puts data into the buffer, signals the event to tell the processing thread, and leaves the critical section. The processing thread: waits for the event, enters the critical section, reads the data into its own buffer to release the shared buffer, and leaves the critical section.

                  I am thinking that if I use two buffers (bufferA + bufferB), the capture thread and the processing thread may work in parallel. In the worst case one of them must wait for one of the buffers to be released, but in the normal busy case each thread can access one of the buffers simultaneously. Like this: the capture thread uses the buffers in the sequence A...B...A...B...; when data comes in, it only needs to check the target buffer's status, and if it is available, put the data into it and signal the processing thread. The processing thread uses the buffers in the sequence B...A...B...A...; when it gets the signal indicating incoming data, it reads the target buffer, then releases the signal to mark the buffer empty and tell the capture thread it is available again.

                  OK, that's just my thought on this question, no practice till now. I want to get some experienced advice and confirm the solution is OK before I start. :^) Anyway, thanks for any of your help.
                  Sam/BR
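The two-buffer scheme described above can be sketched in portable C++, with std::mutex/std::condition_variable standing in for the CRITICAL_SECTION/EVENT pair (the class and member names are hypothetical). The key idea is that each buffer carries its own lock, signal, and full/empty flag, so in the busy case the producer fills one slot while the consumer drains the other:

```cpp
// Flip-flop (ping-pong) double buffer: the producer alternates slots
// 0, 1, 0, 1, ... and the consumer alternates the same way, so ordering
// is preserved and the two threads only contend when they land on the
// same slot.
#include <array>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

class PingPongBuffer {
public:
    // Producer: block only if the target slot has not been drained yet.
    void Write(const std::vector<uint8_t>& data) {
        Slot& s = slots_[writeIdx_];
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&s] { return !s.full; });
        s.data = data;
        s.full = true;
        s.cv.notify_one();      // "data ready" signal for this slot
        writeIdx_ ^= 1;         // ping-pong to the other slot
    }

    // Consumer: block only if the target slot has not been filled yet.
    std::vector<uint8_t> Read() {
        Slot& s = slots_[readIdx_];
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&s] { return s.full; });
        std::vector<uint8_t> out = std::move(s.data);
        s.full = false;
        s.cv.notify_one();      // "slot free again" signal for the producer
        readIdx_ ^= 1;
        return out;
    }

private:
    struct Slot {
        std::mutex m;
        std::condition_variable cv;
        std::vector<uint8_t> data;
        bool full = false;
    };
    std::array<Slot, 2> slots_;
    unsigned writeIdx_ = 0;     // only touched by the producer thread
    unsigned readIdx_ = 0;      // only touched by the consumer thread
};
```

Because writeIdx_ is private to the producer and readIdx_ to the consumer, only the per-slot state needs locking, which is exactly the "each thread can access one of the buffers simultaneously" property described in the post.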

                  • In reply to federico strati (#6):

                    SAMZC wrote (#9):

                    Ohh, great - thanks, Federico. It's really what I wanted to say. I am not sure whether a solution with double buffers working in a circular way is good in my case:

                    >> You're speaking of a technique called flip-flop double buffering: while one thread writes data to one buffer, the other thread reads from the second buffer, then you swap buffers and start again. This is in fact a circular buffer with only two entries.

                    Thanks, Federico. I am studying your posted code.

                    • In reply to Rajesh R Subramanian (#5):

                      SAMZC wrote (#10):

                      Hi Rajesh, thanks for your reply. I am using WinPcap to capture raw data from the network, not Winsock. Anyway, thanks for your input.

                      • In reply to CPallini (#7):

                        Rajesh R Subramanian wrote (#11):

                        Man! You're in a mood to argue. :)

                        CPallini wrote:

                        slightly faster

                        Only "slightly" faster - but put it in the context of execution, say, thousands of times each second. That would make a difference, especially in tight situations.

                        CPallini wrote:

                        However, some applications depend on first-in, first-out (FIFO) ordering and may perform poorly or not at all on current versions of Windows

                        But then, we're talking about reading data from sockets here - they aren't dependent on FIFO ordering. In fact, an application that depends on FIFO-ordered execution shouldn't worry too much about performance anyway, because there's always the chance of a 'pig' thread in between which would be slow, rendering the whole process slow. :)

                        • In reply to federico strati (#6):
                          SAMZCN wrote (#12):

                          I read the CDoubleBuffer class you wrote. It's a very good implementation of double-buffer usage. I have some questions on the buffer class; would you please comment? Thanks in advance.

                          1. Is CDoubleBuffer thread-safe? In the Read/Write functions, each accesses the target buffer. In the multi-threaded case, the reader thread will call Read() and the writer thread will call Write() simultaneously, and both functions access the unRead/unWrite variables. Is it necessary to declare those variables with the volatile keyword?

                          2. About CCircBuffer, I have the same question as for CDoubleBuffer. Besides, what is the purpose of the following two functions?

                          // Increment the read pointer
                          UINT32 IncTheReadPtr(void);
                          // Increment the write pointer
                          UINT32 IncTheWritePtr(void);

                          From the implementation of the two functions, I see they do nothing except increment the pointer, and the pointer increment is already done in the GetTheData()/SetTheData() functions. Thanks for sharing - it's very useful for me in understanding the problem in my case further, and it gives me confidence to start my job on it.
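On the volatile question raised above: volatile alone is not sufficient for thread safety in C++ - it provides neither atomicity nor inter-thread ordering guarantees. For a single-producer/single-consumer design like the lock-free circular queue the thread mentions, the modern tool is std::atomic with acquire/release ordering. A minimal illustrative sketch (the names are mine, not taken from federico's classes):

```cpp
// Minimal lock-free single-producer/single-consumer ring of pointers.
// The shared read/write indices are std::atomic: the release store in
// Push publishes the slot contents, and the acquire load in Pop makes
// them visible - ordering that `volatile` would not provide.
#include <array>
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscRing {
public:
    // Producer only. Returns false if the ring is full.
    bool Push(T* p) {
        std::size_t w = write_.load(std::memory_order_relaxed);
        std::size_t next = (w + 1) % N;
        if (next == read_.load(std::memory_order_acquire))
            return false;                                   // full
        slots_[w] = p;
        write_.store(next, std::memory_order_release);      // publish the slot
        return true;
    }

    // Consumer only. Returns nullptr if the ring is empty.
    T* Pop() {
        std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return nullptr;                                 // empty
        T* p = slots_[r];
        read_.store((r + 1) % N, std::memory_order_release); // free the slot
        return p;
    }

private:
    std::array<T*, N> slots_{};
    std::atomic<std::size_t> read_{0};
    std::atomic<std::size_t> write_{0};
};
```

This only stays correct with exactly one producer thread and one consumer thread, which matches the capture-thread/worker-thread setup described in the question.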


                          • R Rajesh R Subramanian

                            Man! You're in a mood to argue. :)

                            CPallini wrote:

                            slightly faste

                            Only "slightly" faster, but put it in the context of execution - say thousands of times each seconds. That would make difference, especially in tight situations.

                            CPallini wrote:

                            However, some applications depend on first-in, first-out (FIFO) ordering and may perform poorly or not at all on current versions of Windows

But we're talking about reading data from sockets here; they aren't dependent on FIFO ordering. In fact, an application that does depend on FIFO-ordered execution shouldn't worry too much about performance anyway, because there's always the chance of a 'pig' thread in between which would be slow, thereby rendering the whole process slow. :)

                            There are some really weird people on this planet - MIM.

                            S Offline
                            S Offline
                            SAMZC
                            wrote on last edited by
                            #13

Normally I prefer a CRITICAL_SECTION for synchronization: it's fast and easy to use. Exactly as Rajesh said, if there is no FIFO-ordering requirement, a critical section is a good choice within a single process.
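A portable sketch of that locked handoff, with std::mutex and std::condition_variable standing in for CRITICAL_SECTION and EVENT (the class and member names here are illustrative, not from any post in this thread):

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

// Minimal locked handoff buffer: the capture thread locks, copies a
// packet in, and signals; the worker blocks until data is available.
// std::mutex / std::condition_variable stand in for the Win32
// CRITICAL_SECTION / EVENT pair discussed in this thread.
class LockedBuffer {
public:
    void put(const std::vector<unsigned char>& packet) {
        std::lock_guard<std::mutex> lock(m_);      // EnterCriticalSection
        data_ = packet;
        full_ = true;
        cv_.notify_one();                          // SetEvent
    }                                              // LeaveCriticalSection

    std::vector<unsigned char> take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });  // WaitForSingleObject
        full_ = false;
        return data_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<unsigned char> data_;
    bool full_ = false;
};
```

The lock_guard/unique_lock pairs play the role of Enter/LeaveCriticalSection, and notify_one/wait the role of SetEvent/WaitForSingleObject; the predicate passed to wait() guards against spurious wakeups.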

                            The world is fine.

                            1 Reply Last reply
                            0
                            • F federico strati

You're speaking of a technique called flip-flop (ping-pong) double buffering: while one thread writes data to one buffer, the other thread reads from the second buffer; then you swap buffers and start again. This is in fact a circular buffer with only two entries. I attach two classes here, one for the double buffer and one for the n-entry circular ring buffer; either of them may be used to solve the communication problem between your two threads. Note that the circular buffer works with pointers to buffers that you allocate and free outside the buffer itself, while the double buffer copies data from outside buffers into buffers preallocated inside it. You may also wish to read the article Lock-Free Single-Producer - Single Consumer Circular Queue for the circular ring buffer. --- double buffer include ---

                              #ifndef DOUBLE_BUFFER_H
                              #define DOUBLE_BUFFER_H

                              #include <afx.h>
                              #include <afxwin.h>

class CDoubleBuffer
{
public:
    CDoubleBuffer( unsigned int unAlloc = 0x000FFFFF );
    ~CDoubleBuffer(void);

    void Write(void* pBuf, unsigned int unBytesTo);
    void Read (void* pBuf, unsigned int unBytesFrom);

private:
    void** pAlloc;
    unsigned int unRead;
    unsigned int unWrite;
    unsigned int unSize;
};

                              #endif // ! defined (DOUBLE_BUFFER_H)

                              --- double buffer include end --- --- double buffer body ---

                              #include "stdafx.h"
                              #include "DoubleBuffer.h"

CDoubleBuffer::CDoubleBuffer( unsigned int unAlloc /* = 0x000FFFFF */ )
{
    pAlloc = (void **) ::HeapAlloc(::GetProcessHeap(), 0, 2 * sizeof(void*));
    if (!pAlloc) throw;
    pAlloc[0] = (void *) ::HeapAlloc(::GetProcessHeap(), 0, unAlloc * sizeof(BYTE));
    if (!pAlloc[0]) throw;
    pAlloc[1] = (void *) ::HeapAlloc(::GetProcessHeap(), 0, unAlloc * sizeof(BYTE));
    if (!pAlloc[1]) throw;
    ::FillMemory((void*) pAlloc[0], unAlloc, 0);
    ::FillMemory((void*) pAlloc[1], unAlloc, 0);

    unRead = 0;
    unWrite = 0;
    unSize = unAlloc;
}

CDoubleBuffer::~CDoubleBuffer(void)
{
    ::HeapFree(::GetProcessHeap(), 0, pAlloc[0]);
    ::HeapFree(::GetProcessHeap(), 0, pAlloc[1]);
    ::HeapFree(::GetProcessHeap(), 0, pAlloc);
}

void CDoubleBuffer::Write(void* pBuf, unsigned int unBytesTo)
{
    unsigned int unTryWrite = (unWrite++)%2;
    while( unRead == unTryWrite ) { ::Sleep(10); }
    ::MoveMemory( pAlloc[unTryWrite], pBuf, __min(unBytesTo,unSize) );
}

void CDoubleBuffer::Read(void* pBuf, unsigned int unBytesFrom)
{
    while( unRead == unWrite ) { ::Sleep(10); }
    unsigned int unTryRead = (unRead++)%2;
    ::MoveMemory(pBuf, pAlloc[unTryRead], __min(unBytesFrom,unSize));
}

--- double buffer body end ---

                              S Offline
                              S Offline
                              SAMZC
                              wrote on last edited by
                              #14

I made two changes to the code so that each index update is a single plain assignment, and it seems the following members should be declared volatile:

volatile unsigned int unRead;
volatile unsigned int unWrite;

void CDoubleBuffer::Write(void* pBuf, unsigned int unBytesTo)
{
    //unsigned int unTryWrite = (unWrite++)%2;
    unsigned int unTryWrite = (unWrite+1)%2;
    while( unRead == unTryWrite ) { ::Sleep(10); }
    ::MoveMemory( pAlloc[unTryWrite], pBuf, __min(unBytesTo,unSize) );
    unWrite = unTryWrite;
}

void CDoubleBuffer::Read(void* pBuf, unsigned int unBytesFrom)
{
    while( unRead == unWrite ) { ::Sleep(10); }
    //unsigned int unTryRead = (unRead++)%2;
    unsigned int unTryRead = (unRead+1)%2;
    ::MoveMemory(pBuf, pAlloc[unTryRead], __min(unBytesFrom,unSize));
    unRead = unTryRead;
}

If there is any error, please let me know. Thanks. Good job.

                              The world is fine.

                              F 1 Reply Last reply
                              0
                              • S SAMZCN


                                F Offline
                                F Offline
                                federico strati
                                wrote on last edited by
                                #15

Answers: 1. It is necessary to declare the variables volatile on some systems to prevent the compiler from using cached values; you'll be fine declaring all the integer indexes (unRead/unWrite, etc.) as volatile. 2. The classes are thread-safe only for the situation where you have a single producer (writer) and a single consumer (reader) in different threads. They are NOT safe for multiple consumers / multiple producers. 3. The two functions IncTheReadPtr(void) and IncTheWritePtr(void) are there only in case you want to skip some entries in the circular buffer when reading or writing, for some particular reason of your own. Hope that helps. Cheers, Federico

                                1 Reply Last reply
                                0
                                • S SAMZC


                                  F Offline
                                  F Offline
                                  federico strati
                                  wrote on last edited by
                                  #16

Note: 1. It is necessary to declare the variables volatile on some systems to prevent the compiler from using cached values; you'll be fine declaring all the integer indexes (unRead/unWrite, etc.) as volatile. 2. The classes are thread-safe only for the situation where you have a single producer (writer) and a single consumer (reader) in different threads. They are NOT safe for multiple consumers / multiple producers. 3. The modifications you've made are good, even if I don't understand why you changed the way the pointers are incremented :) It is not important to increment the pointers atomically, as there will only ever be two threads: one writer and one reader. If you did want atomicity you would use the InterlockedIncrement / InterlockedDecrement functions, but that would be wasted effort here. Hope that helps. Cheers, Federico
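One caveat worth adding to point 1: in modern C++, volatile only stops the compiler from caching the index in a register; it does not order the buffer writes relative to the index update across threads. std::atomic with release/acquire semantics does both. A portable sketch of the same two-slot ping-pong indexes (the class and member names are illustrative, with non-blocking try_ variants in place of the Sleep loops):

```cpp
#include <atomic>

// Single-producer / single-consumer ping-pong over two slots.
// The release store in try_write publishes the filled slot; the
// acquire load in try_read makes its contents visible to the reader.
class AtomicPingPong {
public:
    bool try_write(int value) {
        unsigned cur  = write_.load(std::memory_order_relaxed);
        unsigned next = (cur + 1) % 2;
        if (read_.load(std::memory_order_acquire) == next)
            return false;                 // reader still owns the other slot
        slot_[cur] = value;               // fill the current write slot
        write_.store(next, std::memory_order_release);  // publish it
        return true;
    }

    bool try_read(int& out) {
        unsigned cur = read_.load(std::memory_order_relaxed);
        if (cur == write_.load(std::memory_order_acquire))
            return false;                 // nothing published yet
        out = slot_[cur];                 // consume the current read slot
        read_.store((cur + 1) % 2, std::memory_order_release);
        return true;
    }

private:
    int slot_[2] = {0, 0};
    std::atomic<unsigned> write_{0};
    std::atomic<unsigned> read_{0};
};
```

With two slots and the "one empty slot" convention used above, the queue holds at most one item at a time, exactly like the double buffer in this thread.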

                                  S 1 Reply Last reply
                                  0
                                  • F federico strati


                                    S Offline
                                    S Offline
                                    SAMZCN
                                    wrote on last edited by
                                    #17

Hello Federico, thanks for your great help. It has cleared up the problems I met recently. Many thanks for your kind help. :-D Sam/Br.


                                    F 1 Reply Last reply
                                    0
                                    • S SAMZC

Thanks. I am using WinPcap to capture raw data from the network, not Winsock. Usually I have used a single buffer with the WinPcap driver to capture data from an Ethernet switch, but the performance is not satisfying. With one buffer, the capture thread puts data into the buffer when it arrives and signals the consumer to read it from the buffer. I suspect the time spent waiting is what degrades performance, so I intend to introduce another buffer so that the capture thread and the processing thread can work in parallel.

With one buffer, the capture thread does:
enter the critical section
put data into buffer[]
signal the event to tell the processing thread
leave the critical section

And the processing thread does:
wait for the event to be signaled
enter the critical section
copy the data into its own buffer to release the shared buffer
leave the critical section

I am thinking that with two buffers (bufferA + bufferB), the capture thread and the processing thread may work in parallel. In the worst case one of them must wait for one of the buffers to be released, but in the normal busy case each thread can access one of the buffers simultaneously. Like this: the capture thread fills the buffers in the sequence A...B...A...B...A...B... When data arrives, it only needs to check the status of the target buffer; if it is available, it puts the data in and signals the processing thread. From the processing thread's side, it reads the buffers in the sequence B...A...B...A...B...A... When it gets a signal indicating incoming data, it reads the target buffer, then resets the signal to mark the buffer empty and tell the capture thread the buffer is available again.

OK, that's just my thinking on this question; I have no practice with it yet. I want to get some experienced advice and confirm the solution is OK before I start. :^) Anyway, thanks for any of your help here. Sam/BR

                                      The world is fine.

                                      R Offline
                                      R Offline
                                      Rick York
                                      wrote on last edited by
                                      #18

I did something almost exactly like this in a different context and it worked fine. In my case the buffers were rather large and took a while to fill, so it was easy to manage them, but your case might be different. You may need more than one spare buffer if the processing thread is not always able to keep up and you don't want the capture thread to wait. So you may want to implement your code to support N buffers and then determine the optimum value of N later, during testing. It may be 2, it could be 200, or it could vary widely depending on many factors; I don't know nearly enough about what you are doing to predict. Good luck. BTW, I read further replies in this thread and I want to clarify one thing: my implementation has multiple producers (a maximum of about 10) and one consumer, and each producer has a double buffer. I could have had one consumer for each producer, but the buffers are very big and fill so slowly that this was unnecessary. My application is not a web server, but its scaling requirements are very well known to us.
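Rick's N-buffer idea can be sketched as a bounded single-producer/single-consumer ring of N slots, where N is exactly the tuning knob he describes (the names here are illustrative, and blocking is done with condition variables rather than the Sleep polling used elsewhere in this thread):

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Bounded N-slot queue: the producer blocks only when all N slots are
// full, so a temporarily slow consumer does not stall capture. N is
// the value to be determined during testing, as suggested above.
class SlotRing {
public:
    explicit SlotRing(std::size_t n) : slots_(n) {}

    void push(std::vector<unsigned char> pkt) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [this] { return count_ < slots_.size(); });
        slots_[wr_] = std::move(pkt);
        wr_ = (wr_ + 1) % slots_.size();
        ++count_;
        not_empty_.notify_one();
    }

    std::vector<unsigned char> pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [this] { return count_ > 0; });
        std::vector<unsigned char> pkt = std::move(slots_[rd_]);
        rd_ = (rd_ + 1) % slots_.size();
        --count_;
        not_full_.notify_one();
        return pkt;
    }

private:
    std::vector<std::vector<unsigned char>> slots_;
    std::size_t rd_ = 0, wr_ = 0, count_ = 0;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
};
```

With n == 2 this degenerates to the double buffer discussed above; a larger n gives the capture thread slack when the consumer falls behind.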

                                      S 1 Reply Last reply
                                      0
                                      • R Rick York


                                        S Offline
                                        S Offline
                                        SAMZC
                                        wrote on last edited by
                                        #19

Hi Rick, thanks for your input; it's very helpful. You are right: after thinking further about my 1-1 producer/consumer context, I agree that I may need more buffer slots. If the volume of incoming data is huge, the processing thread is not always able to keep up, so holding the data in more buffer slots is a good way to avoid data loss in this situation. Eventually the consumer thread will dispatch the data to different task threads for logical processing, so my context is really a variant of the 1-1 P/C model: the final consumers are the task threads responsible for logical analysis and processing, and the producer thread feeds data to a front-line consumer, a dispatching thread that passes the incoming data on to those further consumers. And I want to use one double buffer between the producer and each consumer, at both the front and the back end. Thanks again. Sam/Br.


                                        The world is fine.

                                        1 Reply Last reply
                                        0
                                        • S SAMZCN


                                          F Offline
                                          F Offline
                                          federico strati
                                          wrote on last edited by
                                          #20

Here you find the revised version of the double buffer; there were errors in my code, sorry! -----------

void CDoubleBuffer::Write(void* pBuf, unsigned int unBytesTo)
{
    unsigned int unTryWrite = (unWrite+1)%2;
    while( unRead == unTryWrite ) { ::Sleep(10); }
    // use current unWrite !
    ::CopyMemory( pAlloc[unWrite], pBuf, __min(unBytesTo,unSize) );
    unWrite = unTryWrite;
}

void CDoubleBuffer::Read(void* pBuf, unsigned int unBytesFrom)
{
    while( unRead == unWrite ) { ::Sleep(10); }
    // use current unRead !
    ::CopyMemory(pBuf, pAlloc[unRead], __min(unBytesFrom,unSize));
    unRead = (unRead+1)%2;
}

                                          -----------
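For anyone who wants to sanity-check the corrected index logic off Windows, here is a minimal portable re-expression (std::memcpy in place of CopyMemory; the class name and fixed slot size are illustrative). The blocking Sleep loops are replaced by full()/empty() predicates that report the same conditions the original loops spin on:

```cpp
#include <cstring>

// Portable re-expression of the corrected Write/Read index logic.
// The slot actually written is the current unWrite and the slot read
// is the current unRead, exactly as the comments above insist.
class PingPongPair {
public:
    bool full() const  { return rd_ == (wr_ + 1) % 2; }  // writer would spin
    bool empty() const { return rd_ == wr_; }            // reader would spin

    void write(const void* src, unsigned n) {
        std::memcpy(buf_[wr_], src, n < kSize ? n : kSize);  // current unWrite
        wr_ = (wr_ + 1) % 2;
    }

    void read(void* dst, unsigned n) {
        std::memcpy(dst, buf_[rd_], n < kSize ? n : kSize);  // current unRead
        rd_ = (rd_ + 1) % 2;
    }

private:
    static const unsigned kSize = 16;
    unsigned char buf_[2][kSize] = {};
    unsigned rd_ = 0, wr_ = 0;
};
```

A single write makes the pair full and a single read makes it empty again, which is the one-item capacity the ping-pong scheme gives by design.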

                                          1 Reply Last reply
                                          0