Socket message (frame) pattern matching
-
In a socket application, suppose we define a protocol for messages like this:
+----------+----------------------+----------+
| DataSize | Data(Not Fixed size) | Checksum |
+----------+----------------------+----------+
In a socket server I receive an array of bytes and need to find the frames that match my predefined protocol (e.g. the layout above). Is there an efficient and clean algorithm or design pattern for finding (and parsing) matching packets in a byte array? I don't even know which terms or phrases to search for. Thank you so much in advance.
"I hope you live a life you're proud of. If you find that you're not, I hope you have the strength to start all over again."
- I wish I knew who is this quote from -
Hamed Mosavi wrote:
matching packets
I see nothing to match. You said the message starts with a size, so the first 1/2/4/? bytes should be aggregated into a size value (maybe BitConverter.ToInt32 comes in handy); then that number of data bytes is expected; then the next 1/2/4/? bytes should be aggregated into a checksum value. If it matches the locally calculated checksum, the message is acceptable; otherwise it is not. You may apply extra checks, such as upper and lower limits on DataSize.
When multiple systems (and maybe multiple implementations) are going to be used, you should carefully specify the checksum algorithm and the byte order ("endianness") of multi-byte values (probably the size and the checksum).
If you need syncing capabilities (e.g. because some bytes may get lost underway), you should start with a fixed header, sometimes called an eye catcher, akin to the start bit of RS232C. Then your receiver should check that the data starts with a correct header, and ignore anything that does not. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
I only read formatted code with indentation, so please use PRE tags for code snippets.
I'm not participating in frackin' Q&A, so if you want my opinion, ask away in a real forum (or on my profile page).
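To illustrate the size/data/checksum steps described in the reply above, here is a minimal sketch in Python. The thread never fixes the field widths or the checksum algorithm, so everything concrete here is an assumption made for the example: a 2-byte big-endian DataSize, a 1-byte checksum that is the sum of the data bytes modulo 256, an upper limit of 1024 data bytes, and the helper names `build_frame` and `parse_frame`.

```python
import struct

SIZE_LEN = 2       # assumed: DataSize is a 2-byte big-endian unsigned integer
CHECKSUM_LEN = 1   # assumed: Checksum is a single byte
MAX_DATA = 1024    # assumed upper limit used as an extra sanity check


def checksum(data: bytes) -> int:
    """Hypothetical checksum: sum of the data bytes modulo 256."""
    return sum(data) & 0xFF


def build_frame(data: bytes) -> bytes:
    """Serialize data as DataSize | Data | Checksum."""
    return struct.pack(">H", len(data)) + data + bytes([checksum(data)])


def parse_frame(buf: bytes, offset: int = 0):
    """Try to read one frame starting at offset.

    Returns (data, next_offset) on success, or None when the bytes at
    offset are incomplete or do not form a valid frame.
    """
    if len(buf) - offset < SIZE_LEN:
        return None                          # size field not complete
    (size,) = struct.unpack_from(">H", buf, offset)
    if size > MAX_DATA:
        return None                          # implausible size
    end = offset + SIZE_LEN + size           # index of the checksum byte
    if len(buf) < end + CHECKSUM_LEN:
        return None                          # data or checksum not complete
    data = buf[offset + SIZE_LEN:end]
    if buf[end] != checksum(data):
        return None                          # checksum mismatch
    return data, end + CHECKSUM_LEN
```

Calling `parse_frame(build_frame(b"hello"))` gives back the data together with the offset at which the next frame would start. Fixing the byte order in one place (the `">H"` format) is what keeps different implementations compatible.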
Thank you so much for your very fast reply. It's very kind of you.
Luc Pattyn wrote:
If you need syncing capabilities (e.g. because some bytes may get lost underway)
This is exactly what I'm after. If the byte array I receive contains a broken message at the beginning, I need to find the next packet and ignore what comes before it.
Luc Pattyn wrote:
you should start with a fixed header
Is this like a start flag (like the preamble in Ethernet frames)? If it is, I can't use this solution, because the data field may contain that flag byte sequence as well. The message header would have to be long enough to make that unlikely, and in most systems I work with, that much overhead is not acceptable.
Luc Pattyn wrote:
If you need syncing capabilities
This looks like the term I need to search for. I'll take a closer look at syncing mechanisms to see if there's a better way out. Thank you for this help. :)
-
Without a fixed header, you'll have a hard time getting bits/bytes in sync, as nothing in your message is cast in stone; the only thing you have is a checksum. So all you can do is assume the message starts at byte index 0, read its length and data, and check the checksum; when that fails, start again at index 1, and so on, until something happens to match.
With a header (even if it is only a single byte), you only have to investigate potential messages starting with the right byte value. Longer headers make syncing easier at the expense of more overhead (less effective bandwidth). RS232C uses a single bit for syncing, and that bit value can and obviously will appear in almost every byte transmitted, but all that means is that it may take several bytes to get in sync.
So there is no real need for a long header, and there certainly is no need to forbid the accidental appearance of a header look-alike inside a message, as headers are only used to find the start of a message; once you (think you) are holding a message, just process the data and check the checksum. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
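The resynchronization idea in the reply above (only try positions that start with the header byte, and move on by one byte when the checksum fails) can be sketched as follows. This is an illustration, not a definitive implementation: the one-byte header value 0x7E, the 2-byte big-endian size, the sum-modulo-256 checksum, the 1024-byte limit and the function names are all assumptions chosen for the example.

```python
import struct

SYNC_BYTE = b"\x7e"   # assumed one-byte fixed header ("eye catcher")
HEADER_LEN = 3        # sync byte + assumed 2-byte big-endian DataSize
MAX_DATA = 1024       # assumed upper limit on DataSize


def checksum(data: bytes) -> int:
    """Hypothetical checksum: sum of the data bytes modulo 256."""
    return sum(data) & 0xFF


def build_frame(data: bytes) -> bytes:
    """Serialize data as Sync | DataSize | Data | Checksum."""
    return SYNC_BYTE + struct.pack(">H", len(data)) + data + bytes([checksum(data)])


def extract_frames(buf: bytes):
    """Scan buf for valid frames, skipping anything that does not parse.

    Returns (frames, consumed). Bytes from index `consumed` onward may be
    the start of a frame that has not fully arrived and should be kept.
    """
    frames = []
    pos = 0
    while True:
        start = buf.find(SYNC_BYTE, pos)
        if start < 0:
            return frames, len(buf)          # no candidate left, drop the rest
        if len(buf) - start < HEADER_LEN:
            return frames, start             # header itself is incomplete
        (size,) = struct.unpack_from(">H", buf, start + 1)
        if size > MAX_DATA:
            pos = start + 1                  # implausible size: false sync
            continue
        end = start + HEADER_LEN + size      # index of the checksum byte
        if end >= len(buf):
            return frames, start             # possibly a frame still arriving
        data = buf[start + HEADER_LEN:end]
        if buf[end] == checksum(data):
            frames.append(data)
            pos = end + 1                    # continue after this frame
        else:
            pos = start + 1                  # look-alike byte: slide by one
```

A caller would append each received chunk to a buffer, call `extract_frames`, and then discard the first `consumed` bytes. Note that a data field containing 0x7E is harmless here, because the sync byte is only consulted while searching for a start. The size limit matters: a false sync that claims a large size makes the scanner wait for more bytes, and `MAX_DATA` bounds how long that can last.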
Yes, you're right. It's a trade-off between server efficiency and data overhead, and I believe a good system has to balance the two. A syncing bit (or even a byte) in each packet can be a great help in improving server performance. By the way, have you seen any good open source server implementation of message transmission and processing? I would definitely learn a lot from it, and end up with a cleaner and faster server. I have written about four socket applications in the last 5 to 6 years, and this part has always been a pain to implement. Thank you again, Luc Pattyn, for your help. It's really appreciated.
-
You're welcome. One thing that isn't clear to me is why you would not (to a rather high degree) trust incoming messages. If your network uses, say, Ethernet, and your messages are less than 1500 bytes long, then each would fit in a single Ethernet packet; the lower network layers would deal with bad packets, and the app would only get real ones, probably containing exactly one message. Things are entirely different on a serial port such as RS232C, where you may not have packets, and just inserting, removing or power-cycling the peripheral may well result in a couple of spurious bytes. Or maybe you are implementing something like SLIP? :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
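SLIP, mentioned above, is also relevant to the earlier worry that a flag byte might appear inside the data: it escapes the delimiter wherever it occurs in the payload, so the delimiter on the wire always marks a frame boundary. Below is a minimal sketch of that byte-stuffing scheme using the special byte values from RFC 1055. The function names are made up for this example, and the DataSize/Checksum fields from this thread would simply travel inside the payload.

```python
END = 0xC0       # frame delimiter (RFC 1055)
ESC = 0xDB       # escape byte (RFC 1055)
ESC_END = 0xDC   # ESC followed by this stands for a literal END byte
ESC_ESC = 0xDD   # ESC followed by this stands for a literal ESC byte


def slip_encode(payload: bytes) -> bytes:
    """Escape END and ESC inside payload, then terminate with END."""
    out = bytearray()
    for b in payload:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def slip_decode(stream: bytes):
    """Return the complete payloads found in stream.

    Bytes after the last END belong to an unfinished frame and are ignored
    in this sketch.
    """
    frames = []
    current = bytearray()
    escaped = False
    for b in stream:
        if escaped:
            # Unknown escape codes are stored as-is, as in RFC 1055.
            current.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            if current:                      # empty frames are skipped
                frames.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    return frames
```

The overhead is one delimiter byte per frame plus one extra byte for each END or ESC value in the data, so it stays small unless the payload is full of those two values. After lost bytes, the receiver is back in sync at the next END byte; a checksum inside the payload is still needed to reject the damaged frame.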
No, it's not a local area network. The clients are micro-controller applications that rely on cellphone GPRS to connect through the internet to a remote server. They're transmitters that receive data on a serial port and need to transfer it to a remote server. I have not been given more information about the amount of data or what's inside it. All I know is that GPRS, and the cell network in general, is of very low quality in the area where they are used, so I can't take many risks with reliability. Things are not that good even in a LAN: I have seen multiple copies of a packet, data loss and disconnects there too. It was a wireless LAN, though. That first experience was annoying. I still remember that day! I didn't know about SLIP. In some ways it looks similar to my project, except that I'm not working at such a low level. I'm working at the application level, if I'm not mistaken.
-
This Benjamin Button used your quote, according to this page. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
Yes. :-D But I still don't know who said it first. Not that it changes the beauty of the sentence, but I would like to mention his or her name under the quote.
"I hope you live a life you're proud of. If you find that you're not, I hope you have the strength to start all over again."
- I wish I knew who is this quote from