Code Project

How to retrieve non fixed-length records from a binary file

    htres
    wrote on last edited by
    #1

    Hello, I'm new to programming and C#, so please bear with my ignorance! I need to extract jpeg images and header data from a binary file. The binary file is formatted with several fixed length fields containing information about the jpeg image, followed by the jpeg itself, followed by more header data, another jpeg, etc... Using a FileStream and BinaryReader I am able to read and store the metadata, because I know the length of the fields, but I am stumped on how to read and store the jpeg bytes since they vary in size. There is a fixed record delimiter between each header data/jpeg record, so I was thinking of using that to break apart the records. Once they are separated and the header fields read, I could just assume the rest is the jpeg and store that. I'm not sure how to go about doing that though. Any suggestions or demo code would be greatly appreciated! Thanks!

      Luc Pattyn
      wrote on last edited by
      #2

      Hi, assuming you are in control of the file format, this is what I suggest: prefix each record with a byte indicating the record type, and terminate the file with one more record type that marks the end (I suggest a zero byte here). So a file would look like this:

      type0 record0 type1 record1 type2 record2 ... typeN recordN 0

      Now each record could correspond to a C# struct, and that struct could contain a Save(stream) method to append the struct as a record to the file stream (don't forget the type byte!), and a Load(stream) method to create and populate a struct by reading from the file stream (starting after the type byte).

      Loading the file stream would then consist of a loop containing:
      - read the byte that tells the record type
      - use a switch to call the right struct's Load method
      - if end-code, close stream

      If you can't follow the above scheme (e.g. because the file format has been fixed and does not include a type byte), then you need to determine the type of the next record by reading and analyzing some bytes, then rewind a bit (using the Seek method or the Position property) and Load a record; repeat until done. Hope this helps. :)
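A minimal sketch of the type-byte scheme described above, assuming you control the format. The record names (HeaderRecord, ImageRecord), their fields, and the 4-byte length prefix in front of the image bytes are all hypothetical and only there for illustration; the sketch also passes a BinaryWriter/BinaryReader instead of the raw stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record types; 0 is reserved as the end-of-file code.
public enum RecordType : byte { End = 0, Header = 1, Image = 2 }

public struct HeaderRecord
{
    public int Id;

    public void Save(BinaryWriter w)
    {
        w.Write((byte)RecordType.Header); // type byte goes first
        w.Write(Id);
    }

    // Called after the type byte has already been consumed.
    public static HeaderRecord Load(BinaryReader r)
    {
        return new HeaderRecord { Id = r.ReadInt32() };
    }
}

public struct ImageRecord
{
    public byte[] Data;

    public void Save(BinaryWriter w)
    {
        w.Write((byte)RecordType.Image);
        w.Write(Data.Length); // assumed length prefix so Load knows how much to read
        w.Write(Data);
    }

    public static ImageRecord Load(BinaryReader r)
    {
        int length = r.ReadInt32();
        return new ImageRecord { Data = r.ReadBytes(length) };
    }
}

public static class RecordFile
{
    // Reads records until the end code (a zero type byte) is reached.
    public static List<object> ReadAll(Stream stream)
    {
        var records = new List<object>();
        var r = new BinaryReader(stream);
        while (true)
        {
            var type = (RecordType)r.ReadByte();
            switch (type)
            {
                case RecordType.End:
                    return records;
                case RecordType.Header:
                    records.Add(HeaderRecord.Load(r));
                    break;
                case RecordType.Image:
                    records.Add(ImageRecord.Load(r));
                    break;
                default:
                    throw new InvalidDataException("Unknown record type " + (byte)type);
            }
        }
    }
}
```

Writing a file is then a series of Save calls followed by a single zero byte; ReadAll mirrors that with the read-type, switch, Load loop.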

      Luc Pattyn [My Articles]

        DavidAtAscent
        wrote on last edited by
        #3

        If you have control over the binary file format, this may help. Make sure that one of the attributes in the fixed length header is the length of the JPEG image data that follows. You could then create a byte array (byte[]) of that length and read the specified number of bytes into it using: BinaryReader.Read(byte[] buffer, int index, int count)
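A small sketch of that idea, assuming (hypothetically) that the length is stored as a 4-byte little-endian integer immediately before the image bytes. BinaryReader.Read returns the number of bytes it actually read, which can be fewer than requested, so the sketch loops until the buffer is full.

```csharp
using System;
using System.IO;

public static class LengthPrefixedReader
{
    // Assumption: a 4-byte length field directly precedes the JPEG data.
    public static byte[] ReadImage(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        if (length < 0)
            throw new InvalidDataException("Negative image length");

        byte[] image = new byte[length];
        int offset = 0;
        while (offset < length)
        {
            // Read may return fewer bytes than asked for.
            int read = reader.Read(image, offset, length - offset);
            if (read == 0)
                throw new EndOfStreamException("File ended inside the image data");
            offset += read;
        }
        return image;
    }
}
```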

          htres
          wrote on last edited by
          #4

          Thanks for the suggestions! Unfortunately, I am receiving this binary file from a third party and its format is out of my control. :( The fields in the records seem to be separated by a zero byte, and the records are separated by a 16 byte string. Also, I noticed that the jpeg data starts with the bytes (in HEX) FF, D8, FF, E0 and ends with FF, D9. Could I possibly use these byte sequences to identify the jpeg? Also, I'm very new to this so all I have figured out how to do so far in code is to read my fixed length fields like so:

          FileStream fs = File.OpenRead(strFileName);
          BinaryReader reader = new BinaryReader(fs);
          //reads the first 36 bytes of trash
          reader.ReadBytes(36);
          //reads and stores the record delimiter string
          //(assumed: the delimiter is the 16 byte string mentioned above)
          byte[] byteSignature = reader.ReadBytes(16);
          string strSignature = Encoding.ASCII.GetString(byteSignature);
          //advances the cursor 1 byte
          reader.ReadBytes(1);
          //stores SKS ID
          string strSKSID = Encoding.ASCII.GetString(reader.ReadBytes(16));
          //advances the cursor 1 byte

          etc... reading down until I get to the image field.

          Luc Pattyn wrote:

          If you can't follow the above scheme (e.g. because the file format has been fixed and does not include a type byte), then you need to determine the type of the next record by reading and analyzing some bytes, then rewind a bit (using the Seek method or the Position property) and Load a record; repeat until done.

          I'm not sure how to actually implement your suggestion of reading and analyzing some bytes, then rewinding. Could you provide some example code? Thanks again!

            Luc Pattyn
            wrote on last edited by
            #5

            Hi, since you don't control the file format, here are the fundamentals you will need, plus some suggestions:

            • use one FileStream for your file
            • use BinaryReader.ReadBytes() to read a number of bytes at the current position (it will
              advance the current position); the problem here is that you must specify the byte count
            • create a number of classes or structs, one for each possible record type.
            • if class/struct RecordType1 is one of the possible record types, you should give it
              two static methods:
              bool Accept(FileStream) would read some bytes and decide whether or not the data fits the
              record type for that class/struct; it should restore the filestream position as if
              nothing happened (use the FileStream.Position property to remember where you are in the file,
              and to return to that position); it should not throw exceptions to the caller.
              RecordType1 Load(FileStream) would read all the bytes needed to load a record of that type,
              knowing it is of that type (since it will have been Accepted beforehand). Load does
              advance the filestream, so it consumes the record and returns the result. It should throw
              exceptions when something fails.
            • details on Accept: you can try to recognize the first few bytes; a JPEG always starts
              with FF D8 and often with FF D8 FF E0; but nothing prevents other (non-JPEG) records
              from also starting with FF D8! So your collection of Accept() methods should be sufficiently
              accurate to discern the record types at hand.
            • details on Load: you need to know the byte count in order to read the right number
              of bytes; scanning for an end marker is difficult: even if a JPEG always ends with FF D9,
              that does not mean the first FF D9 is the end of the JPEG (the same two bytes can
              occur earlier in the data, for example at the end of an embedded thumbnail).
            • it is rather hard to decode JPEG, so I suggest letting GDI+ try to decode the image.
              One way would be to create a memory stream from your byte array, then call
              Image.FromStream(MemoryStream), but I suspect you could directly call
              Image.FromStream(FileStream), avoiding the byte count problem completely.
            • you can create a new BinaryReader in every Accept and every Load method in each
              RecordType class/struct, or reuse a single one everywhere (don't try something
              in between).
            • also provide a class/struct to handle the end-of-file record; it needs an Accept()
              but does not need a Load() method.
            • and now the finale: put all your Accept and Load methods in one loop to decode the
              entire file, as in:

            try {
            FileStream fs=...
            for (
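The listing in the post stops at this point. Separately from it, here is a rough sketch of the Accept/Load pattern described in the list above; it is not Luc's code. It uses Stream rather than FileStream so it also works on a MemoryStream, and the class names (JpegRecord, EndOfFile, RecordLoop) and the fixed recordLength parameter are hypothetical simplifications to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class JpegRecord
{
    // Peeks at the next two bytes, then restores the position as if nothing happened.
    public static bool Accept(Stream s)
    {
        long start = s.Position;
        try
        {
            return s.ReadByte() == 0xFF && s.ReadByte() == 0xD8; // JPEG start marker
        }
        catch (IOException)
        {
            return false; // Accept reports failure instead of throwing
        }
        finally
        {
            s.Position = start;
        }
    }

    // Consumes the record; throws if the stream ends too early.
    public static byte[] Load(Stream s, int length)
    {
        var reader = new BinaryReader(s); // not disposed, so the stream stays open
        byte[] data = reader.ReadBytes(length);
        if (data.Length != length)
            throw new EndOfStreamException("Record is shorter than expected");
        return data;
    }
}

public static class EndOfFile
{
    public static bool Accept(Stream s)
    {
        return s.Position >= s.Length;
    }
}

public static class RecordLoop
{
    // Assumption for the sketch: every record has the same known length.
    public static List<byte[]> DecodeAll(Stream s, int recordLength)
    {
        if (recordLength < 2)
            throw new ArgumentOutOfRangeException("recordLength");

        var images = new List<byte[]>();
        while (!EndOfFile.Accept(s))
        {
            if (JpegRecord.Accept(s))
                images.Add(JpegRecord.Load(s, recordLength));
            else
                throw new InvalidDataException("Unrecognized record at offset " + s.Position);
        }
        return images;
    }
}
```

With more record types, the loop body becomes a chain of Accept checks, each followed by the matching Load.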

              htres
              wrote on last edited by
              #6

              Well, I was finally able to do this in a pretty efficient way. I just read in the entire file, about 256k, to a byte array. Then I could convert it to a string in order to use string.IndexOf to find the record delimiters. I then used those delimiter positions and Array.Copy to copy what I wanted out of the original byte[] to its own byte[]. From there it was easy to get the image because each record has a fixed 223 byte header, so the remainder of the record had to be the embedded image. I just copied what was left of the record after the first 223 bytes to another byte[] and wrote it to disk, named it .jpg, and tada, I had the jpeg image!
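A sketch of the same approach that searches the byte array directly instead of converting it to a string first. It assumes each record is laid out as delimiter, then a fixed-size header, then the image bytes running up to the next delimiter or the end of the file; whether the 223 bytes are counted from before or after the delimiter is not stated in the thread, so headerLength here is measured from the end of the delimiter. RecordSplitter and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSplitter
{
    // Index of the first occurrence of pattern in data at or after start, or -1.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        if (pattern.Length == 0) return -1;
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns the bytes that follow the fixed header in every record.
    public static List<byte[]> ExtractImages(byte[] file, byte[] delimiter, int headerLength)
    {
        var images = new List<byte[]>();
        int pos = IndexOf(file, delimiter, 0);
        while (pos >= 0)
        {
            int recordStart = pos + delimiter.Length;
            int next = IndexOf(file, delimiter, recordStart);
            int recordEnd = next >= 0 ? next : file.Length;
            int imageStart = recordStart + headerLength;
            if (imageStart < recordEnd)
            {
                var image = new byte[recordEnd - imageStart];
                Array.Copy(file, imageStart, image, 0, image.Length);
                images.Add(image);
            }
            pos = next;
        }
        return images;
    }
}
```

In practice the input would come from File.ReadAllBytes and each returned array could be written out with File.WriteAllBytes.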

                Luc Pattyn
                wrote on last edited by
                #7

                Hi, I'm glad you got something working. I would not fully trust the string.IndexOf part, since string operations behave unpredictably on non-string data (such as JPEG images, which can contain any bit pattern and could be misinterpreted as Unicode characters).
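To illustrate the concern: a multi-byte decoding such as UTF-8 can merge or replace bytes, so character positions in the resulting string no longer line up with byte positions in the file. If the string route is kept, one hedged workaround is to decode with a single-byte encoding (ISO-8859-1 maps every byte to exactly one character) and to search with an ordinal comparison so culture rules play no part. The DelimiterSearch class below is a hypothetical helper, not something from the thread.

```csharp
using System;
using System.Text;

public static class DelimiterSearch
{
    // ISO-8859-1 decodes each byte to exactly one char, so char index == byte index.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Returns the byte offset of an ASCII delimiter inside arbitrary binary data, or -1.
    public static int FindDelimiter(byte[] data, string delimiter)
    {
        string text = Latin1.GetString(data);
        return text.IndexOf(delimiter, StringComparison.Ordinal);
    }
}
```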

                htres wrote:

                each record has a fixed 223 byte header

                That's new info, makes things easier I guess.

                htres wrote:

                wrote it to disk, named it .jpg, and tada, I had the jpeg image

                As I mentioned earlier, if you only want the image in memory, I guess you can skip the file and use Image.FromStream(); if you need the file on disk, then writing it out is the way to go. :)

                Luc Pattyn [My Articles]

                  htres
                  wrote on last edited by
                  #8

                  Luc Pattyn wrote:

                  I would not fully trust the string.IndexOf part, since string operations perform unpredictably on non-string data (such as JPEG images, which can contain any bit pattern, that could be misinterpreted as Unicode characters).

                  You are right, I had a lot of trouble with string.IndexOf when I was trying to isolate just the jpeg by searching for small strings of 2-4 chars. But after a lot of testing using it to find the record delimiter, which is the same 16 byte string in every record, it works very reliably. Even though it is entirely possible for this particular 16 byte string to show up within the jpeg encoding, the odds are against it.

                  Luc Pattyn wrote:

                  htres wrote:
                  each record has a fixed 223 byte header

                  That's new info, makes things easier I guess.

                  Yeah it was a lot easier. That 223 byte header contained the fixed length fields that held the information about the image. I probably should have posted an example of the file format...but it would have been ugly since it is mostly binary.

                  Luc Pattyn wrote:

                  htres wrote:
                  wrote it to disk, named it .jpg, and tada, I had the jpeg image

                  As I mentioned earlier if you want the image I guess you can do it without such file using Image.FromStream(); if you need the file, then it is the way to go.

                  I haven't tried the Image.FromStream option yet, though I plan to eventually. Ultimately I'd like to populate a database with the picture and header data as well. But I'll leave that part for a new thread... ;) Thanks for the help!
