Smallest and fastest way to store numeric data in a file

    Reanalyse
    wrote on last edited by
    #1

    I am working with Digital Elevation Models, and some of them are very large and very slow to access. The files are currently in text format, with 10,000 rows and 8,000 columns. Since every value in the file is a non-negative number (ranging from 0 at sea level to 4000 at the highest peak), there must be a much better format for both file size and access speed. What would you suggest as the best file format for this: two bytes for each number, singles, doubles, or something else? The key thing is fast access to the array of data in the file. Thanks for any suggestions.

      Luc Pattyn
      wrote on last edited by
      #2

      Hi, a binary file will provide much faster access than anything text-oriented. Use BinaryWriter/BinaryReader for this. Warnings with binary files:
      - you are responsible for consistent file contents; from the outside, the file looks like just a collection of bytes, and there is no way to recognize its structure;
      - portability is limited to systems that use the exact same data representation; e.g. x86 stores multibyte values in "little-endian" order (least significant byte first), while other systems may use "big-endian" and hence not interpret the same file correctly. :)
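A minimal sketch of the BinaryWriter/BinaryReader approach, for illustration only. The class name `DemBinaryFile` and the small header (row and column counts) are assumptions, not something the thread specifies; the header is one way to address the "no way to recognize its structure" warning. .NET documents BinaryWriter as writing multibyte values in little-endian order, so a file written this way has a fixed byte order as long as it is also read back with BinaryReader.

```csharp
using System;
using System.IO;

public static class DemBinaryFile
{
    // Writes the grid as 16-bit unsigned values in row-major order,
    // preceded by a hypothetical header holding the row and column counts.
    public static void Write(string path, ushort[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            writer.Write(rows);
            writer.Write(cols);
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    writer.Write(grid[r, c]);
        }
    }

    // Reads the header first, then the values in the same order they were written.
    public static ushort[,] Read(string path)
    {
        using (var reader = new BinaryReader(File.Open(path, FileMode.Open)))
        {
            int rows = reader.ReadInt32();
            int cols = reader.ReadInt32();
            var grid = new ushort[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    grid[r, c] = reader.ReadUInt16();
            return grid;
        }
    }
}
```

Reading value by value like this is the simplest form; it is not necessarily the fastest one for a grid of this size.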

      Luc Pattyn [Forum Guidelines] [My Articles]


      I use ListBoxes for line-oriented text, and PictureBoxes for pictures, not drawings.


      modified on Friday, June 10, 2011 12:27 PM

        PIEBALDconsult
        wrote on last edited by
        #3

        Binary, as Luc said. And you should probably use short (Int16) values. Also, can you read and write a group of them at a time?
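Reading "a group of them at a time" could look roughly like the sketch below, which fetches one whole row with a bulk read instead of one value per call. The name `DemRowReader` and the file layout (headerless, row-major, little-endian Int16) are assumptions made for the example.

```csharp
using System;
using System.IO;

public static class DemRowReader
{
    // Reads one full row of Int16 values in bulk.
    // Assumes a headerless stream of little-endian Int16 values in row-major order.
    public static short[] ReadRow(Stream stream, int row, int cols)
    {
        var bytes = new byte[cols * sizeof(short)];
        stream.Seek((long)row * bytes.Length, SeekOrigin.Begin);

        // Stream.Read may return fewer bytes than requested, so loop until the row is complete.
        int read = 0;
        while (read < bytes.Length)
        {
            int n = stream.Read(bytes, read, bytes.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        var values = new short[cols];
        // BlockCopy copies raw bytes, so this step is only correct on a little-endian machine.
        Buffer.BlockCopy(bytes, 0, values, 0, bytes.Length);
        return values;
    }
}
```

With 8,000 columns, each call moves 16,000 bytes in one go rather than making 8,000 separate two-byte reads.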

          Guffa
          wrote on last edited by
          #4

          For the simplest code you could use two bytes for each value:

              value = data[0] + data[1] * 256;

          You only need 12 bits to store each value (0 to 4095), so for the smallest file size you could pack two values in three bytes:

          Bit usage:

              11111111 11112222 22222222

          pack:

              data[0] = (byte)value1;
              data[1] = (byte)(((value1 >> 8) << 4) + (value2 & 15));
              data[2] = (byte)(value2 >> 4);

          unpack:

              value1 = data[0] + ((data[1] >> 4) << 8);
              value2 = (data[1] & 15) + (data[2] << 4);
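The 12-bit packing above can be wrapped into a small self-contained helper, sketched here. The class name `Pack12` and the range check are additions for illustration; the bit layout follows the post's expressions, with the low byte masked explicitly so the cast also behaves in a checked context.

```csharp
using System;

public static class Pack12
{
    // Packs two values in the range 0..4095 into three bytes.
    public static byte[] Pack(int value1, int value2)
    {
        if (value1 < 0 || value1 > 4095) throw new ArgumentOutOfRangeException("value1");
        if (value2 < 0 || value2 > 4095) throw new ArgumentOutOfRangeException("value2");

        var data = new byte[3];
        data[0] = (byte)(value1 & 0xFF);                         // low 8 bits of value1
        data[1] = (byte)(((value1 >> 8) << 4) | (value2 & 15));  // high 4 bits of value1, low 4 bits of value2
        data[2] = (byte)(value2 >> 4);                           // high 8 bits of value2
        return data;
    }

    // Reverses Pack: recovers both 12-bit values from three bytes.
    public static void Unpack(byte[] data, out int value1, out int value2)
    {
        value1 = data[0] + ((data[1] >> 4) << 8);
        value2 = (data[1] & 15) + (data[2] << 4);
    }
}
```

For example, `Pack12.Pack(4000, 1234)` yields the bytes 0xA0, 0xF2, 0x4D, and `Unpack` returns 4000 and 1234 again. The trade-off is that values no longer sit on two-byte boundaries, so locating a single value takes a little more arithmetic than with plain Int16 storage.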

          Despite everything, the person most likely to be fooling you next is yourself.

          modified on Wednesday, January 14, 2009 12:34 AM

            Mark Churchill
            wrote on last edited by
            #5

            As people have mentioned, use a binary file packed at 16 bits per data point. Keep in mind that your raw data is around 150 MB (10,000 x 8,000 values x 2 bytes = 160,000,000 bytes). You could have a fiddle with the Bitmap classes if you are uncomfortable with packing data yourself, treating the data as a heightmap. This may give performance benefits if GDI doesn't have a problem with the image size.


              Reanalyse
              wrote on last edited by
              #6

              Thanks all for the suggestions. The bitmap idea could be very helpful, as it would maintain the structure of the file, though I don't know whether the code to get a pixel value from a bitmap is efficient. As the file only has to be created once, write speed is not relevant, but read time is critical.
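Since read time is the critical part, it may help to note that a fixed-width binary layout also keeps the grid structure: the position of any cell can be computed directly. The sketch below assumes a headerless, row-major file of 16-bit values (not the bitmap approach discussed above); `DemLookup` and its members are hypothetical names.

```csharp
using System;
using System.IO;

public sealed class DemLookup : IDisposable
{
    private readonly BinaryReader _reader;
    private readonly int _cols;

    // The stream is assumed to hold rows * cols UInt16 values in row-major order, with no header.
    public DemLookup(Stream stream, int cols)
    {
        _reader = new BinaryReader(stream);
        _cols = cols;
    }

    // Jumps straight to the requested cell: each value occupies exactly two bytes.
    public ushort GetElevation(int row, int col)
    {
        long offset = ((long)row * _cols + col) * sizeof(ushort);
        _reader.BaseStream.Seek(offset, SeekOrigin.Begin);
        return _reader.ReadUInt16();
    }

    public void Dispose()
    {
        _reader.Dispose();
    }
}
```

In a text file the same lookup would require scanning and parsing everything before the target value, because the numbers have varying widths.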

                CPallini
                wrote on last edited by
                #7

                Reanalyse wrote:

                Smallest and fastest way

                Usually the two requirements go in the opposite direction. :)

                If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
                This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
                [My articles]

                In testa che avete, signor di Ceprano? ("What have you got on your head, Signor di Ceprano?")
