Smallest and fastest way to store numeric data in a file
-
I am working with Digital Elevation Models, and some of these are very large and very slow to access. The files are currently in text format, with 10,000 rows and 8,000 columns. Since every item in the file is a non-negative number (ranging from 0 at sea level to 4000 at the highest peak), there must be a much better format for storing them, both for file size and for access speed. What would you suggest as the best file format for this: two bytes for each number, or singles, or doubles, or something else? The key thing is fast access to the array of data in the file. Thanks for any suggestions.
-
Hi, a binary file will give much faster access than anything text-oriented. Use BinaryWriter/BinaryReader for this. Warnings with binary files:
- you are responsible for consistent file contents; from the outside it looks like just a collection of bytes, and there is no way to recognize its structure;
- portability is limited to systems that have the exact same data representation; e.g. x86 stores multibyte values in "little-endian" mode (least significant byte first), while other systems may use "big-endian" and hence not correctly interpret the same file. :)
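As a rough illustration of the BinaryWriter/BinaryReader approach, here is a minimal sketch. The class name `DemFile`, the method names, and the small header (row count, then column count) are assumptions made for this example, not something specified in the thread. A header like this is one way to address the "no way to recognize its structure" warning.

```csharp
using System;
using System.IO;

// Hypothetical helper: names and header layout are illustrative assumptions.
public static class DemFile
{
    // Writes rows and cols as 4-byte ints, then row-major 2-byte values.
    public static void Write(string path, short[,] grid)
    {
        int rows = grid.GetLength(0), cols = grid.GetLength(1);
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(rows);
            writer.Write(cols);
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    writer.Write(grid[r, c]);   // Write(short): 2 bytes
        }
    }

    // Reads the header back, then fills the grid in the same order.
    public static short[,] Read(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int rows = reader.ReadInt32();
            int cols = reader.ReadInt32();
            var grid = new short[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    grid[r, c] = reader.ReadInt16();
            return grid;
        }
    }
}
```

Reading one value per call keeps the sketch short; for a full-size grid, reading larger blocks of bytes at once is likely to be faster.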
Luc Pattyn
modified on Friday, June 10, 2011 12:27 PM
-
Binary, as Luc said. And you should probably use short (Int16) values. Also, can you read and write a group of them at a time?
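One way to read a group of values at a time is to seek to the start of a row and pull the whole row in as bytes. The sketch below assumes a headerless file of row-major Int16 values written in little-endian order, and a little-endian machine doing the reading; `DemRowReader` and `ReadRow` are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper: assumes a headerless, row-major file of Int16 values.
public static class DemRowReader
{
    public static short[] ReadRow(string path, int row, int cols)
    {
        var bytes = new byte[cols * sizeof(short)];
        using (var stream = File.OpenRead(path))
        {
            // Jump straight to the first byte of the requested row.
            stream.Seek((long)row * bytes.Length, SeekOrigin.Begin);
            int read = 0;
            while (read < bytes.Length)
            {
                int n = stream.Read(bytes, read, bytes.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
        }
        var values = new short[cols];
        // Copies raw bytes, so the file's byte order must match the
        // machine's (assumed little-endian here).
        Buffer.BlockCopy(bytes, 0, values, 0, bytes.Length);
        return values;
    }
}
```

For the grid in the question, `ReadRow(path, r, 8000)` would fetch one 16,000-byte row per call, assuming the 8000 columns run along each row.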
-
For the simplest code you could use two bytes for each value:
value = data[0] + data[1] * 256;
You only need 12 bits to store each value (0 to 4095), so for the smallest file size you could pack two values into three bytes.
Bit usage: 11111111 11112222 22222222
Pack:
data[0] = (byte)value1; // low 8 bits of value1
data[1] = (byte)(((value1 >> 8) << 4) + (value2 & 15)); // high 4 bits of value1, low 4 bits of value2
data[2] = (byte)(value2 >> 4); // high 8 bits of value2
Unpack:
value1 = data[0] + ((data[1] >> 4) << 8);
value2 = (data[1] & 15) + (data[2] << 4);
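The same two-values-in-three-bytes scheme can be applied to a whole array. This is only a sketch: the class `Packed12` and its method names are made up for the example, and it assumes an even number of values, each in the range 0 to 4095.

```csharp
using System;

// Hypothetical helper applying the 12-bit packing to whole arrays.
public static class Packed12
{
    // Packs an even number of values (each 0..4095) into 3 bytes per pair.
    public static byte[] Pack(int[] values)
    {
        if (values.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of values.");
        var data = new byte[values.Length / 2 * 3];
        for (int i = 0, o = 0; i < values.Length; i += 2, o += 3)
        {
            int value1 = values[i], value2 = values[i + 1];
            data[o]     = (byte)(value1 & 0xFF);                        // low 8 bits of value1
            data[o + 1] = (byte)(((value1 >> 8) << 4) | (value2 & 15)); // shared byte
            data[o + 2] = (byte)(value2 >> 4);                          // high 8 bits of value2
        }
        return data;
    }

    // Reverses Pack: every 3 bytes yield 2 values.
    public static int[] Unpack(byte[] data)
    {
        var values = new int[data.Length / 3 * 2];
        for (int i = 0, o = 0; o < values.Length; i += 3, o += 2)
        {
            values[o]     = data[i] + ((data[i + 1] >> 4) << 8);
            values[o + 1] = (data[i + 1] & 15) + (data[i + 2] << 4);
        }
        return values;
    }
}
```

The trade-off is that a single value can no longer be located by a simple two-byte offset; value n lives in the three-byte group starting at byte (n / 2) * 3.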
modified on Wednesday, January 14, 2009 12:34 AM
-
As people have mentioned, use binary, packed at 16 bits per data point. Keep in mind your raw data is around 150 MB. If you are uncomfortable with packing data yourself, you could have a fiddle with the Bitmap classes, treating the data as a heightmap. That may give performance benefits if GDI doesn't have a problem with the image size.
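To put numbers on the raw size, the arithmetic can be sketched as follows; `DemSizes.Bytes` is a hypothetical helper, and the grid dimensions are the ones given in the question.

```csharp
using System;

// Hypothetical helper for comparing storage formats by raw size.
public static class DemSizes
{
    // Bytes needed for rows x cols samples at the given bits per sample,
    // rounded up to a whole byte.
    public static long Bytes(long rows, long cols, int bitsPerSample)
    {
        return (rows * cols * bitsPerSample + 7) / 8;
    }
}
```

For a 10,000 by 8,000 grid this gives 160,000,000 bytes at 16 bits per value (about 152.6 MiB), 120,000,000 bytes with 12-bit packing, 320,000,000 bytes as singles, and 640,000,000 bytes as doubles.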
Mark Churchill
-
Thanks all for the suggestions. The bitmap idea could be very helpful, as it would maintain the structure of the file, but I don't know whether the code to get a pixel value from a bitmap is efficient. As the file only has to be created once, write speed is not relevant, but read time is critical.
-
Reanalyse wrote:
Smallest and fastest way
Usually the two requirements pull in opposite directions. :)