How memory efficient is std::set?
-
From the 'I feel I ought to know the answer but I don't' dept. I need to create a large array of UDTs and I'm short on space. Using std::set would be a no-brainer if I could work out (and justify) how much extra space it needs at run time compared with a hand-crafted array. I realise I don't even know how to measure it at run time! Any hints and tips would be welcome.
Dave Cross
That all depends on the type of efficiency you are looking for. If you want memory efficiency, vector or deque is better (deque being better than vector in that case as well). Also, if you need random-access capability, you should stick with one of those two containers. If you want fast sorting, set is a good choice, since it sorts elements as they are inserted. Keep in mind that set also requires elements to be unique, so if you need elements that might compare equal (e.g. a set of integers containing two 5s), you will need multiset or another container like vector, deque or list. Basically, it depends on what you are doing, what type of data you have, and how you intend to use that data.
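A quick illustration of the uniqueness point (values arbitrary; this isn't from the original post):

#include <iostream>
#include <set>

int main() {
    std::set<int> s{2, 5, 5};          // the duplicate 5 is silently dropped
    std::multiset<int> ms{2, 5, 5};    // multiset keeps both 5s
    std::cout << s.size() << ' ' << ms.size() << '\n';  // prints "2 3"
}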
If you decide to become a software engineer, you are signing up to have a 1/2" piece of silicon tell you exactly how stupid you really are for 8 hours a day, 5 days a week. -- Zac
-
std::set is normally implemented as a balanced binary search tree (AVL or red-black). That gives you O(log N) deletion, insertion and lookup of single nodes. Memory usage is O(N). If you need to measure the memory overhead introduced by any STL container, you can always create your own allocator which just forwards calls to new and delete. In the same call, you can log the memory requested and compare it against the memory needed for your data.
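A minimal sketch of such a counting allocator (the names CountingAllocator and g_allocated are mine, and it's written as modern C++ for brevity - this isn't the poster's actual code):

#include <cstddef>
#include <iostream>
#include <set>

// Tallies every byte the container requests, then forwards to operator new.
static std::size_t g_allocated = 0;

template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U> CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        g_allocated += n * sizeof(T);   // log the request
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_allocated -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

int main() {
    std::set<int, std::less<int>, CountingAllocator<int>> s;
    for (int i = 0; i < 10000; ++i) s.insert(i);
    std::cout << "bytes requested: " << g_allocated
              << ", payload only: " << s.size() * sizeof(int) << '\n';
}

Comparing the two figures shows the per-node overhead directly.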
-- In Hypno-Vision
-
If you can allocate it all at once, std::vector is the most memory efficient - it allocates a single contiguous block of memory and has something like 12-16 bytes of overhead on top (typically pointers to the start of the storage, the end of the used elements, and the end of the allocated block). If the vector has to reallocate, though, you will have high memory usage while the vector allocates a new (larger) contiguous block and copies the elements into it.
std::deque is second in terms of overhead. It allocates memory in large-ish chunks, so if you need more elements, it just allocates a new chunk. Obviously, it needs to keep track of each chunk, so you're likely to have 12-16 bytes of overhead per chunk, possibly (I'm guessing here!).
std::set - mmmm, you don't really want to use that unless you need its particular properties, i.e. it's an ordered set and it's pretty efficient for checking the existence of a particular element (given the ordering function you specify). Even then, it's probably more performant to sort a std::vector and use one of the algorithms (std::lower_bound to get an iterator at or near the element, or std::binary_search to show the existence (but not the location!) of the element) - see the sketch below. BTW - you may find Effective STL by Scott Meyers useful for STL-related hints and tips - tip number 1 is "Choose your containers with care." - hmmm, sounds relevant :-) HTH!
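For illustration, a minimal sketch of that sorted-vector lookup (values arbitrary; written as modern C++, not from the original post):

#include <algorithm>
#include <iostream>
#include <vector>

// Sort once, then search with the binary-search algorithms instead of
// paying std::set's per-node overhead.
int main() {
    std::vector<int> v{42, 7, 19, 3, 23};
    std::sort(v.begin(), v.end());

    // std::binary_search: existence only (true/false)
    std::cout << std::boolalpha
              << std::binary_search(v.begin(), v.end(), 19) << '\n';

    // std::lower_bound: iterator to the first element >= the key;
    // check it actually matches before using it
    auto it = std::lower_bound(v.begin(), v.end(), 19);
    if (it != v.end() && *it == 19)
        std::cout << "found 19 at index " << (it - v.begin()) << '\n';
}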
-
Stuart Dootson wrote:
Even then, it's probably more performant to sort a std::vector and use one of the algorithms (std::lower_bound to get an iterator at or near the element, or std::binary_search to show the existence (but not the location!) of the element)
Not if the vector changes frequently (size and values)!
Stuart Dootson wrote:
tip number 1 is "Choose your containers with care." - hmmm, sounds relevant
I'd say it's relevant even at times when one thinks it's not. :-D
-- With lots of octane and free of lead, a fuel like gasoliiiiiine!
-
Thanks for that. I have the book and I know the relative merits of the containers. I need the functionality of std::set, but I could code around the limitations of vector or deque if it was worthwhile. My question is really how to tell, in a given situation, exactly how much extra memory std::set uses compared with, say, std::vector. I can knock up a little test program to dynamically allocate 10,000 items using each method, but how do I tell how much space is allocated to the whole collection? I need something like run-time_sizeof(). I have the feeling I've forgotten something simple.
Dave Cross
-
Thanks for the clue. I traced the .insert() operation and found it made two calls to new for 33 and 72 bytes when inserting my class (an int and a char[50], not Unicode). That looks to me like an overhead of 105 - 54 = 51 bytes per entry. std::list required 97 bytes per entry, but std::vector only 56. I have to store about 10,000 records, and ~500KB seems like too big a price to pay for the convenience of std::set. Do these results seem reasonable to you?
Dave Cross
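For anyone wanting to reproduce that kind of trace, one way (a hedged sketch: the Record type mirrors the int + char[50] above, the rest is illustrative) is to replace the global operator new/delete and log each request:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <set>

// Log every allocation size, then forward to malloc/free. printf is used
// instead of iostreams to avoid allocating while inside operator new.
void* operator new(std::size_t size) {
    std::printf("operator new: %zu bytes\n", size);
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }

struct Record { int id; char name[50]; };
bool operator<(const Record& a, const Record& b) { return a.id < b.id; }

int main() {
    std::printf("sizeof(Record) = %zu\n", sizeof(Record));  // 56 with padding
    std::set<Record> s;
    s.insert(Record{1, "first"});  // compare the logged node size against 56
}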
-
An int plus a char[50] should be 56 bytes with default alignment and packing taken into account (4-byte boundary). That makes the vector allocation sound reasonable. How many inserts did you make? Do at least 100 inserts, as the various containers may need housekeeping information which doesn't grow with the number of items. I don't see why std::set should keep ~50 extra bytes per element! Most implementations are red-black trees, generally requiring 12 bytes (no packing) per element (colour + child pointers). Also do the measurements in release mode, as I know the STL stores extra debug information.
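To make that concrete, here is an illustrative node layout (the field names and exact arrangement are assumptions, not any particular vendor's implementation) that would account for the 72-byte allocations traced above, on a 32-bit build, if the implementation also stores a parent pointer:

struct Record {
    int  id;
    char name[50];
};                      // 4 + 50, padded to 56 bytes

// Hypothetical red-black tree node wrapping the stored element.
struct SetNode {
    SetNode* left;      // child pointers
    SetNode* right;
    SetNode* parent;    // kept by implementations that store a parent link
    char     color;     // red/black flag, padded out to 4 bytes
    Record   value;     // the element itself
};                      // 3 * 4 + 4 + 56 = 72 bytes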
-- With lots of octane and free of lead, a fuel like gasoliiiiiine!
-
No, I don't think you have forgotten anything simple. It's difficult to say what the memory usage is going to be without looking at the implementation :( However, if it's VC 7.1 you're using, then it looks like you have the following memory usage (this is from a running program built for Win32 Release): the std::set object itself is 12 bytes. The VC7.1 std::set implementation is a red-black tree. Each node of the tree is 16 bytes plus the size of the object you've stored. In my case, I've stored pointers, so each node is 20 bytes. Ouch! So the total usage of a VC7.1 std::set is 12 + n(16 + s), where n is the number of items stored and s is the size of each stored item. Now, I'd guess most std::set implementations will have a similar level of overhead. So... I'm guessing you don't want to use std::set if you're tight on memory...
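Plugging in the numbers from earlier in the thread (assuming the padded 56-byte record and 10,000 elements), that formula gives:

12 + 10,000 x (16 + 56) = 720,012 bytes in total
10,000 x 56 = 560,000 bytes of raw data

i.e. around 160KB of overhead - and the 16 + 56 = 72-byte node size agrees with the 72-byte allocations Dave traced.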
-
Stuart Dootson wrote:
Now, I'd guess most std::set implementations will have a similar level of overhead. So....I'm guessing you don't want to use std::set if you're tight on memory
That is correct, at a minimum, since each node in the set must have two pointers at the very least (to the left and right children of the node), and some implementations also carry a pointer to the parent node. If you are trying to keep your memory footprint down, vector is the way to go (assuming you have a good estimate of how many elements it will contain and can reserve that memory at some point - see the sketch below). That said, deque will be more memory friendly if you have a lot of elements. The reasoning is fairly simple: if you estimate 10,000 elements for your vector but end up adding 10,001, the vector will try to reallocate to 15,000 elements and copy over the 10,001. With such a large memory footprint, for a few moments of CPU time you will have more than double the required memory allocated just to add a single element. In the same scenario, deque will just allocate another large chunk of memory somewhere and add your single element in that chunk. set isn't very memory friendly. It is designed to contain unique elements that are pre-sorted for you, making it more efficient for certain types of operations, but definitely not on memory usage.
If you decide to become a software engineer, you are signing up to have a 1/2" piece of silicon tell you exactly how stupid you really are for 8 hours a day, 5 days a week. -- Zac
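A minimal sketch of the reserve() approach described above (sizes arbitrary; this isn't from the original post):

#include <deque>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(10000);                  // one allocation up front; growth
    for (int i = 0; i < 10000; ++i)    // within that capacity never copies
        v.push_back(i);

    std::deque<int> d;                 // grows chunk by chunk instead -
    for (int i = 0; i < 10001; ++i)    // no reallocate-and-copy spike when
        d.push_back(i);                // the estimate is off by one
}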