Code Project
How memory efficient is std::set?

ATL / WTL / STL
Tags: data-structures, performance, tutorial, question
11 Posts, 4 Posters
Dave Cross (#1):

From the 'I feel I ought to know the answer but I don't' dept. I need to create a large array of UDTs and I'm short on space. Using std::set would be a no-brainer if I could work out (and justify) how much extra space it needs at run time compared with a hand-crafted array. I realise I don't even know how to measure it at run time! Any hints and tips would be welcome. :confused:

Dave Cross

Zac Howland (#2):

That all depends on the type of efficiency you are looking for. If you want memory efficiency, vector or deque is better (deque being the better of the two in that respect). Also, if you need random-access capability, you should stick with one of those two containers. If you want fast sorting, set is a good choice, since it sorts elements as they are inserted. Keep in mind that set also requires elements to be unique, so if you need elements that might compare equal (e.g. a set of integers with two 5s), you will need multiset or another container such as vector, deque, or list. Basically, it depends on what you are doing, what type of data you have, and how you intend to use that data.

    If you decide to become a software engineer, you are signing up to have a 1/2" piece of silicon tell you exactly how stupid you really are for 8 hours a day, 5 days a week Zac


Jorgen Sigvardsson (#3):

std::set is normally implemented with a balanced binary search tree (AVL or red-black). That gives you O(log N) deletion, insertion, and lookup of single nodes; memory use is O(N). If you need to measure the memory overhead introduced by any STL container, you can always create your own allocator that just forwards calls to new and delete. In those same calls you can log the memory requested and compare it against the memory your data actually needs.

      -- In Hypno-Vision


Jorgen Sigvardsson (#4):

Here[^] is a good example of how to implement your own STL-compliant container. It'll show you how to use allocators. :)

        -- Please rise for the Futurama theme song


Stuart Dootson (#5):

If you can allocate it all at once, std::vector is the most memory-efficient: it allocates a single contiguous block of memory and carries something like 12-16 bytes of overhead on top (pointer to the allocated memory, size of the allocated block, use count of the allocated block). If the vector has to reallocate, though, memory usage spikes while the vector allocates a new (larger) contiguous block and copies the elements across.

std::deque is second in terms of overhead. It allocates memory in large-ish chunks, so if you need more elements, it just allocates a new chunk. Obviously, it needs to keep track of each chunk, so you're likely to have 12-16 bytes of overhead per chunk, possibly (I'm guessing here!).

std::set - mmmm, you don't really want to use that unless you need its particular properties, i.e. it's an ordered set and it's pretty efficient at checking for the existence of a particular element (given the ordering function you specify). Even then, it's probably more performant to sort a std::vector and use one of the algorithms: std::lower_bound to get an iterator at or near the element, or std::binary_search to test the existence (but not the location!) of the element.

BTW - you may find Effective STL by Scott Meyers[^] useful for STL-related hints and tips - tip number 1 is "Choose your containers with care." Hmmm, sounds relevant :-) HTH!


Jorgen Sigvardsson (#6):

            Stuart Dootson wrote:

Even then, it's probably more performant to sort a std::vector and use one of the algorithms (std::lower_bound to get an iterator at or near the element, or std::binary_search to test the existence (but not the location!) of the element)

            Not if the vector changes frequently (size and values)!

            Stuart Dootson wrote:

            tip number 1 is "Choose your containers with care." - hmmm, sounds relevant

I'd say it's relevant even at times when one thinks it's not. :-D

-- With lots of octane and free of lead, a fuel like gasoliiiiine!


Dave Cross (#7):

Thanks for that. I have the book and I know the relative merits of the containers. I need the functionality of std::set, but I could code around the limitations of vector or deque if it were worthwhile. My question is really how to tell, in a given situation, exactly how much extra memory std::set uses compared with, say, std::vector. I can knock up a little test program that dynamically allocates 10,000 items using each method, but how do I tell how much space is allocated to the whole collection? I need something like a run-time sizeof(). I have the feeling I've forgotten something simple.

              Dave Cross


Dave Cross (#8):

Thanks for the clue. I traced the .insert() operation and found it made two calls to operator new, for 33 and 72 bytes, when inserting my class (an int and a char[50], not Unicode). That looks to me like an overhead of 105 - 54 = 51 bytes per entry. std::list required 97 bytes per entry, but std::vector only 56. I have to store about 10,000 records, and ~500 KB seems like too big a price to pay for the convenience of std::set. Do these results seem reasonable to you?

                Dave Cross


Jorgen Sigvardsson (#9):

An int + char[50] should be 56 bytes with default alignment and packing taken into account (4-byte boundary), which makes the vector allocation sound reasonable. How many inserts did you make? Do at least 100, as the various containers may need housekeeping information that doesn't grow with the number of items. I don't see why std::set should keep ~50 extra bytes per element! Most implementations are red-black trees, generally requiring 12 bytes (unpacked) per element (a colour plus child pointers). Also, do the measurements in release mode, as I know the STL does store extra debug information.

-- With lots of octane and free of lead, a fuel like gasoliiiiine!


Stuart Dootson (#10):

No, I don't think you have forgotten anything simple. It's difficult to say what the memory usage is going to be without looking at the implementation :( However, if it's VC 7.1 you're using, then it looks like you have the following memory usage (this is from a running program built for Win32 Release): the std::set object itself is 12 bytes. The VC 7.1 std::set implementation is a red-black tree, and each node of the tree is 16 bytes plus the size of the object you've stored. In my case I've stored pointers, so each node is 20 bytes. Ouch! So the total usage of a VC 7.1 std::set is 12 + n(16 + s), where n is the number of items stored and s is the size of each stored item. Now, I'd guess most std::set implementations will have a similar level of overhead. So... I'm guessing you don't want to use std::set if you're tight on memory...


Zac Howland (#11):

                      Stuart Dootson wrote:

                      Now, I'd guess most std::set implementations will have a similar level of overhead. So....I'm guessing you don't want to use std::set if you're tight on memory

That is correct, at a minimum, since each node in the set must carry at least two pointers (to its left and right children), and some implementations carry a pointer to the parent node as well. If you are trying to keep your memory footprint down, vector is the way to go (assuming you have a good estimate of how many elements it will contain and can reserve that memory at some point). That said, deque will be more memory-friendly if you have a lot of elements. The reasoning is fairly simple: if you estimate 10,000 elements for your vector but end up adding 10,001, the vector will try to reallocate to, say, 15,000 elements and copy over the 10,001. With such a large memory footprint, for a few moments of CPU time you will have more than double the required memory allocated just to add a single element. In the same scenario, deque will just allocate another large chunk of memory somewhere and add your single element in that chunk. set isn't very memory-friendly: it is designed to contain unique elements that are kept sorted for you, making it more efficient for certain operations, but definitely not on memory usage.

                      If you decide to become a software engineer, you are signing up to have a 1/2" piece of silicon tell you exactly how stupid you really are for 8 hours a day, 5 days a week Zac
