If you do buddy with a set of freelist heads, one for each size, and your buddy combiner orders the freelist, you have an extra benefit of locality: most accesses would go to the lower end of the heap, making better use of virtual memory (less paging). A background GC could unhook a freelist (maybe leaving a couple entries in the list for use while the GC was working), returning with one shorter list for the original freelist and one list of combined buddies to be put into the next higher size freelist. The head end of the freelist may be rather unordered - this is where all the allocation and freeing is taking place. If the list is long - it hasn't been emptied for quite some time - the tail end may be perfectly sorted after the previous GC/combination round. If you do sorting e.g. by Smoothsort, handling the already sorted part has complexity O(n), so most likely, the long freelist will not required much effort. You find buddies by traversing a sorted list, so the list of buddy pairs will also be sorted. If the next higher freelist is also mostly sorted, all buddy pairs is inserted into this is list in a single traversal. I would do real timing tests with a synthetic heap load (modeled after a relevant usage scenario) to see if it really is worthwhile the resource cost of an asynchronous GC thread - strongly suspecting that a finely tuned incremental but synchronous buddy manager can do it both at a lower total resource cost and with so small delays that it would be a much better solution. Final remark: "you've finished doing useful work and you don't need to make the user wait for the collection". In most systems, each process has its own heap. Multiple processes allocating from one common global heap requires a lot of resource consuming synchronization. Most CLI programs are run in their own processes. So when they complete, noone cares about what their heap looks like at that time. There is no reason to do any garbage collection at that time. The entire data segment holding the heap is released en bloc. In an embedded system, you often have a single systemwide heap. But few embedded system have CLI interfaces for running arbitrary programs that start up and terminate as a function of user operations. Even if the embedded system has some sort of UI, user actions are usually limited to activating specific built-in operations in the embedded code, not separate CLI oriented programs. But of course, there may be exceptions :-)
Religious freedom is the freedom to say th