Records and Clusters [modified]

DQNOK

For many varied reasons, I decided to write my own balanced binary tree library. One feature of my library that I haven't seen in other freely available libraries is "clusters", and a "recluster" function. The problem that I am attacking is that as elements are randomly added to the tree, the tree rotations that automatically happen to keep the tree balanced tend to spread the edge pointers all over the place. The end result is that in a large tree, a typical traversal can require reading nodes from all over physical memory (or disk pages): a very inefficient use of cache. By clustering several subtree "near the root nodes" together in physical memory, tree traversals can be sped-up significantly by keeping nodes that always get visited near one another within physical memory (or on the same disk page). For example, in a properly clustered tree, with clusters large enough to hold 4 levels of records (1+2+4+8 = 15 records), a tree of height 16 (>45000 records) can be traversed in only 4 cluster reads instead of 16 reads that might be required of a non-clustered tree. What I don't know is a good way to measure how "out-of-cluster" a tree is. I'd like to be able to determine when a tree needs to be reclustered. I'm sure there is good theory out there for this; and the idea doesn't seem to be that difficult, but I haven't run onto the solution yet. I've spent a little bit of time googling for a method, and probably a couple of hours thinking about it. Nothing yet. Any ideas or helpful links?

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

modified on Wednesday, November 5, 2008 11:33 AM

Member 4194593 · modified on Wednesday, November 5, 2008 11:33 AM

Have you ever considered B+ trees as implemented by "The C Database Tool Chest", Mix Software. Mix is still on the web - Google "Mix Software". Note that the term "database" is, in fact, just a B+ tree. The database is a tree of memory or file blocks which contain multiple keys and records. I got this in '85 and have used it ever since. I got the source software (don't know if this is still available). Not Open Source, but any programs that are released can be used without any royalty, only restriction is that it must be linked into an object, no DLLs, and the neither the source nor the library can be released - only the executable with the embedded code. With the source, you can make any modifications you want. I modified it (originally 16bit - used Borland C to compile) to now use VS2008 thus 32 bit and supports >4GB files. Major rewrite to unwind the INT usage which now is 32 bit - in 16 bit an INT was 16 bit. Many conflicts with structure layout, and old 16 bit databases couldn't be read with the 32 bit version, nor the other way around. I built my version where I could declare which size of INT I wanted so I had only one converter to work with. I created two converters (16 bit and 32 bit) which could read the appropriate data base (in ascending order) and output a flat file with the data it contained, and then read in the flat file and write the other bit sized database. Dave.

DQNOK

Member 4194593 wrote:

Have you ever considered B+ trees as implemented by "The C Database Tool Chest", Mix Software.

No, I hadn't; but I am familiar with Mix, and have used both their Power C compiler and their C utilities toolchest. I'll have to back-up and reconsider whether this would be a good approach to my end goal (which was NOT to rewrite the balanced binary tree algorithms). I originally rolled my own balanced binary tree routines because the ones I found on the web didn't support duplicate keys, nor multi-threading. The clusters were an after-thought (actually, they came about from a "Hey, I can use the tree structure instead of yet-another-roll-my-own exotic structure -- IF I can speed them up sufficiently -- perhaps reclustering would do it). Thanks for the thoughts.

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

Member 4194593

David, Good that you like David. You be David, I be Dave. Good name. The blocks in the B+ tree do not exactly correspond to the clusters you use, but do tend to group records together. The interesting thing is that the tree is self leveling. All blocks are identified by a block number. All of the full records are on the bottom level of the tree, linked horizontally by their block numbers, and have an array of offsets in each block to the records, thus constitute an "array", making it easy to go from one record to another. Above the "leaf" nodes at the bottom is an interior node or a layer of interior nodes which only contain the keys (not the additional data) and each key is associated with the block number containing the child nodes below. The first block (block 0) is a header block containing linking info (first leaf block, last leaf block, i.e. first record and last record), and the current root node. Whenever a node (either a leaf node or an interior node)) fills up (not when it is found to be full when a new record needs to be added, but whenever it becomes full due to the addition of a record), it attempts to offload some records to the prior same level block or following same level block, and if that is not possible, the block is split with half of the records in one and the second half in the other and these two blocks are linked. This also creates a new record for the parent block pointing to the block that just split, which may also cause the parent to split, on up the tree until maybe the root block needs to split thus creating a new root and updating the pointer of the header block. The blocks are allocated in memory with a buffer pool which is maintained with an LRU list, and when the buffer pool is full and a new block is needed, the LRU block is written (if dirty) to the database file, randomly by block number, or just discarded and reused if not dirty. Blocks can thus be either in-memory, or else on-file. Blocks are not written to file unless they hit LRU status and a new block is needed, or at the end, when the tree is closed. Multiple trees can be opened at one time (256 of then sharing the buffer pool). I build an application that needed to create a massive tree, then discard it and process other records comprised from the record numbers only. This caused the tree to be written out, then discarded. I wrote a new function to purge a tree without writing by just modifying the header record to indicate that it was empty, writing that out and closing the tree, then returning all buffer pool re

DQNOK

This discussion has sent me looking into B+trees. Some of the little code I've found looks ridiculously simple (by comparison to my BalBinaryTree implementation). I have no idea of speed comparisons, but it makes sense that they would be fast or else RDBMS's wouldn't use them. Plus it seems they might actually be smaller since every record in my tree system carries 12 bytes of overhead. Yes, the mix source is still available, but it sounds like the right source to have is Mix's (which of course requires a small $ outlay) plus your modifications. My (junk) e-mail address is davidlqualls@yahoo.com I don't check it every day; and it fills up with garbage pretty fast, but if you'll put "B+Tree" in the subject line, I'll be looking for it. Thanks. David

Member 4194593

DQNOK wrote:

which of course requires a small $ outlay

The operative word here is "small" not like VS, and the library an be included in an executable royalty free. Dave.

Member 4194593

David, Sorry this has taken so long to answer. It got buried on my email with Christmas correspondence, and you never responded to this thread again, and I got off on 15 other tangents. Did you ever get the Mix source? I can't legally send you any "fixed" source unless you have the original source. How is your balanced tree going? Dave.

DQNOK

Thanks for remembering me. I looked into it and decided you were right; using the binary tree was the wrong approach. Binary trees just do too much memory thrashing. I also got off on a really hot project and had to set it aside for a while. I downloaded a couple of free BTree libraries, but I'm never happy with stuff I get from outside. They will have been developed with their own support libraries, which usually conflict with mine, or they are missing features I want, or are bloated with features I don't want, etc. I really just wanted an in-memory index, without the burden of a disk-based system. However, I really should consider a product that has disk-based storage in case I want to expand later on. I'm still interested, just haven't taken "the plunge" yet. It's not the money; it's the time committment in learning a new system and adapting all my stuff to their way of doing things. I'm just responding to your post without having gone back and reviewed everything that was said to this point. I'm going to do that, plus, if you have anything else new to add, I'd be glad to hear it. Perhaps I should just go ahead and buy the source...

David --------- Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code. http://yosefk.com/c++fqa/picture.html#fqa-6.6 ---------

DQNOK

A second response: Have you noticed that Mix now sells a Win32 version of their database toolchest? I don't suppose you've compared their implementation against yours, have you? I know nothing except what they've posted on their web site.

David

Member 4194593

David, It would be interesting to see how they converted it, however, I have already done that and it works great and I don't need to spend the money to see what they did. The primary fix was to get rid of all of the int's and make them CBT defined types, each one different so you knew what you were dealing with, then define these types as either Signed/Unsigned Longs or Signed/Unsigned Shorts depending on whether you wanted a version to be able to read an old database. If you do get the source, I can give you my fixes for a PurgeTree to quickly kill a tree that is no longer needed. Dave.