Code Project / General Programming / Algorithms
Dividing groups of objects into sets

supercat9 wrote (#1):

    I have a few problems of the same general format and was wondering if there are any good algorithms; it's not necessary to get the optimal solution, just a decent one.

    I have about 100 bitmaps, each 24 rows by 80 bits. Taken together, the bitmaps will probably contain around 400-450 distinct rows. I need to allocate those rows into four sets of 128 distinct rows each, such that every bitmap can be represented entirely with rows from one set. It will almost certainly be necessary to include some rows in more than one set, but this must not be done excessively.

    I'm well aware that such a data format (storing each bitmap as one byte indicating which set to use, then 24 bytes indicating which rows to use within that set) is not the most space-efficient, but the data is for use on a rather slow processor which can't afford to waste any time processing it. Are there any common algorithms which could be readily adapted to the purpose, or should I try to cobble together some heuristics and see what happens?
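For concreteness, the storage layout described above (one set-selector byte plus 24 row indices per bitmap, against four tables of up to 128 rows) could be modelled like this. The function and names are illustrative, not part of any real format:

```python
# Hypothetical model of the layout described above: each bitmap is stored
# as 1 set-selector byte plus 24 one-byte row indices into that set's table.

def encode_bitmap(rows, row_sets):
    """Encode a bitmap against the first set that contains all of its rows.

    rows     -- the bitmap's rows, in order (each row an 80-bit pattern,
                e.g. an int)
    row_sets -- four lists, each holding up to 128 distinct rows
    Returns (set_index, [row_index, ...]), or None if no single set
    covers the bitmap.
    """
    for set_index, table in enumerate(row_sets):
        lookup = {row: i for i, row in enumerate(table)}
        if all(r in lookup for r in rows):
            return set_index, [lookup[r] for r in rows]
    return None  # bitmap is not representable from any one set
```

A bitmap is only encodable when one set covers all 24 of its rows, which is exactly the constraint the clustering has to satisfy.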

      Alan Balkany replied (#2):

      There are many possible approaches, and it's possible there's no solution for a given set of bitmaps. The problem is in partitioning the bitmaps into the four sets, which suggests a kind of cluster analysis may be helpful.

      1. Make a 100 x 100 integer array with rows and columns representing the bitmaps.
      2. For row r and column c, set the value of the cell to the number of rows bitmap r has in common with bitmap c.
      3. Go through the array in descending cell order, i.e. start with the highest cell value, and link the two bitmaps corresponding to this cell to form a cluster. Note that if both bitmaps are already in clusters, this operation will combine the clusters into a bigger cluster. But DON'T do the combination if the total number of unique rows from the two clusters is greater than 128 (since it wouldn't fit into one set).
      4. When you can no longer make any more clusters, assign the four largest clusters to the four sets.
      5. Assign the remaining bitmaps to the four sets, picking the set that already has the most rows in common with the bitmap being assigned.

      This will cluster the bitmaps that have the most rows in common, making efficient use of your four sets. While not optimal, this will give you a decent solution.
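Steps 1-4 above could be sketched as follows (a rough illustration, assuming each bitmap is given as its set of distinct rows; step 5, assigning leftover bitmaps, is left out for brevity):

```python
from itertools import combinations

def cluster_bitmaps(bitmap_rows, num_sets=4, capacity=128):
    """Greedy clustering per steps 1-4; bitmap_rows[i] is the set of
    distinct rows used by bitmap i."""
    n = len(bitmap_rows)
    # Steps 1-2: pairwise overlap counts (the 100 x 100 array),
    # sorted in descending order of overlap.
    pairs = sorted(
        ((len(bitmap_rows[r] & bitmap_rows[c]), r, c)
         for r, c in combinations(range(n), 2)),
        reverse=True)

    cluster_of = list(range(n))                        # cluster id per bitmap
    members = {i: [i] for i in range(n)}               # bitmaps per cluster
    rows = {i: set(bitmap_rows[i]) for i in range(n)}  # union of rows

    # Step 3: merge in descending-overlap order, respecting the capacity.
    for overlap, r, c in pairs:
        if overlap == 0:
            break                       # no shared rows left to exploit
        a, b = cluster_of[r], cluster_of[c]
        if a == b:
            continue                    # already in the same cluster
        if len(rows[a] | rows[b]) > capacity:
            continue                    # merged cluster wouldn't fit one set
        for i in members[b]:
            cluster_of[i] = a
        members[a] += members.pop(b)
        rows[a] |= rows.pop(b)

    # Step 4: the clusters with the most rows become the sets.
    biggest = sorted(members, key=lambda k: -len(rows[k]))[:num_sets]
    return [members[k] for k in biggest]
```

Using sets of rows makes the "number of rows in common" of step 2 a plain intersection, and the 128-row capacity check of step 3 a union-size test.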


        supercat9 replied (#3):

        I'll have to give that a try. There are a few similar problems I have with somewhat different parameters; the approach may work well with 100 groups of things, but perhaps not so well in a variation with 1,000 groups. Even there it might be tolerable if it's possible to quickly count how many items two groups share: a million such comparisons would be fine at a microsecond each, but annoying at a millisecond each.

        I'd been debating whether it would be better to grow clusters separately, or to pick an item to start a cluster, grow that cluster until nothing more fits, then start another cluster, and so on. I suppose the best approach may depend on the exact data to be processed, but the data won't exist until after the compressor is done.

        Basically, I'm trying to compress some level-design data for use on a very old video game system (an Atari 2600, running a 1.19 MHz 6502-family CPU with 128 bytes of RAM and 8 x 4K banks of ROM). The goal is to take a level design and process it into the format the game needs; I can't know what sorts of levels will be designed until I get some idea of what's likely to fit. It would be possible to accommodate 20 rooms of arbitrary data, but since many rooms will have rows in common, in practice it should accommodate a lot more than 20 rooms. I don't know how much more, however.
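The grow-one-cluster-at-a-time alternative mentioned above could be sketched like this (a hypothetical heuristic, not code from the thread; the seed choice and tie-breaking are arbitrary):

```python
def grow_clusters(bitmap_rows, capacity=128):
    """Grow one cluster at a time: seed with the largest unassigned bitmap,
    then repeatedly add the bitmap that contributes the fewest new rows,
    starting a fresh cluster once nothing more fits."""
    unassigned = set(range(len(bitmap_rows)))
    clusters = []
    while unassigned:
        # Seed with the bitmap that has the most distinct rows
        # (lowest index on ties, for determinism).
        seed = max(unassigned, key=lambda i: (len(bitmap_rows[i]), -i))
        cluster, rows = [seed], set(bitmap_rows[seed])
        unassigned.remove(seed)
        while True:
            # Cheapest addition: fewest rows not already in the cluster.
            best = min(
                (i for i in unassigned
                 if len(rows | bitmap_rows[i]) <= capacity),
                key=lambda i: (len(bitmap_rows[i] - rows), i),
                default=None)
            if best is None:
                break  # nothing else fits within the capacity
            cluster.append(best)
            rows |= bitmap_rows[best]
            unassigned.remove(best)
        clusters.append(cluster)
    return clusters
```

Unlike the merge-based approach, this never has to combine large clusters, so its cost is dominated by the per-step union-size checks, which may matter for the 1,000-group variation.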
