Cubesort

Igor van den Hoven

I've created a new sorting algorithm which allows O(1) sorting of ordered data, O(log (root n)) sorting of partially sorted data, and O(log n) sorting of random data. The underlying mechanism is well suited for a general purpose data structure, which I'm calling a binary search cube, and has notable advantages over binary trees. The data structure is cube-shaped rather than tree-shaped. I've described the data structure and sorting algorithm on the following webpage: https://sites.google.com/site/binarysearchcube/ The site also includes three programs written in C, one of them implementing cubesort using a 4 dimensional search cube, and an implementation of a 3 and 4 dimensional binary search cube. Haven't been able to properly benchmark the software against other sorting algorithms and data structures, but the memory overhead is significantly less. Hoping for some interesting feedback.

Sanmayce

Hi Mr. Hoven, looks nice feels exciting, surely the showdowns are gonna be must-see. My suggestion is to make a simple console tool to ease the benchmarking, I will be the first to run some superheavy English phraselists/wordlists (up to 400,000,000 lines) on my laptop T7500. One very useful filedataset is to sort Knight-Tours (64x2=128 bytes long lines), because you can generate even billions of them in reasonable time, my package (that do that) can be downloaded freely at: www.sanmayce.com/Downloads/Sandokan_r3_vs_Windows-sort.zip[^] My old console sorter Sandokan outperformed (only for internal sorting) the built-in Windows 7 64bit sort.com:

E:\Sandokan_r3-+++\Sandokan_r3_vs_Windows-sort>GENERATE_3million_Knight-Tours_and_SORT_them.bat

Microsoft Windows [Version 6.1.7601]
Generating 3,000,000 Knight-Tours and dumping them into file 390,000,000 bytes ...

E:\Sandokan_r3-+++\Sandokan_r3_vs_Windows-sort>Knight-tour_r8dump_Microsoft_V16_32bit_Ox.exe a8 3000000 1>KT3million.txt
Sorting these 3,000,000 Knight-Tours ...

E:\Sandokan_r3-+++\Sandokan_r3_vs_Windows-sort>"Sandokan_QuickSortExternal_4+GB_32bit_Microsoft.exe" KT3million.txt /fast /descend 768
Sandokan_QuickSortExternal_4+GB r.3, written by Kaze, using Bill Durango's Quicksort source.
Size of input file: 390,000,000
Counting lines ...
Lines encountered: 3,000,000
Longest line (including CR if present): 129
Allocated memory for pointers-to-lines in MB: 22
Assigning pointers ...
sizeof(int), sizeof(void*): 4, 4
Trying to allocate memory for the file itself in MB: 371 ... OK! Get on with fast internal accesses.
Uploading ...
Sorting 3,000,000 Pointers ...
Quicksort (Insertionsort for small blocks) commenced ...
| RightEnd: 000,001,500,225; NumberOfSplittings: 0,000,335,016; Done: 100% ...
NumberOfComparisons: 71,052,662
The time to sort 3,000,000 items via Quicksort+Insertionsort was 21,029 clocks.
Dumping the sorted data ...

Done 100% ...
Dumped 3,000,000 lines.
OK! Incoming and resultant file's sizes match.
Dump time: 1,560 clocks.
Total time: 27,144 clocks.
Performance: 14,366 bytes/clock.
Done successfully.

E:\Sandokan_r3-+++\Sandokan_r3_vs_Windows-sort>sha1sum_Microsoft_V16_32bit_Ox.exe "QuickSortExternal_4+GB.txt"
a053faa74ffb6cad41e61e273a1a0e0049cb25e7 QuickSortExternal_4+GB.txt

Igor van den Hoven

In my own tests cubesort is 2 times slower than mergesort for random integers and 2.5 times faster for sorted integers. This is using the latest version which I uploaded today and improves performance by about 25%. I'm not sure if this gap can be closed as mergesort has superior cache performance for random data. I haven't been able to find a decent quicksort implementation. Cubesort seems best suited for cases where a data set is for more than 50% in order. When I have a couple of hours I'll make a string based version of cubesort (very easy) and see how fast it sorts the file. Does the file contain duplicates? Edit: It appears the file is in reverse order. Takes about 3.5 seconds to load the file, 5 seconds to sort it using cubesort.

Sanmayce

Having benchmarked Cubesort with Knight-Tours my amateurish evaluation is of two words: Mutsi! Mutsunka! My Sandokan (r3fix) is inferior compared to Cubesort, 10,000,000 lines (128 bytes each) were sorted and after that sorted file was sorted once more:

------------------------------------------------------------------------------------------
| sorter \ filedataset | KT10000000.txt | KT10000000_SORTED.txt |

| Cubesort | 080 seconds | 045 seconds |
| Windows' sort.exe | 118 seconds | 115 seconds |
| Sandokan | 131 seconds; swappings: 36,826,242+ | 114 seconds; swappings: 0 |

The benchmark is downloadable at: www.sanmayce.com/Downloads/TriShowdown_Sandokan_vs_Windows_vs_Cubesort_Intel_64bit.zip[^] The full log follows:

E:\TriShowdown_Sandokan_vs_Windows_vs_Cubesort_Intel_64bit>dir
Volume in drive E is SSD_Sanmayce
Volume Serial Number is 9CF6-FEA3

Directory of E:\TriShowdown_Sandokan_vs_Windows_vs_Cubesort_Intel_64bit

06/29/2014 11:45 PM

..

06/29/2014 11:45 PM 12,566 cubesort-1.0_non-original.c
06/29/2014 11:45 PM 86,016 cubesort-1.0_non-original.exe
06/29/2014 11:45 PM 202,880 Cubesort.zip
06/29/2014 11:45 PM 24,490 Knight-tour_r8dump.c
06/29/2014 11:45 PM 79,872 Knight-tour_r8dump.exe
06/29/2014 11:45 PM 116 Make_EXEs.bat
06/29/2014 11:45 PM 1,604 MokujIN prompt.lnk
06/29/2014 11:45 PM 301,406 Sandokan_Logo.pdf
06/29/2014 11:45 PM 170,433 Sandokan_QuickSortExternal_4+GB_r3fix.c
06/29/2014 11:45 PM 122,368 Sandokan_QuickSortExternal_4+GB_r3fix.exe
06/29/2014 11:45 PM 6,144 timer64.exe
06/29/2014 11:45 PM 1,111 TriShowdown_Sandokan_vs_Windows_vs_Cubesort.bat
12 File(s) 1,009,006 bytes
2 Dir(s) 27,537,633,280 bytes free

E:\TriShowdown_Sandokan

Sanmayce

Cubesort is awesome, with big potential on top of that. The thing that impresses me most is its construction speed, not to mention the 'free' dump. Couldn't resist not to try my 'secret weapon' - a frozen/unfinished sub-project called 'Tcheburaschkasort'. It is entirely based on the titaniumESQUEly strong Bayer-McCreight algorithm which I implemented specifically and only as order 3 (three children maximum). It's been some time since I wrote it, enough to forget this-and-that, but the initial idea was to use next dimension i.e. to use millions of B-trees. My unfinished Tcheburaschkasort currently uses 24bit pseudo-hash (in fact first 1|2|3 byte(s) ARE a slot) giving a forest of 16,777,216 B-Trees of order 3:

UINT PseudoHash(const char *str, unsigned int wrdlen)
{
UINT hash32 = 0;
const char *p = str;
// Use 256-based system for first 3 bytes i.e. left is more significant:
// Currently 3bytes long i.e. 24bit:
if (wrdlen>=1) {
hash32 = hash32 | ((*p)<<16);
p += 1;
}
if (wrdlen>=2) {
hash32 = hash32 | ((*p)<<8);
p += 1;
}
if (wrdlen>=3) {
hash32 = hash32 | ((*p)<<0);
p += 1;
}
return hash32;
}

Each slot houses the root of a tree. Not using any hash Tcheburaschkasort resembles a hypercube of order 0, using hash makes it 1D i.e. of order 1 - lookup hashtable is the axis. Blah-blah, in short the randomness of those prefixes (1..3bytes long) decides the speed performance. In example with KT10000000.txt all first 3 bytes are equal, 'A8C', so there is one only tree and performance is poor. Because all the 10,000,000 lines start from board position A8. Knight-tours: each square represents a board position, as A1(bottom-left), H8(top-right):

-----------------
|a8 |b8 |...|h8 |

|a7 |...| |

|...|...| |

|a1 |...|...|h1 |

When making the data more diverse, e.g. reverse the order of tours then for KT10000000.txt we will have 50 trees, see below. Let us see what boost comes from this diversity. So, Tcheburaschkasort vs Cubesort results for KT10M filedataset (10,000,000 Knight-Tours from A8, reversed): NOTE: Tcheburaschkasort is based on my superfast multi-purpose text ripper Leprechaun. Tcheburaschkasort still DOES NOT make in-order dump/traversal, resultant file is unsorted, to be done. This DOES NOT affect results, though. The full log is as follows:

E:\BiShowdown_Cubesort_Tcheburaschkasort_

Igor van den Hoven · 06/29/2014 11:45 PM

libdivsufsort implements a suffix array, which according to some sources is about as fast as quicksort. I made some improvements to the memory handling of Cubesort and it now is 1.21 times slower than mergesort for random data, and 5.75 times faster for sorted data. Might see further improvement gains by setting BSC_L to 1 when comparing strings.

Cubesort

------------------------------------------------------------------------------------------ | sorter \ filedataset | KT10000000.txt | KT10000000_SORTED.txt |

| Cubesort | 080 seconds | 045 seconds | | Windows' sort.exe | 118 seconds | 115 seconds | | Sandokan | 131 seconds; swappings: 36,826,242+ | 114 seconds; swappings: 0 |

----------------- |a8 |b8 |...|h8 |

|a7 |...| |

|...|...| |

|a1 |...|...|h1 |

------------------------------------------------------------------------------------------ | sorter \ filedataset | KT10000000.txt | KT10000000_SORTED.txt |

| Cubesort | 080 seconds | 045 seconds | | Windows' sort.exe | 118 seconds | 115 seconds | | Sandokan | 131 seconds; swappings: 36,826,242+ | 114 seconds; swappings: 0 |

------------------------------------------------------------------------------------------
| sorter \ filedataset | KT10000000.txt | KT10000000_SORTED.txt |

| Cubesort | 080 seconds | 045 seconds |
| Windows' sort.exe | 118 seconds | 115 seconds |
| Sandokan | 131 seconds; swappings: 36,826,242+ | 114 seconds; swappings: 0 |

-----------------
|a8 |b8 |...|h8 |

------------------------------------------------------------------------------------------
| sorter \ filedataset | KT10000000.txt | KT10000000_SORTED.txt |

| Cubesort | 080 seconds | 045 seconds |
| Windows' sort.exe | 118 seconds | 115 seconds |
| Sandokan | 131 seconds; swappings: 36,826,242+ | 114 seconds; swappings: 0 |