Need help running an AVX2 console benchmark
Anyone willing to run my superfast (possibly the fastest) LZSS decompression benchmark would help me a lot in learning more about the supremacy of 512bit registers. Haswell's reported 1TB/s L1 cache bandwidth made me curious how close one well-written memory etude can come to that figure.

The benchmark package includes 2 executables: the first is compiled as 64bit code using 64bit GP registers, the second as 32bit code using 512bit ZMM registers. The 807MB test file included in the package compresses down to 249MB (ZIP's maximum mode gives 77MB), and it decompresses at 956MB/s on my laptop with a Core2 T7500.

Given that 'memcpy()' runs at 1950MB/s on my machine and at 13211MB/s on an i7-4770K (roughly 6.8x faster), I expect single-threaded speeds on Haswell to exceed 6x956MB/s, i.e. about 5.7GB/s. Is my estimate correct?

The package: Fastest strstr-like function in C!?

I will be glad to see how both Intel and AMD, i.e. Haswell and Excavator, perform.