Code Project
The transfer of Technology

Forum: C / C++ / MFC
Tags: c++, database, performance, tutorial, csharp
1 post, 1 poster
This topic has been deleted. Only users with topic management privileges can see it.
pww71 wrote (#1):

    Technology transfer: 2,000,000,000 US dollars.

    The core of the core of big-data solutions -- the map. In the age of big data, how do you improve performance tenfold?

    My algorithm is a perfect hash algorithm. The principles behind its key index and its compression algorithm are out of the ordinary; the crucial point is that the structure is completely different, so the key-index compression is fundamentally different. You can refer to the following article: http://blog.csdn.net/chixinmuzi/article/details/1727195

    In C++ programs, maps are used everywhere, and the bottleneck limiting program performance is often the performance of the map -- especially with big data, and when the business logic is so tightly coupled that the data cannot be distributed and processed in parallel. In those situations the map's performance becomes the key technology. As one book on big data puts it: "But Hadoop is not bound to that approach. On the contrary, it assumes that the sheer volume of data makes the data impossible to move, so the analysis must be done locally."

    Throughout my work in the telecom and information-security industries I have dealt with low-level big data, and information-security data, the most complex of all, depends on maps everywhere. For example: lookups in IP tables, MAC tables, telephone-number tables, domain-name resolution tables, and ID-card-number tables, and cloud scanning for virus and trojan signatures.

    The STL map uses binary search (it is an ordered tree) and has the worst performance. Google's hash map is currently the best in both performance and memory, but it carries a probability of repeated collisions. Big-data systems today rarely use a map with any collision probability, because billing is involved and not a single error can be tolerated.

    I am now releasing my algorithm. It contains three kinds of map; after a build step it becomes a hash map. If you test and compare, you will find that my algorithm has a zero probability of collision, yet performs even better than a hash algorithm, and even its ordinary map is nearly on par with Google's. The most direct benefit of using my map is that a solution that used to require ten servers now needs only one.

    Disclaimer: this code may not be used for commercial purposes; for commercial use, contact me on QQ.
