Efficiency comparison between memory access and database access
-
Dear all, I am working on a text retrieval system. In the system, a document is represented by features (i.e., words and phrases that convey the essence of the document), and the similarity between a query and a document is determined by counting the features they have in common. Currently, at retrieval time, the features of all documents are held in memory. However, holding all the feature data in memory will become impossible as the number of documents increases dramatically. I have thought about this problem for a long time, and the only solution I can come up with is to use a database. The problem with that is the drop in efficiency: retrieval response time will increase, because accessing the hard disk, where the database is stored, is slower than accessing memory. I have no experience with this. Are there better technologies available? Sometimes I wonder how Google solves this problem. Please help. Thanks!
-
There are quite a few ways these problems are solved; you might want to read up on different caching strategies and indexing techniques. Generally this is taken care of in large part by the database software. Some databases are quite specialized for rapid retrieval, such as Berkeley DB. As for similarity and feature recognition, there's loads of research into that, especially for audio fingerprinting. If you don't want to do anything too funky, then just trust that your database server was written by someone who knew what they were doing ;)
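To make the indexing point a bit more concrete, here is a rough sketch of one common approach, an inverted index (feature -> documents), kept in SQLite through Python's standard `sqlite3` module. This is only an illustration under assumptions: the table name `postings`, the functions `build_index` and `search`, and the sample documents are all made up for the example, and SQLite simply stands in for whatever database you end up choosing. The idea is that a query only reads the rows for the features it contains, so the full feature set never has to sit in memory.

```python
import sqlite3


def build_index(conn, docs):
    """Store one (feature, doc_id) row per feature of each document.

    `docs` maps a document id to an iterable of feature strings.
    The table and column names are hypothetical, chosen for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "feature TEXT NOT NULL, "
        "doc_id INTEGER NOT NULL, "
        "PRIMARY KEY (feature, doc_id))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO postings (feature, doc_id) VALUES (?, ?)",
        [(f, doc_id) for doc_id, feats in docs.items() for f in set(feats)],
    )
    conn.commit()


def search(conn, query_features, limit=10):
    """Return (doc_id, matches) pairs, best match first.

    Similarity is the number of query features a document shares,
    which mirrors the counting scheme described in the question.
    """
    feats = sorted(set(query_features))
    if not feats:
        return []
    placeholders = ",".join("?" * len(feats))
    rows = conn.execute(
        "SELECT doc_id, COUNT(*) AS matches FROM postings "
        f"WHERE feature IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY matches DESC, doc_id ASC LIMIT ?",
        (*feats, limit),
    )
    return rows.fetchall()


# ":memory:" keeps the demo self-contained; pass a file path instead
# to keep the index on disk.
conn = sqlite3.connect(":memory:")
build_index(conn, {
    1: ["text retrieval", "index", "database"],
    2: ["database", "cache"],
    3: ["audio", "fingerprint"],
})
print(search(conn, ["database", "index", "cache"]))  # [(1, 2), (2, 2)]
```

The primary key on `(feature, doc_id)` gives the database an index to look features up by, so each lookup touches only the matching rows rather than scanning every document. How fast that is on real data depends on the database, its cache settings, and the hardware, so it is worth measuring with your own documents.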
Mark Churchill, Director, Dunn & Churchill