fast search

duta

Hi there. Let's say I have a text file with 100,000 words, which I'll load into memory. I need to query this file like a database in order to build a character prediction application. What method can I use to get a fast response, even on embedded devices?

Christian Graus

Speed of search comes through complexity of code. The more indexes you build, the faster it will be, but the more memory it will use.

Ennis Ray Lynch Jr

Don't load it into memory. Use a file stream with appropriate indexes. It will stay fast as the file size approaches and then exceeds the RAM available to the application, especially on embedded devices.
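
A rough sketch of the "file stream plus index" idea, in Python for brevity. It assumes the word file is sorted and newline-separated, and it keeps only a small table of byte offsets in memory, one per leading two-letter prefix. The function names and the two-letter key length are illustrative choices, not something specified in this thread.

```python
import io

def build_prefix_index(stream, prefix_len=2):
    """Map each leading prefix to the byte offset of the first word that has it.

    Assumes `stream` is a seekable binary stream of newline-separated
    ASCII words sorted in ascending byte order.
    """
    index = {}
    stream.seek(0)
    offset = 0
    for line in stream:
        key = line.rstrip(b"\r\n")[:prefix_len]
        index.setdefault(key, offset)  # keep only the first occurrence
        offset += len(line)
    return index

def words_with_prefix(stream, index, prefix, prefix_len=2, limit=10):
    """Seek to the indexed block and scan forward while the key still matches."""
    raw = prefix.encode("ascii")
    key = raw[:prefix_len]
    # A prefix shorter than prefix_len can match several index keys;
    # the file is sorted, so start at the smallest matching offset.
    starts = [off for k, off in index.items() if k.startswith(key)]
    if not starts:
        return []
    stream.seek(min(starts))
    found = []
    for line in stream:
        word = line.rstrip(b"\r\n")
        if not word.startswith(key):
            break  # left the block for this key
        if word.startswith(raw):
            found.append(word.decode("ascii"))
            if len(found) >= limit:
                break
    return found
```

With a real file you would open it once in binary mode (`open(path, "rb")`), build the index at startup, and keep the handle open; only the offset table lives in RAM.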

Wendelius

One possibility is to use SQL Server Compact Edition and, over a constantly open connection, query candidate words from the database. This would make building the indexes easier.

riced

Here's a suggestion. Assuming the file is sorted so the words are in alphabetical order, you can treat it as an array of words and use the Seek method to do a binary search. There are a few caveats, e.g. I think you need to use a BufferedStream, and it might mean padding words with trailing spaces so you can calculate the offset. Just a thought, perhaps not completely practical.
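
A minimal sketch of the seek-based binary search, in Python rather than .NET. It assumes every word is stored as a fixed-width, space-padded ASCII record so that record i starts at byte i * RECORD_LEN; the record width of 16 and all function names are assumptions made for illustration.

```python
import io

RECORD_LEN = 16  # assumed fixed record width in bytes

def write_records(words, stream):
    """Write the words sorted, each padded with trailing spaces to RECORD_LEN."""
    for word in sorted(words):
        raw = word.encode("ascii")
        if len(raw) > RECORD_LEN:
            raise ValueError("word longer than record: " + word)
        stream.write(raw.ljust(RECORD_LEN))

def read_record(stream, i):
    """Seek straight to record i and strip the padding."""
    stream.seek(i * RECORD_LEN)
    return stream.read(RECORD_LEN).rstrip(b" ")

def first_at_or_after(stream, count, target):
    """Binary search: index of the first record that is >= target."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if read_record(stream, mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def complete(stream, prefix, limit=10):
    """Return up to `limit` words that start with `prefix`."""
    stream.seek(0, io.SEEK_END)
    count = stream.tell() // RECORD_LEN
    raw = prefix.encode("ascii")
    i = first_at_or_after(stream, count, raw)
    found = []
    while i < count and len(found) < limit:
        word = read_record(stream, i)
        if not word.startswith(raw):
            break
        found.append(word.decode("ascii"))
        i += 1
    return found
```

For 100,000 words the search touches about 17 records per lookup, at the cost of wasted padding bytes on short words.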

Pete OHanlon

One way to do this would be to split the words up into smaller chunks, and then have *pointers* to keep them together. Consider this small file:

Adrian
Andrea
Andrew
Anthony
Brian
Charles
William
Winston

This could be tokenised like this:

Ad ri an
An dr ea
      ew
   th on y
Br ia n
Ch ar le s
Wi ll ia m
   ns to n

As you can see, the list of choices narrows quite dramatically the further on you get, and the information becomes quite easy to traverse. In this example, the user types in A and gets a choice of 4 entries. As soon as they press n, it breaks down to 3. Pressing d narrows it down to 2, and they keep going until they get to the end (or choose one out of your selection).

The downside to this approach is that the actual splitting of the words is the time-consuming part of the process, but if your solution allows you to preparse them into smaller units up front, the results can be quite dramatic.

Mark Churchill

Read it and slap it in a tree structure so you can traverse quickly through the possibilities.

N a v a n e e t h

Try SQLite, a file-based SQL database engine. Then use normal SQL queries to fetch the required data. It would be much faster.
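
As a hedged illustration of the SQLite route, here is a small sketch using Python's standard sqlite3 module. The table name `words`, the helper names, and the choice of a range query instead of LIKE are assumptions; the range form is used because it can be served by the primary-key index under the default binary collation.

```python
import sqlite3

def build_db(words, path=":memory:"):
    """Create (or open) the database and load the words.

    The PRIMARY KEY gives SQLite an index on `word` automatically.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        ((w,) for w in words),
    )
    conn.commit()
    return conn

def complete(conn, prefix, limit=10):
    """Return up to `limit` words starting with `prefix`, in sorted order."""
    # Upper bound: the prefix followed by the highest Unicode code point,
    # so every word that extends the prefix falls inside the range.
    upper = prefix + "\U0010ffff"
    rows = conn.execute(
        "SELECT word FROM words WHERE word >= ? AND word < ? "
        "ORDER BY word LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]
```

Keeping one connection open for the life of the application, as suggested earlier in the thread for SQL Server Compact, avoids paying the open cost on every keystroke.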

jas0n23

that's a great answer dude!

Alan Balkany

A database has a lot of overhead, which you can avoid with your own data structure. I suggest a tree structure where each level takes you one letter farther in the word: the root will have 26 sons, for the 26 possible first letters. Each of these sons will have up to 26 sons (grandsons of the root) for the (up to) 26 possible second letters, and so on.

1. It saves space because all words sharing a common prefix will use the same path from the root, giving you some compression.
2. It's faster than a database because you don't have to do any time-consuming queries; at each node you have a list of all the possible next characters.
3. When building this tree from your word list, you can increment a counter for each letter added at the current node. This will give you the frequencies for each continuation letter. You can then use these frequencies to predict the most likely continuation.
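
The tree described above can be sketched roughly as follows, in Python for brevity. The class and function names are hypothetical, and a dictionary is used for the children instead of a fixed 26-slot array, which is an assumption made to keep the sketch short.

```python
class TrieNode:
    """One letter position in the tree; `count` is how many words pass through it."""
    __slots__ = ("children", "count", "is_word")

    def __init__(self):
        self.children = {}   # next letter -> TrieNode
        self.count = 0
        self.is_word = False

def build_trie(words):
    """Insert every word, incrementing the counter on each node along its path."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            child.count += 1
            node = child
        node.is_word = True
    return root

def predict_next(root, prefix):
    """Return (letter, frequency) pairs for the letters that can follow `prefix`,
    most frequent first; ties are broken alphabetically."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # no word starts with this prefix
    pairs = [(ch, child.count) for ch, child in node.children.items()]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))
```

For example, with the names used earlier in the thread:

```python
root = build_trie(["Adrian", "Andrea", "Andrew", "Anthony",
                   "Brian", "Charles", "William", "Winston"])
print(predict_next(root, "A"))    # [('n', 3), ('d', 1)]
print(predict_next(root, "And"))  # [('r', 2)]
```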