In my Android app I want to have an input field with autocomplete. The number of items will be about 300000. The best solution seems to be to put the items into a file (on sdcard), one item per line, each line would have the same number of characters so that I can seek to specific line number. If the user enters something in the text field, I would binary search (via RandomAccessFile) the file and show suggestions.
I want the autocomplete to be super fast (ideally under 100ms but I guess it's impossible), what optimizations I can do?
I will convert the users input to lowercase english characters (a-z) with spaces. So 'A/b' would be converted to 'a b' and then searched.
I now realized I need additional thing - to search for word-starting substrings.
What your looking for is called a TRIE
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.
Why don't you just use a SQLite DB rather than a text file?
I don't think you can do anything better speed-wise than a portable database in your situation.
Trie is the obvious answer, and already mentioned, but additionally tr13 library might be what you are looking at. It is garbage collector friendly (single raw byte array or byte buffer), compact, and definitely fast enough for your case. Keys are typically UTF-8 strings although can be any byte sequences. Values likewise, although there is also alternative for variable-length ints (vints) used to get very compact String-to-int lookups (esp. for smallish set of ints).
One strategy could be to narrow the results using the
RandomAccessFile and Binary Search. Then, once the possible entries is small enough, load that portion into memory, and do an in memory search.
This will improve performance, because as people type you can quickly search the same portion of the file that you have loaded in memory.
check this out http://en.wikipedia.org/wiki/Binary_search_algorithm
in a sorted file you have a binary search worst case of O(log(n)) the next best thing would be some sort of hashmapping whic goes O(1) altough this is complicated for partial words and will produce a huge mapping table.
Preprocess your possibilities into a search tree ahead of time, instead of doing it at runtime.
A major problem with one-word-per-line storage is that there is no random access for lines in constant time (accessing line X consists in counting X newline characters from the beginning of the file) so your binary search would suffer.
What you need in this specific (auto-complete) situation is a Prefix Tree or a variation of it (combining several nodes into one, or turning sub-trees smaller than a certain size into plain old sorted list of words).
100ms is plenty of time. The biggest worry would be the display updates, I'd think.
If you're wanting to avoid an actual database, this is easy enough to do with a simple index file in addition to your main file.
You could store the first N bytes (4 maybe?) of the string and a file offset into the main file in a index every 32 records or so, and binary search across that. You could then linearly search through up to 32 records after a binary search got you pretty close.
You can tune the index frequency from 32 records to whatever makes sense given your average string length and the size of a single read on your media. If you had 512 byte filesystem reads, and 8 byte average strings, then you'd do an index every 64 records, etc. There's not much point in having more than one index record per minimum disk read size.
The index file could be generated easily, and you could then manage the main file with a simple text editor.
I would suggest to see if you can use a standard library for this purpose. Maybe apache lucene can be used in android phones. If so, you can build an index (word prefix -> an id of a word in the android sql lite). Here is a discussion about a kind of algorithm lucene is using.
Old thread, but THIS IS WHAT YOU NEED: Stringsearch library
I used it for my app 'Wordlist Pro' for Android and it is really speedy.
I could also do something like this (below is a preprocessed file):
aa - line 1 ab - line 17 . . zz - line 299819
If user inputs something starting with aa, I would read lines 1 - 17 and sequentially search in them