Levenshtein distance, also known as edit distance, measures how many corrections are needed to get from one word to another. Each deleted character, each added character, and each substituted character counts as one correction. There is a very efficient Rust crate called fst that can very quickly generate a list of possible corrections within N edit distance of a given string.
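To make the metric concrete, here is a minimal sketch of the textbook dynamic-programming computation using only the standard library. The function name `levenshtein` is our own, and this is not how fst works internally (fst searches its index with an automaton rather than comparing words pair by pair); it only illustrates what is being counted.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar values.
/// Textbook dynamic programming, keeping only two rows of the table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
    // Row 0: turning the empty string into b[..j] takes j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        // Turning a[..i] into the empty string takes i deletions.
        curr[0] = i;
        for j in 1..=b.len() {
            // A substitution costs 1 only when the characters differ.
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = curr[j - 1] + 1;
            curr[j] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` holds the last completed row.
    prev[b.len()]
}
```

For example, `levenshtein("funetic", "phonetic")` is 3: substitute "f" with "p", substitute "u" with "h", and insert "o".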
What happens if you search for "funetic" instead of "phonetic"? The two terms sound nearly identical, but their spellings are quite different. For these cases, we will use a Rust crate called eudex. So our autocorrect has two steps. One, find a list of possible corrections. Two, rank those corrections in order of phonetic similarity to what was typed.
First, let's write a convenience function that builds a phonetic index from a dictionary.
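One detail worth sketching here: fst expects its keys to be inserted in lexicographic byte order, so the word list has to be sorted and de-duplicated before the index is built. The helper below is a hypothetical standard-library-only preparation step (the name `prepare_dictionary` and the newline-separated input format are our assumptions). Its output could then be handed to an fst constructor such as `fst::Set::from_iter`.

```rust
/// Normalise a newline-separated word list into sorted, de-duplicated keys.
/// (Hypothetical helper; the input format is an assumption.)
fn prepare_dictionary(raw: &str) -> Vec<String> {
    let mut words: Vec<String> = raw
        .lines()
        .map(|line| line.trim().to_lowercase())
        .filter(|word| !word.is_empty())
        .collect();
    // `String` compares bytewise, which is the ordering fst requires.
    words.sort();
    // Duplicates are adjacent after sorting, so `dedup` removes them all.
    words.dedup();
    words
}
```

Lowercasing is a design choice rather than a requirement: it keeps "Phonetic" and "phonetic" from becoming separate entries, at the cost of losing the original capitalisation.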
To search the phonetic index, we run a Levenshtein search over the fst data. Once we have a list of similar words, we rank them by their phonetic similarity to the query. The results are fairly good, particularly for short words.
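The overall shape of that two-step search can be sketched without either crate. In the hedged example below, `suggest` and both closure parameters are hypothetical names: `edit_distance` stands in for the fst Levenshtein search and `phonetic_distance` stands in for an eudex comparison. A real implementation would query the fst index instead of scanning a slice.

```rust
/// Two-step autocorrect sketch: keep the words within `max_edits` of `query`,
/// then order the survivors by phonetic distance, smallest first.
/// Both distance functions are supplied by the caller (stand-ins for fst and eudex).
fn suggest<'a>(
    dictionary: &[&'a str],
    query: &str,
    max_edits: usize,
    edit_distance: impl Fn(&str, &str) -> usize,
    phonetic_distance: impl Fn(&str, &str) -> u32,
) -> Vec<&'a str> {
    // Step one: filter down to candidates that are close in spelling.
    let mut candidates: Vec<&'a str> = dictionary
        .iter()
        .copied()
        .filter(|word| edit_distance(query, *word) <= max_edits)
        .collect();
    // Step two: rank by how similar each candidate sounds to the query.
    // `sort_by_key` is stable, so ties keep their dictionary order.
    candidates.sort_by_key(|word| phonetic_distance(query, *word));
    candidates
}
```

Passing the metrics in as closures keeps the sketch independent of any particular crate version and makes the ranking step easy to test with simple stand-in functions.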