Generating Lots of Misspelled Words for NLP Testing

Quantifying NLP Algorithm Performance

Mean Squared Error (MSE) is a common way to quantify how well an estimator performs against labelled test data. For certain NLP tasks, however, the most realistic test data carries an inherent randomness caused by human error: people make typos and misspell words, yet we still manage to communicate and understand each other. To automate the creation of large amounts of this sort of fuzzy test data, I created the misspeller tool.
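
As a quick refresher, MSE is simply the average of the squared differences between the labels and the estimator's predictions. A minimal Python sketch (the example scores are made up for illustration):

    def mean_squared_error(y_true, y_pred):
        """Average squared difference between labels and predictions."""
        assert len(y_true) == len(y_pred)
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

    # Five test cases: labels vs. an estimator's predictions
    print(mean_squared_error([0, 0, 1, 0, 1], [0, 1, 1, 0, 0]))  # 0.4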

The Need for Fuzzy Test Data

For simple tasks, such as spelling correction or autofill suggestion, the estimator is expected to perform well despite human error. By generating a set of common spelling mistakes for each word, we can run the estimator against this fuzzy yet strongly labelled test data, then measure the MSE against the original word to determine how well the algorithm is performing. A spelling correction algorithm should rank the original word highly among its suggestions.
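
To make the workflow concrete, here is a rough Python sketch, not the misspeller tool itself: the single-edit misspelling strategy, the `correct(word)` corrector interface, and the 0/1 loss used for the MSE are all my own simplifying assumptions.

    import random

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def misspell(word, n=10, seed=0):
        """Generate up to n distinct noisy variants of `word` using
        single-edit operations (transposition, deletion, substitution).
        Illustrative only; misspeller's own strategies may differ."""
        rng = random.Random(seed)
        variants = set()
        for _ in range(100 * n):              # bounded number of attempts
            if len(variants) >= n:
                break
            chars = list(word)
            i = rng.randrange(len(chars))
            op = rng.choice(["transpose", "delete", "substitute"])
            if op == "transpose" and len(chars) > 1:
                j = min(i, len(chars) - 2)
                chars[j], chars[j + 1] = chars[j + 1], chars[j]
            elif op == "delete" and len(chars) > 1:
                del chars[i]
            elif op == "substitute":
                chars[i] = rng.choice(ALPHABET)
            variant = "".join(chars)
            if variant != word:
                variants.add(variant)
        return sorted(variants)

    def mse_against_original(correct, word, variants):
        """0/1 loss per variant: 0 if the corrector recovers the original
        word, 1 otherwise. The mean of the squared losses is the MSE."""
        losses = [0 if correct(v) == word else 1 for v in variants]
        return sum(l * l for l in losses) / len(losses)

    if __name__ == "__main__":
        print(misspell("communicate"))

With a generator like this, any candidate corrector can be plugged in as `correct` and scored against the original word across the whole set of variants.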