According to Wikipedia, generative programming is "to manufacture software components in an automated way". I find this definition sufficient, but to illustrate the point more I would like to give the example of Natural Language Processing. It is very simple to spit out vocabulary words according to simple classifications like nouns/verbs/adjectives. However, writing meaningful text with proper grammar is not so simple. One example of generative programming would be to write a similar text in multiple languages with proper grammar. Say it once, Hear it everywhere.
In this article I aim to show some simple implementation of generative programming to generate some text in English with appropriate grammar considerations. This may seem impossible or it may seem simplistic depending on your previous experience with Natural Language Processing. Either way, my goal is not to actually generate good text, but rather to illustrate the generative capabilities of Rust. For this example we will generate sentences in the form of [subject] [verb] [direct object] [adverb].
It is important to note that words and ngrams may be ambiguous, sometimes very ambiguous. To capture this information in our vocabulary we make a bitfield of all the properties that a word can have and then we will store them in a simple tree data structure. There are more compact representations, like finite state machines or neural networks, that we may use later. For now we will just use a Character Trie to hold WordUsage bitfields at each word boundary.
For the language reading and writing we will use a simple graph representation to represent the language as a whole. EBNF is good for context-free languages with small lookaheads, however this can quickly become unwieldy when there is a larger amount of ambiguity present in the language. Surprisingly, from past experience, I have found that CFG is sometimes usable to represent natural language in many cases. This is the grammar nazi approach to language. When Context Free Grammars break, they usually break down by accepting everything and rejecting nothing. It is always possible to draw in broad strokes and lose in specificity. This is the short reasoning behind this approach and along with it comes the whole wagon of problems with statistical language processing.
This is all well and good, but where is the generative programming? Well, so far we have hard coded the representation of word usage and some very quaint grammar rules. It is infeasible for us to manually hardcode every word usage, grammar rule, and vocabulary into this machine. It is time for some Machine Learning! To learn new vocabulary and grammar we will implement a feedforward neural network and we will have the entire code mostly generated for us from what we already have. Don't believe me, just read the code:
Neural Networks are a mathematical abstraction of something much simpler: good and bad stimulus, i.e. pain and pleasure. For our use case it is straightforward to say that it is good when we recognize a word and it is also good when we have an appropriate grammar rule defined. It is bad when we have no appropriate rule for a word. It is bad when we have too much ambiguity. For Neural Networks this is all we need defined to then learn from a dataset.
This adventure continues in a new post: Continuing Natural Language Processing in Rust