I have seen a few years back a nice and simple algorithm that, given a (finite) set of words in some alphabet, builds a context-free grammar for a language including these words and in some sense "natural" (e.g., the grammar doesn't produce all words in the alphabet). The algorithm is very simple, it has something like 3--4 rules for grammar transformation attempted on each new word. Any help in finding it would be appreciated.
Asked By : nikita
Answered By : Pseudonym
I think you might be referring to Sequitur.
Edit It has been suggested by other commenters that I leave more information for posterity. Fair point.
Sequitur is an algorithm by Craig Neville-Manning and Ian Witten (of Managing Gigabytes fame). It's linear time in the size of the input sequences (although so is the memory usage), and satisfies the twin properties of parsimony (no redundant rules are derived) and utility (every rule is useful).
However, it can't (IIRC) discover arbitrary nesting structure. So a prototypical expression grammar, where an expression can contain an expression, is too much for it. But it will discover word boundaries in English text, and repeat regions in DNA. It's also useful for finding dictionaries for data compression (which is one of Witten's major research interests).
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/9246
0 comments:
Post a Comment
Let us know your responses and feedback