Absimpa v684

absimpa
Class GrammarBuilder<N,C extends java.lang.Enum<C>>

java.lang.Object
  extended by absimpa.GrammarBuilder<N,C>
Type Parameters:
N - is the type of the objects created by the parser to be generated from the grammar constructed with a GrammarBuilder
C - is the type of token codes provided by the lexer that will be used by parser

public class GrammarBuilder<N,C extends java.lang.Enum<C>>
extends java.lang.Object

helps to build a Grammar for a language made from objects of type C. To create your grammar, the following bits and pieces are needed.

  1. A Lexer is needed that provides token codes of type C. It is completely up to you what these token codes stand for. For the parser and the grammar the only interesting bit is that C is an Enum. Typically a token code signifies a type of token like NUMBER, OPENBRACKET, IDENTIFIER and so on. Normally the lexer will internally associate the current token code also with a piece of text, the token text, but the parser we built is never interested in this text. Nevertheless, the text is not lost. Read on.
  2. A result type N is needed. This package makes no assumption about the type of result created, it is just called N. Whenever the parser has recognized a token, it will ask the lexer to provide a leaf node for it. And when it has recognized a partial bit of input, it will call a NodeFactory with the bits it has recognized and asks it to create an N.
  3. One or more NodeFactory objects are needed. Whenever the parser recognizes a part of the input, for example a sequence of tokens as described by the seq() method, it will call the associated NodeFactory with the bits it recognized to obtain a new N.

It is up to you to define what N is. It may be a type that describes a syntax tree, but it may as well describe something completely different that is incrementally built up while parsing.

To build a grammar, start by defining a few token recognizing grammars with token(). Then you can combine these, for example into a sequence, by passing the first to seq() and then adding more with Sequence.add(absimpa.Grammar). To repeat things, use star() or repeat(). Use choice() and subsequently Choice.or() to create a choice of subnodes. A rough outline of an example is:

 GrammarBuilder<...> gb = 
   new GrammarBuilder<...>(nodeFactory);
 Grammar<...> NUMBER = gb.token(CODE_NUMBER);
 Grammar<...> product = gb.star(NUMBER);
 Parser<...> parser = product.compile();
 

This would define a product as arbitrary length sequence of NUMBERs. When the parser has collected all NUMBERs available, it will call the nodeFactory with the list of results provided by the lexer to the token parser. The result of the parse would be whatever the nodeFactory makes out of the list. It could return an object that contains the list as a field, but it could as well multiply the numbers and return an N that contains just the product.

Slightly tricky is the creation of a recursive grammar. Create a Recurse as a placeholder and later set its recursive child with Recurse.setChild(Grammar).

The use of the GrammarBuilder is recommended over direct use of the grammar classes like Sequence, Choice and so on, because it reliefs from a lot of generic parameter typing.

Parsing EOF

To make sure that parsers compiled from the grammars produced here parse the whole input sequence, the lexer eventually needs to produce a specific eof-token which is explicitly matched by the grammar. Suppose your grammar is nothing but G-> (SCOPE TERM)+. The parser will happily parse a non-empty sequence of SCOPE and TERM pairs. In particular it will succeed for the sequence SCOPE TERM TERM with parsing the first pair and will leave the 2nd TERM unparsed. To prevent this, the grammar should rather be G-> (SCOPE TERM)+ EOF.

Multiple Argument Methods

It would be nice to use varargs methods for seq() and choice(). But due to the generic parameters of grammars, there would then always be the compiler warning about unsafe conversion to array. Therefore there are variants of those methods with up to 5 parameters. Only beyond that, varargs is used.

Remark: This package has no support to create a Lexer, but there is a SimpleLexer the source code of which may serve to demonstrate the principles.


Constructor Summary
GrammarBuilder(NodeFactory<N> defaultFactory)
           the resulting GrammarBuilder will enter the given factory into grammar objects as they are created, if no factory is explicitly provided.
 
Method Summary
 Choice<N,C> choice(Grammar<N,C> g)
          creates a grammar to recognize any one of a number of sub grammars, one of which is the given g.
 Choice<N,C> choice(Grammar<N,C> g1, Grammar<N,C> g2)
           
 Choice<N,C> choice(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3)
           
 Choice<N,C> choice(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4)
           
 Choice<N,C> choice(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4, Grammar<N,C> g5)
           
 Choice<N,C> choice(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4, Grammar<N,C> g5, Grammar<N,C>... more)
           
 Repeat<N,C> opt(Grammar<N,C> grammar)
          convenience function to call repeat(Grammar,int,int) with min=0 and max=1.
 Repeat<N,C> opt(NodeFactory<N> nf, Grammar<N,C> grammar)
          convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and max=1.
 Repeat<N,C> repeat(Grammar<N,C> grammar, int min, int max)
           creates a grammar like repeat(NodeFactory, Grammar,int,int), but with the default NodeFactory.
 Repeat<N,C> repeat(NodeFactory<N> factory, Grammar<N,C> g, int min, int max)
          creates a grammar to recognize a repetition of the given subgrammar g.
 Sequence<N,C> seq(Grammar<N,C> g)
           creates a grammar like seq(NodeFactory, Grammar), but with the default NodeFactory.
 Sequence<N,C> seq(Grammar<N,C> g1, Grammar<N,C> g2)
           
 Sequence<N,C> seq(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3)
           
 Sequence<N,C> seq(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4)
           
 Sequence<N,C> seq(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4, Grammar<N,C> g5)
           
 Sequence<N,C> seq(Grammar<N,C> g1, Grammar<N,C> g2, Grammar<N,C> g3, Grammar<N,C> g4, Grammar<N,C> g5, Grammar<N,C>... more)
           
 Sequence<N,C> seq(NodeFactory<N> factory, Grammar<N,C> g)
           creates a grammar to recognize a sequence of subgrammars which starts with g.
 Repeat<N,C> star(Grammar<N,C> grammar)
          convenience function to call repeat(Grammar,int,int) with min=0 and max=Integer.MAX_VALUE.
 Repeat<N,C> star(NodeFactory<N> nf, Grammar<N,C> grammar)
          convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and max=Integer.MAX_VALUE.
 TokenGrammar<N,C> token(C code)
           creates a grammar to recognize the given token code.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GrammarBuilder

public GrammarBuilder(NodeFactory<N> defaultFactory)

the resulting GrammarBuilder will enter the given factory into grammar objects as they are created, if no factory is explicitly provided.

Method Detail

token

public TokenGrammar<N,C> token(C code)

creates a grammar to recognize the given token code.


seq

public Sequence<N,C> seq(NodeFactory<N> factory,
                         Grammar<N,C> g)

creates a grammar to recognize a sequence of subgrammars which starts with g. The factory provided will be used to transform the recognized list of elements into the result type N. To add more subgrammars to the sequence, use Sequence.add(absimpa.Grammar).


seq

public Sequence<N,C> seq(Grammar<N,C> g)

creates a grammar like seq(NodeFactory, Grammar), but with the default NodeFactory.


seq

public Sequence<N,C> seq(Grammar<N,C> g1,
                         Grammar<N,C> g2)

seq

public Sequence<N,C> seq(Grammar<N,C> g1,
                         Grammar<N,C> g2,
                         Grammar<N,C> g3)

seq

public Sequence<N,C> seq(Grammar<N,C> g1,
                         Grammar<N,C> g2,
                         Grammar<N,C> g3,
                         Grammar<N,C> g4)

seq

public Sequence<N,C> seq(Grammar<N,C> g1,
                         Grammar<N,C> g2,
                         Grammar<N,C> g3,
                         Grammar<N,C> g4,
                         Grammar<N,C> g5)

seq

public Sequence<N,C> seq(Grammar<N,C> g1,
                         Grammar<N,C> g2,
                         Grammar<N,C> g3,
                         Grammar<N,C> g4,
                         Grammar<N,C> g5,
                         Grammar<N,C>... more)

repeat

public Repeat<N,C> repeat(NodeFactory<N> factory,
                          Grammar<N,C> g,
                          int min,
                          int max)
creates a grammar to recognize a repetition of the given subgrammar g. The provided factory will be used to transform the recognized list of elements into the result type N.

Parameters:
min - is the minimum number of times the subgrammar must be found in the input
max - is the maximum number of times the subgrammar must be found in the input
See Also:
star(absimpa.Grammar), opt(absimpa.Grammar)

repeat

public Repeat<N,C> repeat(Grammar<N,C> grammar,
                          int min,
                          int max)

creates a grammar like repeat(NodeFactory, Grammar,int,int), but with the default NodeFactory.


star

public Repeat<N,C> star(Grammar<N,C> grammar)
convenience function to call repeat(Grammar,int,int) with min=0 and max=Integer.MAX_VALUE.


star

public Repeat<N,C> star(NodeFactory<N> nf,
                        Grammar<N,C> grammar)
convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and max=Integer.MAX_VALUE.


opt

public Repeat<N,C> opt(Grammar<N,C> grammar)
convenience function to call repeat(Grammar,int,int) with min=0 and max=1.


opt

public Repeat<N,C> opt(NodeFactory<N> nf,
                       Grammar<N,C> grammar)
convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and max=1.


choice

public Choice<N,C> choice(Grammar<N,C> g)
creates a grammar to recognize any one of a number of sub grammars, one of which is the given g. To add more sub grammars to the choice, call Choice.or(absimpa.Grammar). In contrast to all other grammars, the Choice does not need a NodeFactory, because the resulting parser just passes on the choice that was recognized.


choice

public Choice<N,C> choice(Grammar<N,C> g1,
                          Grammar<N,C> g2)

choice

public Choice<N,C> choice(Grammar<N,C> g1,
                          Grammar<N,C> g2,
                          Grammar<N,C> g3)

choice

public Choice<N,C> choice(Grammar<N,C> g1,
                          Grammar<N,C> g2,
                          Grammar<N,C> g3,
                          Grammar<N,C> g4)

choice

public Choice<N,C> choice(Grammar<N,C> g1,
                          Grammar<N,C> g2,
                          Grammar<N,C> g3,
                          Grammar<N,C> g4,
                          Grammar<N,C> g5)

choice

public Choice<N,C> choice(Grammar<N,C> g1,
                          Grammar<N,C> g2,
                          Grammar<N,C> g3,
                          Grammar<N,C> g4,
                          Grammar<N,C> g5,
                          Grammar<N,C>... more)

Absimpa v684