Absimpa v684

absimpa.lexer
Class SimpleLexer<N,C extends java.lang.Enum<C>>

java.lang.Object
  extended by absimpa.lexer.SimpleLexer<N,C>
Type Parameters:
C - is an enumeration and describes the token codes provided to the parser. In addition, the enum knows how to transform a token code into an N
N - is the data type returned for a token when the parser has recognized it and calls next()
All Implemented Interfaces:
Lexer<N,C>
Direct Known Subclasses:
ExprLanguage.L

public class SimpleLexer<N,C extends java.lang.Enum<C>>
extends java.lang.Object
implements Lexer<N,C>

is an example implementation of a Lexer which analyzes a string by trying out regular expressions for tokens until a match is found. It is not intended for production use; it is merely an example.

This lexer is set up by specifying a list of pairs (regex, C), where C is some enumeration type, the generic parameter of this class. To analyze an input string, the lexer tries to match each of the regular expressions at the beginning of the input string. If it finds a match, the associated C represents the current token code provided to the parser. If next() is called, the matching prefix of the input is converted to an N by means of the LeafFactory implemented by the C type. The result is returned, while the lexer starts over with the next token.
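The matching loop described above can be sketched without the library. The following is a minimal, self-contained illustration and not Absimpa's implementation: the class FirstMatchSketch, its Rule helper and the Code enum are hypothetical names, and a plain string "CODE:text" stands in for the N that a LeafFactory would produce. It also assumes that unmatched characters are skipped one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Absimpa.
class FirstMatchSketch {
    enum Code { NUMBER, PLUS, EOF }

    // One (regex, code) pair; the list order is the order in which regexes are tried.
    static final class Rule {
        final Pattern pattern;
        final Code code;

        Rule(String regex, Code code) {
            this.pattern = Pattern.compile(regex);
            this.code = code;
        }
    }

    static List<String> tokenize(String input) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("[0-9]+", Code.NUMBER));
        rules.add(new Rule("[+]", Code.PLUS));

        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Rule rule : rules) {
                // lookingAt() anchors the match at the start of the region,
                // i.e. at the beginning of the remaining input.
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    // Stand-in for the LeafFactory conversion of the matched prefix.
                    tokens.add(rule.code + ":" + m.group());
                    pos = m.end();
                    matched = true;
                    break; // first matching rule wins
                }
            }
            if (!matched) {
                pos++; // assumption: discard one unmatched character and retry
            }
        }
        tokens.add(Code.EOF.name()); // end of input yields the EOF code
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12+3")); // prints [NUMBER:12, PLUS:+, NUMBER:3, EOF]
    }
}
```

Because the first matching rule wins, the order in which the pairs are registered decides which token code is reported when several regular expressions match the same prefix.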

If no match can be found, the behaviour depends on whether setSkipRe() was called. If it was, the skip regular expression is tried, and if it matches, the corresponding text is ignored and the lexer starts over trying to match the token regular expressions. If the skip regular expression does not match, a ParseException is thrown. If no regular expression to skip was set, or if it was set to null, the lexer behaves as if every non-matching character may be skipped. Consequently, input that cannot be matched is then silently discarded.
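The three cases above can be made concrete with a small sketch. This is an illustration under assumptions, not the library's code: SkipSketch and its skip method are hypothetical, IllegalArgumentException stands in for the library's ParseException, a missing skip expression is modelled as discarding exactly one character, and an empty match of the skip expression is treated as a failure so that the position always advances.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Absimpa.
class SkipSketch {
    /**
     * Called when no token regex matches at pos.
     * Returns the position at which token matching should resume.
     */
    static int skip(CharSequence input, int pos, String skipRegex) {
        if (skipRegex == null) {
            // No skip expression set: every non-matching character may be skipped.
            return pos + 1;
        }
        Matcher m = Pattern.compile(skipRegex).matcher(input).region(pos, input.length());
        if (m.lookingAt() && m.end() > pos) {
            // The skip expression matches: ignore the matched text.
            return m.end();
        }
        // Skip expression set but not matching: report an error
        // (stand-in for the ParseException mentioned in the text).
        throw new IllegalArgumentException("unexpected input at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(skip("  x", 0, "\\s+")); // prints 2
        System.out.println(skip("?x", 0, null));    // prints 1
    }
}
```

Setting a skip expression such as "\\s+" therefore turns stray characters into errors, while leaving it unset makes the lexer tolerant but silent about garbage in the input.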


Constructor Summary
SimpleLexer(java.lang.Class<CC> tokenCode, LeafFactory<N,C> leafFactory)
           adds all constants found in class tokenCode with addToken(C, java.lang.String), except the one identical to the LexerInfo.eofCode() it provides.
SimpleLexer(C eofCode, LeafFactory<N,C> leafFactory)
           creates a SimpleLexer that returns eofCode when the end of input is encountered.
 
Method Summary
 SimpleLexer<N,C> addToken(C tc, java.lang.String regex)
           adds a mapping from a regular expression to the given token code.
 C current()
           provides the current token code.
 java.lang.String currentText()
           
 Token<N,C> currentToken()
          returns the current token.
 void initAnalysis(java.lang.CharSequence text)
           resets the lexer and initializes it to analyze the given text.
 N next()
           discards the current token and advances to the next one.
 ParseException parseException(java.util.Set<C> expectedTokens)
           creates a ParseException on request from the parser.
 void setSkipRe(java.lang.String regex)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SimpleLexer

public SimpleLexer(C eofCode,
                   LeafFactory<N,C> leafFactory)

creates a SimpleLexer that returns eofCode when the end of input is encountered.


SimpleLexer

public SimpleLexer(java.lang.Class<CC> tokenCode,
                   LeafFactory<N,C> leafFactory)

adds all constants found in class tokenCode with addToken(C, java.lang.String), except the one identical to the LexerInfo.eofCode() it provides. It is assumed that toString() of a code returns a regular expression that defines the strings representing the token.

IMPORTANT: Make sure to define the code constants of <C> in the order in which you want the lexer to try their regular expressions.
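As an illustration of this convention, the sketch below defines an enum whose toString() returns the regular expression and collects the expressions in declaration order, leaving out the end-of-input code. The names EnumOrderSketch, Code and regexesInOrder are hypothetical; the real constructor obtains the end-of-input code through LexerInfo.eofCode(), which is replaced here by an explicit parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of Absimpa.
class EnumOrderSketch {
    // Declaration order is the order in which the regexes would be tried:
    // the keyword comes before the general identifier.
    enum Code {
        PACKAGE("package"),
        IDENT("[a-z]+"),
        EOF("");

        private final String regex;

        Code(String regex) {
            this.regex = regex;
        }

        @Override
        public String toString() {
            return regex; // toString() doubles as the token's regular expression
        }
    }

    // Collects the regex of every constant except the end-of-input code.
    static <E extends Enum<E>> List<String> regexesInOrder(Class<E> codes, E eofCode) {
        List<String> result = new ArrayList<>();
        // getEnumConstants() returns the constants in declaration order.
        for (E code : codes.getEnumConstants()) {
            if (code != eofCode) {
                result.add(code.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(regexesInOrder(Code.class, Code.EOF)); // prints [package, [a-z]+]
    }
}
```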

Method Detail

initAnalysis

public void initAnalysis(java.lang.CharSequence text)
                  throws ParseException

resets the lexer and initializes it to analyze the given text. To prepare the first token, next() is called internally.

Throws:
ParseException

setSkipRe

public void setSkipRe(java.lang.String regex)

parseException

public ParseException parseException(java.util.Set<C> expectedTokens)
Description copied from interface: Lexer

creates a ParseException on request from the parser. This method is called by the parser if it finds a token code that does not fit its grammar. It is up to the Lexer implementation to provide as much information as possible in the exception about the current position of the input.

Specified by:
parseException in interface Lexer<N,C extends java.lang.Enum<C>>
Parameters:
expectedTokens - a set of tokens that the parser would have expected at the current position.

addToken

public SimpleLexer<N,C> addToken(C tc,
                                 java.lang.String regex)

adds a mapping from a regular expression to the given token code. No provision is made to detect conflicting regular expressions, i.e. regular expressions with common matches. To define both a specific keyword, e.g. package, and a general identifier, e.g. [a-z]+, make sure to call addToken first for the more specific token. Otherwise the specific token will never be matched.
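The effect of registration order can be checked with plain java.util.regex. The sketch below is not Absimpa code: TokenOrderSketch and firstMatch are hypothetical helpers that return the name of the first rule matching at the start of the input, which is the selection rule described for this lexer.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Absimpa.
class TokenOrderSketch {
    /**
     * Each rule is a {name, regex} pair. Returns the name of the first rule
     * whose regex matches a prefix of the input, or null if none does.
     */
    static String firstMatch(String input, String[][] rules) {
        for (String[] rule : rules) {
            if (Pattern.compile(rule[1]).matcher(input).lookingAt()) {
                return rule[0];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] keywordFirst = {{"PACKAGE", "package"}, {"IDENT", "[a-z]+"}};
        String[][] identFirst = {{"IDENT", "[a-z]+"}, {"PACKAGE", "package"}};

        System.out.println(firstMatch("package", keywordFirst)); // prints PACKAGE
        System.out.println(firstMatch("package", identFirst));   // prints IDENT
    }
}
```

With the identifier registered first, the keyword rule is shadowed for every input, because [a-z]+ already matches any text the keyword would match.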


current

public C current()
Description copied from interface: Lexer

provides the current token code. This method must always return the same token code as long as Lexer.next() is not called.

Specified by:
current in interface Lexer<N,C extends java.lang.Enum<C>>
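
The contract between current() and next() is that of a one-token lookahead: current() can be queried repeatedly without side effects, and only next() advances. The sketch below illustrates that contract with a hypothetical class LookaheadSketch over a fixed list of strings; it is not the library's implementation, and the string "EOF" stands in for the end-of-input code.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in, not part of Absimpa.
class LookaheadSketch {
    private final Iterator<String> source;
    private String current;

    LookaheadSketch(String... tokens) {
        this.source = Arrays.asList(tokens).iterator();
        advance(); // prepare the first token, similar in spirit to initAnalysis
    }

    // Returns the same value until next() is called.
    String current() {
        return current;
    }

    // Returns the value of the current token, then moves on to the following one.
    String next() {
        String old = current;
        advance();
        return old;
    }

    private void advance() {
        current = source.hasNext() ? source.next() : "EOF";
    }

    public static void main(String[] args) {
        LookaheadSketch lexer = new LookaheadSketch("a", "b");
        System.out.println(lexer.current()); // prints a
        System.out.println(lexer.current()); // prints a
        System.out.println(lexer.next());    // prints a
        System.out.println(lexer.current()); // prints b
    }
}
```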

next

public N next()
       throws ParseException

discards the current token and advances to the next one. This may involve skipping over input that cannot be matched by any regular expression added with addToken(C, java.lang.String).

Specified by:
next in interface Lexer<N,C extends java.lang.Enum<C>>
Returns:
a token code or, on end of input, the specific token code provided to the constructor
Throws:
ParseException

currentToken

public Token<N,C> currentToken()

returns the current token.


currentText

public java.lang.String currentText()

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object
