/*
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 *
 * Copyright 1997-2007 Sun Microsystems, Inc. All rights reserved.
 *
 * The contents of this file are subject to the terms of either the GNU
 * General Public License Version 2 only ("GPL") or the Common
 * Development and Distribution License("CDDL") (collectively, the
 * "License"). You may not use this file except in compliance with the
 * License. You can obtain a copy of the License at
 * http://www.netbeans.org/cddl-gplv2.html
 * or nbbuild/licenses/CDDL-GPL-2-CP. See the License for the
 * specific language governing permissions and limitations under the
 * License. When distributing the software, include this License Header
 * Notice in each file and include the License file at
 * nbbuild/licenses/CDDL-GPL-2-CP. Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the GPL Version 2 section of the License file that
 * accompanied this code. If applicable, add the following below the
 * License Header, with the fields enclosed by brackets [] replaced by
 * your own identifying information:
 * "Portions Copyrighted [year] [name of copyright owner]"
 *
 * The Original Software is NetBeans. The Initial Developer of the Original
 * Software is Sun Microsystems, Inc. Portions Copyright 1997-2007 Sun
 * Microsystems, Inc. All Rights Reserved.
 *
 * If you wish your version of this file to be governed by only the CDDL
 * or only the GPL Version 2, indicate your decision by adding
 * "[Contributor] elects to include this software in this distribution
 * under the [CDDL or GPL Version 2] license." If you do not indicate a
 * single choice of license, a recipient has the option to distribute
 * your version of this file under either the CDDL, the GPL Version 2 or
 * to extend the choice of license to its licensees as provided above.
 * However, if you add GPL Version 2 code and therefore, elected the GPL
 * Version 2 license, then the option applies only if the new code is
 * made subject to such option by the copyright holder.
 */

package org.netbeans.spi.lexer;

import org.netbeans.api.lexer.TokenId;
import org.netbeans.api.lexer.Token;

/**
 * Lexer reads input characters from {@link LexerInput} and groups
 * them into tokens.
 * <p>
 * The lexer delegates token creation
 * to {@link TokenFactory#createToken(TokenId)}.
 * A token factory instance should be passed to the lexer in its constructor.
 * <p>
 * The lexer must be able to express its internal lexing
 * state at token boundaries and it must be able
 * to restart lexing from such a state.
 * <p>
 * It is expected that if the input characters following the restart point
 * do not change, the lexer will return the same tokens
 * regardless of whether it was restarted at the restart point
 * or run from the beginning of the input as a batch lexer.
 * <p>
 * <b>Testing of the lexers</b>:
 * <p>
 * Newly written lexers can be tested in several ways.
 * The simplest way is to test batch lexing first
 * (see e.g.
 * <a href="http://www.netbeans.org/source/browse/lexer/test/unit/src/org/netbeans/lib/lexer/test/simple/Attic/SimpleLexerBatchTest.java">
 * org.netbeans.lib.lexer.test.simple.SimpleLexerBatchTest</a> in lexer module tests).
 * <p>
 * Then the "incremental" behavior of the new lexer can be tested
 * (see e.g. <a href="http://www.netbeans.org/source/browse/lexer/test/unit/src/org/netbeans/lib/lexer/test/simple/Attic/SimpleLexerIncTest.java">
 * org.netbeans.lib.lexer.test.simple.SimpleLexerIncTest</a>).
 * <p>
 * Finally, the lexer can be tested by random tests that randomly insert and remove
 * characters from the document
 * (see e.g. <a href="http://www.netbeans.org/source/browse/lexer/test/unit/src/org/netbeans/lib/lexer/test/simple/Attic/SimpleLexerRandomTest.java">
 * org.netbeans.lib.lexer.test.simple.SimpleLexerRandomTest</a>).
 * <p>
 * Once these tests pass, the lexer can be considered stable.
 *
 * @author Miloslav Metelka
 */
public interface Lexer<T extends TokenId> {

    /**
     * Return a token based on characters of the input
     * and possibly additional input properties.
     * <p>
     * Characters can be read by using the
     * {@link LexerInput#read()} method. Once the lexer
     * knows that it has read enough characters to recognize
     * a token, it calls
     * {@link TokenFactory#createToken(TokenId)}
     * to obtain an instance of a {@link Token} and then returns it.
     * <p>
     * <b>Note:</b> The lexer must <b>not</b> return any <code>Token</code> instances other than
     * those obtained from the TokenFactory.
     * <p>
     * The lexer is required to tokenize all the characters (except EOF)
     * provided by the {@link LexerInput} prior to returning null
     * from this method. Failing to do so is treated
     * as a malfunction of the lexer.
     *
     * @return token recognized by the lexer
     *  or null if there are no more characters (available in the input) to be tokenized.
     *  <br>
     *  Return {@link TokenFactory#SKIP_TOKEN}
     *  if the token should be skipped because of a token filter.
     *
     * @throws IllegalStateException if the token instance created by the lexer
     *  was not created by the methods of TokenFactory (there is a common superclass
     *  for those token implementations).
     * @throws IllegalStateException if this method returns null but not all
     *  the characters of the lexer input were tokenized.
     */
    Token<T> nextToken();

    /**
     * This method is called by the lexer infrastructure
     * to obtain the lexer's current state
     * once the lexer has recognized and returned a token.
     * <p>
     * In a mutable environment this method is called after each recognized token
     * and its result is paired (together with the token's lookahead) with the token
     * for later use, when the lexer needs to be restarted at the token boundary.
     * <p>
     * If the lexer is in no extra state (it is in a default state)
     * it should return <code>null</code>. Most lexers are in the default state
     * all the time.
     * <p>
     * If possible, the non-default lexer states should be expressed
     * as small non-negative integers.
     * There is an optimization that shrinks the storage costs for small
     * <code>java.lang.Integer</code>s to single bytes.
     * <p>
     * The returned value should not be tied to this particular lexer instance in any way.
     * Another lexer instance may be restarted from this state later.
     *
     * @return valid state object or null if the lexer is in a default state.
     */
    Object state();

    /**
     * The infrastructure calls this method when it no longer needs this lexer for lexing,
     * so the lexer becomes unused.
     * <p>
     * If lexer instances are cached and reused later,
     * this method should first release all the references that might cause
     * memory leaks and then add this unused lexer to the cache.
     */
    void release();

}