<?xml version="1.0" encoding="latin1" ?>
<!DOCTYPE erlref SYSTEM "erlref.dtd">
<year>2009</year><year>2009</year>
<holder>Ericsson AB. All Rights Reserved.</holder>
Copyright (c) 2008 Robert Virding. All rights reserved.
<prepared>Robert Virding</prepared>
<responsible>nobody</responsible>
<approved>nobody</approved>
<date>2009-05-07</date>
<modulesummary>Lexical analyzer generator for Erlang</modulesummary>
<p>A regular expression based lexical analyzer generator for
Erlang, similar to lex or flex.</p>
<note><p>The Leex module should be considered experimental
as it will be subject to changes in future releases.</p></note>
<title>DATA TYPES</title>
ErrorInfo = {ErrorLine,module(),error_descriptor()}
Token = tuple()</code>
<name>file(FileName) -> FileReturn</name>
<name>file(FileName, Options) -> FileReturn</name>
<fsummary>Generate a lexical analyzer</fsummary>
<v>FileName = filename()</v>
<v>Options = Option | [Option]</v>
<v>Option = - see below -</v>
<v>FileReturn = {ok, Scannerfile}
| {ok, Scannerfile, Warnings}
| error
| {error, Errors, Warnings}</v>
<v>Scannerfile = filename()</v>
<v>Warnings = Errors = [{filename(), [ErrorInfo]}]</v>
<p>Generates a lexical analyzer from the definition in the input
file. The input file has the extension <c>.xrl</c>, which is
added to the filename if it is not given. The name of the resulting
module is the Xrl filename without the <c>.xrl</c> extension.</p>
<p>The current options are:</p>
<tag><c>dfa_graph</c></tag>
<item><p>Generates a <c>.dot</c> file which contains a
description of the DFA in a format which can be viewed with
Graphviz, <c>www.graphviz.com</c>.</p>
<tag><c>{includefile,Includefile}</c></tag>
<item><p>Uses a specific or customised prologue file
instead of the default
<c>lib/parsetools/include/leexinc.hrl</c>, which is
otherwise included.</p>
<tag><c>{report_errors, bool()}</c></tag>
<item><p>Causes errors to be printed as they occur. Default is
<tag><c>{report_warnings, bool()}</c></tag>
<item><p>Causes warnings to be printed as they occur. Default is
<tag><c>{report, bool()}</c></tag>
<item><p>This is a short form for both <c>report_errors</c> and
<c>report_warnings</c>.</p>
<tag><c>{return_errors, bool()}</c></tag>
<item><p>If this flag is set, <c>{error, Errors, Warnings}</c>
is returned when there are errors. Default is <c>false</c>.</p>
<tag><c>{return_warnings, bool()}</c></tag>
<item><p>If this flag is set, an extra field containing
<c>Warnings</c> is added to the tuple returned upon
success. Default is <c>false</c>.</p>
<tag><c>{return, bool()}</c></tag>
<item><p>This is a short form for both <c>return_errors</c> and
<c>return_warnings</c>.</p>
<tag><c>{scannerfile, Scannerfile}</c></tag>
<item><p><c>Scannerfile</c> is the name of the file that
will contain the Erlang scanner code that is generated.
The default (<c>""</c>) is to add the extension
<c>.erl</c> to <c>FileName</c> stripped of the
<c>.xrl</c> extension.</p>
<tag><c>{verbose, bool()}</c></tag>
<item><p>Outputs information from parsing the input file and
generating the internal tables.</p>
<p>Any of the Boolean options can be set to <c>true</c> by
stating the name of the option. For example, <c>verbose</c>
is equivalent to <c>{verbose, true}</c>.</p>
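<p>The options above can be combined in a single call. The following sketch is a hypothetical example and is not part of Leex itself. It writes a small scanner definition, <c>leex_options_scan.xrl</c>, to the current working directory and runs <c>leex:file/2</c> with <c>return</c> and <c>{report, false}</c>. It then compiles and loads the generated module with <c>compile:file/2</c> and <c>code:load_binary/3</c>. The sketch assumes that the parsetools application is installed and that the current directory is writable.</p>

```erlang
-module(leex_options_demo).
-export([run/0]).

%% Hypothetical example: build a tiny integer scanner using some of
%% the documented leex:file/2 options, then scan a sample string.
run() ->
    Lines = ["Definitions.",
             "D = [0-9]",
             "Rules.",
             "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.",
             "[\\s\\t\\n]+ : skip_token.",
             "Erlang code."],
    ok = file:write_file("leex_options_scan.xrl", [[L, $\n] || L <- Lines]),
    %% The .xrl extension is added by leex. `return` is short for
    %% {return_errors,true} and {return_warnings,true}, so success gives
    %% {ok, Scannerfile, Warnings}; {report,false} turns off printing.
    Result = leex:file("leex_options_scan", [return, {report, false}]),
    ok = file:delete("leex_options_scan.xrl"),
    case Result of
        {ok, ErlFile, _Warnings} ->
            %% ErlFile is the default scanner file, leex_options_scan.erl.
            {ok, Mod, Bin} = compile:file(ErlFile, [binary]),
            {module, Mod} = code:load_binary(Mod, ErlFile, Bin),
            ok = file:delete(ErlFile),
            Mod:string("12 34");
        {error, Errors, Warnings} ->
            {leex_failed, Errors, Warnings}
    end.
```

<p>If generation succeeds, <c>leex_options_demo:run()</c> should return <c>{ok,[{integer,1,12},{integer,1,34}],1}</c>.</p>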
<p>Leex will add the extension <c>.hrl</c> to the
<c>Includefile</c> name and the extension <c>.erl</c> to the
<c>Scannerfile</c> name, unless the extension is already
<name>format_error(ErrorInfo) -> Chars</name>
<fsummary>Return an English description of an error tuple.</fsummary>
<v>Chars = [char() | Chars]</v>
<p>Returns a string which describes the error
<c>ErrorInfo</c> returned when there is an error in a
regular expression.</p>
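<p>Here is a minimal sketch of reporting the errors that <c>leex:file/2</c> returns with the <c>return</c> option. The definition deliberately lacks the mandatory <c>Rules.</c> heading, so Leex should report at least one error. The sketch follows the usual Erlang convention and assumes that the third element of each <c>{ErrorLine,Module,Descriptor}</c> tuple is what <c>Module:format_error/1</c> expects. The module and file names are hypothetical.</p>

```erlang
-module(leex_errors_demo).
-export([run/0]).

%% Hypothetical example: run leex on a definition without a "Rules."
%% heading and turn each returned ErrorInfo into readable text.
run() ->
    XrlFile = "leex_errors_scan.xrl",
    ok = file:write_file(XrlFile, "Definitions.\n\nErlang code.\n"),
    Result = leex:file(XrlFile, [return, {report, false}]),
    ok = file:delete(XrlFile),
    case Result of
        {error, Errors, _Warnings} -> messages(Errors);
        Other -> {unexpected, Other}
    end.

%% Errors has the shape [{FileName, [{Line, Module, Descriptor}]}].
%% Each descriptor is rendered by its module's format_error/1.
messages(Errors) ->
    [lists:flatten(io_lib:format("~ts:~w: ~ts",
                                 [File, Line, Mod:format_error(Desc)]))
     || {File, Infos} <- Errors, {Line, Mod, Desc} <- Infos].
```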
<title>GENERATED SCANNER EXPORTS</title>
<p>The following functions are exported by the generated scanner.</p>
<name>string(String) -> StringRet</name>
<name>string(String, StartLine) -> StringRet</name>
<fsummary>Generated by Leex</fsummary>
<v>String = string()</v>
<v>StringRet = {ok,Tokens,EndLine} | ErrorInfo</v>
<v>Tokens = [Token]</v>
<v>EndLine = StartLine = integer()</v>
<p>Scans <c>String</c> and returns all the tokens in it, or an
<note><p>It is an error if not all of the characters in
<c>String</c> are consumed.</p></note>
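<p>The sketch below generates a small integer scanner and shows how its <c>string/1,2</c> functions could be used. The module and file names are hypothetical. The helper <c>build/2</c> is not part of Leex: it writes the definition to the current directory, runs <c>leex:file/1</c>, and compiles and loads the result.</p>

```erlang
-module(leex_string_demo).
-export([build/0]).

%% Hypothetical integer scanner; returns the generated module name.
build() ->
    build("leex_string_scan",
          ["Definitions.",
           "D = [0-9]",
           "Rules.",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>With the scanner loaded, <c>Mod:string("12 34")</c> should return <c>{ok,[{integer,1,12},{integer,1,34}],1}</c>. <c>Mod:string("12\n34", 5)</c> starts counting at line 5, so the second token and the end line are both on line 6. Input containing a character that no rule matches, such as <c>"12 x"</c>, gives an error result instead of <c>{ok,...}</c>.</p>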
<name>token(Cont, Chars) -> {more,Cont1} | {done,TokenRet,RestChars}
<name>token(Cont, Chars, StartLine) -> {more,Cont1}
| {done,TokenRet,RestChars}
<fsummary>Generated by Leex</fsummary>
<v>Cont = [] | Cont1</v>
<v>Cont1 = tuple()</v>
<v>Chars = RestChars = string() | eof</v>
<v>TokenRet = {ok, Token, EndLine}
<v>StartLine = EndLine = integer()</v>
<p>This is a re-entrant call to try to scan one token from
<c>Chars</c>. If there are enough characters in <c>Chars</c>
to either scan a token or detect an error, the result is
returned as <c>{done,...}</c>. Otherwise
<c>{more,Cont1}</c> is returned, where <c>Cont1</c> is
passed to the next call to <c>token()</c> together with more
characters to continue scanning the token. This is repeated until
a token has been scanned. <c>Cont</c> is initially <c>[]</c>.</p>
<p>It is not designed to be called directly by an application,
but rather through the I/O system, where it can typically be
called in an application by:</p>
io:request(InFile, {get_until,Prompt,Module,token,[Line]})
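<p>The following sketch illustrates the continuation protocol without the I/O system. It calls <c>token/2</c> of a generated scanner directly, feeds it input in chunks, and threads <c>Cont1</c> between calls. The module names and the <c>build/2</c> helper are hypothetical. The sketch is only meant to show the <c>{more,Cont1}</c>/<c>{done,...}</c> cycle.</p>

```erlang
-module(leex_token_demo).
-export([build/0, first_token/2]).

%% Hypothetical integer scanner; returns the generated module name.
build() ->
    build("leex_token_scan",
          ["Definitions.",
           "D = [0-9]",
           "Rules.",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code."]).

%% Feed the chunks to Mod:token/2 one at a time, threading the
%% continuation, until the scanner reports {done, Result, Rest}.
first_token(Mod, Chunks) ->
    feed(Mod, [], Chunks).

feed(Mod, Cont, [Chunk | More]) ->
    case Mod:token(Cont, Chunk) of
        {more, Cont1} -> feed(Mod, Cont1, More);
        {done, Result, Rest} -> {Result, Rest}
    end;
feed(Mod, Cont, []) ->
    %% No more chunks: pass eof so the scanner can finish.
    {done, Result, Rest} = Mod:token(Cont, eof),
    {Result, Rest}.

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>Feeding <c>["1", "23 "]</c> should produce <c>{{ok,{integer,1,123},1}," "}</c>. The first chunk ends where the integer could still continue, so the scanner asks for more. The space in the second chunk then completes the token and is returned as <c>RestChars</c>.</p>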
<name>tokens(Cont, Chars) -> {more,Cont1} | {done,TokensRet,RestChars}
<name>tokens(Cont, Chars, StartLine) ->
{more,Cont1} | {done,TokensRet,RestChars}
<fsummary>Generated by Leex</fsummary>
<v>Cont = [] | Cont1</v>
<v>Cont1 = tuple()</v>
<v>Chars = RestChars = string() | eof</v>
<v>TokensRet = {ok, Tokens, EndLine}
<v>Tokens = [Token]</v>
<v>StartLine = EndLine = integer()</v>
<p>This is a re-entrant call to try to scan tokens from
<c>Chars</c>. If there are enough characters in <c>Chars</c>
to either scan tokens or detect an error, the result is
returned as <c>{done,...}</c>. Otherwise
<c>{more,Cont1}</c> is returned, where <c>Cont1</c> is
passed to the next call to <c>tokens()</c> together with more
characters to continue scanning the tokens. This is repeated
until all tokens have been scanned. <c>Cont</c> is initially
<p>This function differs from <c>token</c> in that it
continues to scan tokens up to and including an
<c>{end_token,Token}</c> (see the next
section). It then returns all the tokens. This is
typically used for scanning grammars like Erlang, where there
is an explicit end token, <c>'.'</c>. If no end token is
found, the whole file is scanned and returned. If
an error occurs, all tokens up to and including the next
end token are skipped.</p>
<p>It is not designed to be called directly by an application,
but rather through the I/O system, where it can typically be
called in an application by:</p>
io:request(InFile, {get_until,Prompt,Module,tokens,[Line]})
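<p>The following sketch also uses hypothetical names. It defines a <c>'.'</c> rule that returns <c>{end_token,...}</c> and calls <c>tokens/3</c> repeatedly on <c>RestChars</c>. This splits a string into dot-terminated groups of tokens. The returned <c>EndLine</c> is passed on as the next <c>StartLine</c>.</p>

```erlang
-module(leex_tokens_demo).
-export([build/0, forms/2]).

%% Hypothetical scanner for integers terminated by '.'.
build() ->
    build("leex_tokens_scan",
          ["Definitions.",
           "D = [0-9]",
           "Rules.",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.",
           %% A '.' ends a group, like the Erlang end token.
           "\\. : {end_token,{dot,TokenLine}}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code."]).

%% Split String into groups, each a token list ending with {dot,Line},
%% by calling Mod:tokens/3 repeatedly on the remaining characters.
forms(Mod, String) -> forms(Mod, String, 1).

forms(_Mod, [], _Line) -> [];
forms(Mod, Chars, Line) ->
    case Mod:tokens([], Chars, Line) of
        {done, {ok, Tokens, EndLine}, Rest} ->
            [Tokens | forms(Mod, Rest, EndLine)];
        {done, Other, _Rest} ->
            %% An error result: this sketch simply stops here.
            [Other];
        {more, Cont} ->
            %% Input ran out before an end token: send eof.
            case Mod:tokens(Cont, eof) of
                {done, {ok, Tokens, _}, _} -> [Tokens];
                {done, _EofOrError, _} -> []
            end
    end.

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>For example, <c>forms(Mod, "1 2. 3.")</c> should return <c>[[{integer,1,1},{integer,1,2},{dot,1}],[{integer,1,3},{dot,1}]]</c>.</p>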
<title>Input File Format</title>
<p>Erlang style comments starting with a <c>%</c> are allowed in
scanner files. A definition file has the following format:</p>
&lt;Macro Definitions&gt;
&lt;Erlang Code&gt;</code>
<p>The "Definitions.", "Rules." and "Erlang code." headings are
mandatory and must occur at the beginning of a source line. The
&lt;Header&gt;, &lt;Macro Definitions&gt; and &lt;Erlang Code&gt;
sections may be empty, but there must be at least one rule.</p>
<p>Macro definitions have the following format:</p>
<p>and there must be spaces around <c>=</c>. Macros can be used in
the regular expressions of rules by writing <c>{NAME}</c>.</p>
<note><p>When macros are expanded in expressions the macro calls
are replaced by the macro value without any form of quoting or
enclosing in parentheses.</p></note>
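<p>One consequence of the note above is that a macro containing an alternation changes meaning when it is concatenated with other expressions. In this hypothetical sketch, <c>AB = a|b</c> makes the rule <c>{AB}c</c> expand to <c>a|bc</c>. That expression matches <c>"a"</c> or <c>"bc"</c>, not <c>"ac"</c> or <c>"bc"</c>. Writing the macro as <c>AB = (a|b)</c> would avoid this.</p>

```erlang
-module(leex_macro_demo).
-export([build/0]).

%% Hypothetical scanner showing unparenthesised macro expansion.
build() ->
    build("leex_macro_scan",
          ["Definitions.",
           "AB = a|b",
           "Rules.",
           %% {AB}c expands textually to a|bc, i.e. "a" or "bc".
           "{AB}c : {token,{ab_c,TokenLine,TokenChars}}.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```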
<p>Rules have the following format:</p>
&lt;Regexp&gt; : &lt;Erlang code&gt;.</code>
<p>The &lt;Regexp&gt; must occur at the start of a line and not
include any blanks; use <c>\t</c> and <c>\s</c> to include TAB
and SPACE characters in the regular expression. If &lt;Regexp&gt;
matches, the corresponding &lt;Erlang code&gt; is evaluated to
generate a token. Within the Erlang code the following predefined
variables are available:</p>
<tag><c>TokenChars</c></tag>
<item><p>A list of the characters in the matched token.</p>
<tag><c>TokenLen</c></tag>
<item><p>The number of characters in the matched token.</p>
<tag><c>TokenLine</c></tag>
<item><p>The line number where the token occurred.</p>
<p>The code must return:</p>
<tag><c>{token,Token}</c></tag>
<item><p>Return <c>Token</c> to the caller.</p>
<tag><c>{end_token,Token}</c></tag>
<item><p>Return <c>Token</c> as the last token in a tokens call.</p>
<tag><c>skip_token</c></tag>
<item><p>Skip this token completely.</p>
<tag><c>{error,ErrString}</c></tag>
<item><p>An error in the token; <c>ErrString</c> is a string
describing the error.</p>
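<p>The following hypothetical sketch uses <c>TokenChars</c>, <c>TokenLen</c> and <c>TokenLine</c> in its tokens. It skips blanks and newlines with <c>skip_token</c> and rejects digits with <c>{error,ErrString}</c>. The module names and the <c>build/2</c> helper are illustrative only.</p>

```erlang
-module(leex_vars_demo).
-export([build/0]).

%% Hypothetical scanner using the predefined variables and returns.
build() ->
    build("leex_vars_scan",
          ["Definitions.",
           "Rules.",
           "[a-z]+ : {token,{word,TokenLine,TokenChars,TokenLen}}.",
           "[0-9]+ : {error,\"digits are not allowed\"}.",
           "[\\s\\t]+ : skip_token.",
           %% Newlines are skipped too, but still advance TokenLine.
           "\\n : skip_token.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>Scanning <c>"ab cde\nf"</c> should give <c>{ok,[{word,1,"ab",2},{word,1,"cde",3},{word,2,"f",1}],2}</c>, while <c>"ab 12"</c> gives an error result.</p>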
<p>It is also possible to push back characters into the input
characters with the following returns:</p>
<item><c>{token,Token,PushBackList}</c></item>
<item><c>{end_token,Token,PushBackList}</c></item>
<item><c>{skip_token,PushBackList}</c></item>
<p>These have the same meanings as the normal returns but the
characters in <c>PushBackList</c> will be prepended to the input
characters and scanned for the next token. Note that pushing
back a newline will mean the line numbering will no longer be
<note><p>Pushing back characters gives you unexpected
possibilities to cause the scanner to loop!</p></note>
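<p>One use of push-back is a one-character lookahead. In the hypothetical sketch below, the rule <c>[a-z]+\(</c> recognises a name directly followed by <c>(</c>. It emits only the name, using <c>TokenLen</c> to drop the parenthesis, and pushes <c>"("</c> back so that it is scanned again as a separate token. Only one of at least two matched characters is pushed back, so each step still consumes input.</p>

```erlang
-module(leex_pushback_demo).
-export([build/0]).

%% Hypothetical scanner using {token,Token,PushBackList} as lookahead.
build() ->
    build("leex_pushback_scan",
          ["Definitions.",
           "Rules.",
           %% Match "name(" but keep only "name"; "(" is pushed back.
           "[a-z]+\\( : {token,{call,TokenLine,lists:sublist(TokenChars, TokenLen - 1)},\"(\"}.",
           "[a-z]+ : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.",
           "\\( : {token,{'(',TokenLine}}.",
           "\\) : {token,{')',TokenLine}}.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>Scanning <c>"foo(x)"</c> should give <c>{ok,[{call,1,"foo"},{'(',1},{atom,1,x},{')',1}],1}</c>, whereas <c>"foo"</c> on its own is scanned as <c>{atom,1,foo}</c>.</p>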
<p>The following example would match a simple Erlang integer or
float and return a token which could be sent to the Erlang
{token,{integer,TokenLine,list_to_integer(TokenChars)}}.
{D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
{token,{float,TokenLine,list_to_float(TokenChars)}}.</code>
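<p>As a sketch, these rules can be placed in a complete definition file together with a whitespace rule. The module names and the <c>build/2</c> helper are hypothetical.</p>

```erlang
-module(leex_number_demo).
-export([build/0]).

%% Hypothetical scanner built from the integer and float rules above.
build() ->
    build("leex_number_scan",
          ["Definitions.",
           "D = [0-9]",
           "Rules.",
           "{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.",
           "{D}+\\.{D}+((E|e)(\\+|\\-)?{D}+)? : {token,{float,TokenLine,list_to_float(TokenChars)}}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>For the input <c>"42 3.14 1.0e3"</c> the scanner should return <c>{ok,[{integer,1,42},{float,1,3.14},{float,1,1.0e3}],1}</c>. This assumes the usual lex-style longest-match rule, under which <c>3.14</c> is scanned as one float rather than as the integer <c>3</c> followed by other characters.</p>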
<p>The Erlang code in the "Erlang code." section is written into
the output file directly after the module declaration and
predefined exports declaration, so it is possible to add extra
exports, define imports and other attributes which are then
visible in the whole file.</p>
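<p>A common use of the "Erlang code." section is a helper function called from rule actions. In this hypothetical sketch, <c>word/2</c> turns a small, made-up set of reserved words into <c>{Word,Line}</c> tokens and all other names into <c>{atom,Line,Name}</c> tokens.</p>

```erlang
-module(leex_code_demo).
-export([build/0]).

%% Hypothetical scanner whose action calls a helper defined in the
%% "Erlang code." section of the same definition file.
build() ->
    build("leex_code_scan",
          ["Definitions.",
           "Rules.",
           "[a-z][0-9a-zA-Z_]* : {token,word(TokenChars, TokenLine)}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code.",
           "word(Chars, Line) ->",
           "    Atom = list_to_atom(Chars),",
           "    case lists:member(Atom, ['case', 'of', 'end']) of",
           "        true -> {Atom, Line};",
           "        false -> {atom, Line, Atom}",
           "    end."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>Scanning <c>"case x of end"</c> should give <c>{ok,[{'case',1},{atom,1,x},{'of',1},{'end',1}],1}</c>.</p>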
<title>Regular Expressions</title>
<p>The regular expressions allowed here are a subset of the set
found in <c>egrep</c> and in the AWK programming language, as
defined in the book, The AWK Programming Language, by A. V. Aho,
B. W. Kernighan, P. J. Weinberger. They are composed of the
following characters:</p>
<item><p>Matches the non-metacharacter c.</p>
<tag><c>\c</c></tag>
<item><p>Matches the escape sequence or literal character c.</p>
<item><p>Matches any character.</p>
<item><p>Matches the beginning of a string.</p>
<item><p>Matches the end of a string.</p></item>
<tag><c>[abc...]</c></tag>
<item><p>Character class, which matches any of the characters
<c>abc...</c>. Character ranges are specified by a pair of
characters separated by a <c>-</c>.</p>
<tag><c>[^abc...]</c></tag>
<item><p>Negated character class, which matches any character
except <c>abc...</c>.</p>
<tag><c>r1 | r2</c></tag>
<item><p>Alternation. It matches either <c>r1</c> or <c>r2</c>.</p>
<tag><c>r1r2</c></tag>
<item><p>Concatenation. It matches <c>r1</c> and then <c>r2</c>.</p>
<item><p>Matches one or more <c>rs</c>.</p>
<item><p>Matches zero or more <c>rs</c>.</p>
<item><p>Matches zero or one <c>rs</c>.</p>
<tag><c>(r)</c></tag>
<item><p>Grouping. It matches <c>r</c>.</p>
<p>The escape sequences allowed are the same as for Erlang strings:</p>
<tag><c>\b</c></tag>
<item><p>Backspace.</p></item>
<tag><c>\f</c></tag>
<item><p>Form feed.</p></item>
<tag><c>\n</c></tag>
<item><p>Newline (line feed).</p></item>
<tag><c>\r</c></tag>
<item><p>Carriage return.</p></item>
<tag><c>\t</c></tag>
<item><p>Tab.</p></item>
<tag><c>\e</c></tag>
<item><p>Escape.</p></item>
<tag><c>\v</c></tag>
<item><p>Vertical tab.</p></item>
<tag><c>\s</c></tag>
<item><p>Space.</p></item>
<tag><c>\d</c></tag>
<item><p>Delete.</p></item>
<tag><c>\ddd</c></tag>
<item><p>The octal value <c>ddd</c>.</p></item>
<tag><c>\xhh</c></tag>
<item><p>The hexadecimal value <c>hh</c>.</p></item>
<tag><c>\x{h...}</c></tag>
<item><p>The hexadecimal value <c>h...</c>.</p></item>
<tag><c>\c</c></tag>
<item><p>Any other character literally, for example <c>\\</c> for
backslash, <c>\"</c> for <c>"</c>.</p>
<p>The following examples define Erlang data types:</p>
Atoms [a-z][0-9a-zA-Z_]*
Variables [A-Z_][0-9a-zA-Z_]*
Floats (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?</code>
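<p>The following hypothetical sketch turns these expressions into macros and uses them as rules. Macro values are substituted without parentheses, so each macro is used here as a complete rule rather than as part of a larger expression. The module names and the <c>build/2</c> helper are illustrative only.</p>

```erlang
-module(leex_types_demo).
-export([build/0]).

%% Hypothetical scanner for atoms, variables and floats.
build() ->
    build("leex_types_scan",
          ["Definitions.",
           "ATOM = [a-z][0-9a-zA-Z_]*",
           "VAR = [A-Z_][0-9a-zA-Z_]*",
           "FLOAT = (\\+|-)?[0-9]+\\.[0-9]+((E|e)(\\+|-)?[0-9]+)?",
           "Rules.",
           "{ATOM} : {token,{atom,TokenLine,list_to_atom(TokenChars)}}.",
           "{VAR} : {token,{var,TokenLine,list_to_atom(TokenChars)}}.",
           "{FLOAT} : {token,{float,TokenLine,list_to_float(TokenChars)}}.",
           "[\\s\\t\\n]+ : skip_token.",
           "Erlang code."]).

%% Write Lines to Name.xrl, run leex, then compile and load the
%% generated scanner. Returns the scanner's module name.
build(Name, Lines) ->
    Xrl = Name ++ ".xrl",
    ok = file:write_file(Xrl, [[L, $\n] || L <- Lines]),
    {ok, Erl} = leex:file(Xrl),
    {ok, Mod, Bin} = compile:file(Erl, [binary]),
    {module, Mod} = code:load_binary(Mod, Erl, Bin),
    ok = file:delete(Xrl),
    ok = file:delete(Erl),
    Mod.
```

<p>Scanning <c>"foo Bar _x -1.5"</c> should give <c>{ok,[{atom,1,foo},{var,1,'Bar'},{var,1,'_x'},{float,1,-1.5}],1}</c>.</p>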
<note><p>Anchoring a regular expression with <c>^</c> and <c>$</c>
is not implemented in the current version of Leex and just
generates an error.</p></note>