8
6
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
9
<META name="GENERATOR" content="hevea 1.08">
7
<META name="GENERATOR" content="hevea 1.09">
10
8
<LINK rel="stylesheet" type="text/css" href="manual.css">
12
Lexer and parser generators (ocamllex, ocamlyacc)
9
<TITLE>Lexer and parser generators (ocamllex, ocamlyacc)</TITLE>
16
<A HREF="manual025.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
17
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
18
<A HREF="manual027.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
12
<A HREF="manual025.html"><IMG SRC="previous_motif.gif" ALT="Previous"></A>
13
<A HREF="index.html"><IMG SRC="contents_motif.gif" ALT="Up"></A>
14
<A HREF="manual027.html"><IMG SRC="next_motif.gif" ALT="Next"></A>
21
<H1 CLASS="chapter"><A NAME="htoc126">Chapter 12</A> Lexer and parser generators (ocamllex, ocamlyacc)</H1>
16
<H1 CLASS="chapter"><A NAME="htoc126">Chapter�12</A>��Lexer and parser generators (ocamllex, ocamlyacc)</H1><P>
22
17
<A NAME="c:ocamlyacc"></A>
24
This chapter describes two program generators: <TT>ocamllex</TT>, that
18
</P><P>This chapter describes two program generators: <TT>ocamllex</TT>, that
25
19
produces a lexical analyzer from a set of regular expressions with
26
20
associated semantic actions, and <TT>ocamlyacc</TT>, that produces a parser
27
from a grammar with associated semantic actions.<BR>
29
These program generators are very close to the well-known <TT>lex</TT> and
21
from a grammar with associated semantic actions.</P><P>These program generators are very close to the well-known <TT>lex</TT> and
30
22
<TT>yacc</TT> commands that can be found in most C programming environments.
31
23
This chapter assumes a working knowledge of <TT>lex</TT> and <TT>yacc</TT>: while
32
24
it describes the input syntax for <TT>ocamllex</TT> and <TT>ocamlyacc</TT> and the
33
25
main differences with <TT>lex</TT> and <TT>yacc</TT>, it does not explain the basics
34
26
of writing a lexer or parser description in <TT>lex</TT> and <TT>yacc</TT>. Readers
35
unfamiliar with <TT>lex</TT> and <TT>yacc</TT> are referred to “Compilers:
36
principles, techniques, and tools” by Aho, Sethi and Ullman
37
(Addison-Wesley, 1986), or “Lex & Yacc”, by Levine, Mason and
38
Brown (O'Reilly, 1992).<BR>
41
<H2 CLASS="section"><A NAME="htoc127">12.1</A> Overview of <TT>ocamllex</TT></H2>
42
The <TT>ocamllex</TT> command produces a lexical analyzer from a set of regular
27
unfamiliar with <TT>lex</TT> and <TT>yacc</TT> are referred to “Compilers:
28
principles, techniques, and tools” by Aho, Sethi and Ullman
29
(Addison-Wesley, 1986), or “Lex & Yacc”, by Levine, Mason and
30
Brown (O'Reilly, 1992).</P><H2 CLASS="section"><A NAME="htoc127">12.1</A>��Overview of <TT>ocamllex</TT></H2><P>The <TT>ocamllex</TT> command produces a lexical analyzer from a set of regular
43
31
expressions with attached semantic actions, in the style of
44
32
<TT>lex</TT>. Assuming the input file is <I>lexer</I><TT>.mll</TT>, executing
46
34
ocamllex <I>lexer</I>.mll
48
36
produces Caml code for a lexical analyzer in file <I>lexer</I><TT>.ml</TT>.
49
37
This file defines one lexing function per entry point in the lexer
50
38
definition. These functions have the same names as the entry
51
39
points. Lexing functions take as argument a lexer buffer, and return
52
the semantic attribute of the corresponding entry point.<BR>
54
Lexer buffers are an abstract data type implemented in the standard
40
the semantic attribute of the corresponding entry point.</P><P>Lexer buffers are an abstract data type implemented in the standard
55
41
library module <TT>Lexing</TT>. The functions <TT>Lexing.from_channel</TT>,
56
42
<TT>Lexing.from_string</TT> and <TT>Lexing.from_function</TT> create
57
43
lexer buffers that read from an input channel, a character string, or
58
44
any reading function, respectively. (See the description of module
59
<TT>Lexing</TT> in chapter <A HREF="manual034.html#c:stdlib">20</A>.)<BR>
61
When used in conjunction with a parser generated by <TT>ocamlyacc</TT>, the
45
<TT>Lexing</TT> in chapter�<A HREF="manual034.html#c:stdlib">20</A>.)</P><P>When used in conjunction with a parser generated by <TT>ocamlyacc</TT>, the
62
46
semantic actions compute a value belonging to the type <TT>token</TT> defined
63
47
by the generated parsing module. (See the description of <TT>ocamlyacc</TT>
67
<H3 CLASS="subsection"><A NAME="htoc128">12.1.1</A> Options</H3>
68
The following command-line options are recognized by <TT>ocamllex</TT>.
69
<DL CLASS="description" COMPACT=compact><DT CLASS="dt-description">
70
<B><TT>-o</TT> <I>output-file</I></B><DD CLASS="dd-description">
48
below.)</P><H3 CLASS="subsection"><A NAME="htoc128">12.1.1</A>��Options</H3><P>
49
The following command-line options are recognized by <TT>ocamllex</TT>.</P><DL CLASS="description"><DT CLASS="dt-description">
50
<B><TT>-o</TT> <I>output-file</I></B></DT><DD CLASS="dd-description">
71
51
Specify the name of the output file produced by <TT>ocamllex</TT>.
72
52
Default is <I>lexer</I><TT>.ml</TT>, <TT>ocamllex</TT> being invoked as
73
<TT>ocamllex</TT> <I>lexer</I><TT>.mll</TT>.<BR>
75
<DT CLASS="dt-description"><TT><B>-ml</B></TT><DD CLASS="dd-description">
53
<TT>ocamllex</TT> <I>lexer</I><TT>.mll</TT>.</DD><DT CLASS="dt-description"><TT><B>-ml</B></TT></DT><DD CLASS="dd-description">
76
54
Output code that does not use the Caml built-in automata
77
55
interpreter. Instead, the automaton is encoded by Caml functions.
78
56
This option is useful for debugging <TT>ocamllex</TT>, using it for
79
production lexers is not recommended.<BR>
81
<DT CLASS="dt-description"><TT><B>-q</B></TT><DD CLASS="dd-description">
57
production lexers is not recommended.</DD><DT CLASS="dt-description"><TT><B>-q</B></TT></DT><DD CLASS="dd-description">
82
58
Quiet mode. <TT>ocamllex</TT> normally outputs informational messages
83
to standard output. They are suppressed if option <TT>-q</TT> is used.<BR>
85
<DT CLASS="dt-description" style="background-color:yellow; color:red"><TT><B>-version</B></TT><DD CLASS="dd-description" style="background-color:yellow; color:red">
59
to standard output. They are suppressed if option <TT>-q</TT> is used.</DD><DT CLASS="dt-description"><TT><B>-version</B></TT></DT><DD CLASS="dd-description">
86
60
Print version and exit.
89
<H2 CLASS="section"><A NAME="htoc129">12.2</A> Syntax of lexer definitions</H2>
90
The format of lexer definitions is as follows:
61
</DD></DL><H2 CLASS="section"><A NAME="htoc129">12.2</A>��Syntax of lexer definitions</H2><P>The format of lexer definitions is as follows:
94
let <I>ident</I> = <I>regexp</I> ...
95
rule <I>entrypoint</I> [<I>arg<SUB>1</SUB></I>... <I>arg<SUB>n</SUB></I>] =
65
let <I>ident</I> = <I>regexp</I> …
66
rule <I>entrypoint</I> [<I>arg<SUB>1</SUB></I>… <I>arg<SUB>n</SUB></I>] =
96
67
parse <I>regexp</I> { <I>action</I> }
98
69
| <I>regexp</I> { <I>action</I> }
99
and <I>entrypoint</I> [<I>arg<SUB>1</SUB></I>... <I>arg<SUB>n</SUB></I>] =
70
and <I>entrypoint</I> [<I>arg<SUB>1</SUB></I>… <I>arg<SUB>n</SUB></I>] =
102
73
{ <I>trailer</I> }
104
75
Comments are delimited by <TT>(*</TT> and <TT>*)</TT>, as in Caml.
105
76
The <TT>parse</TT> keyword, can be replaced by the <TT>shortest</TT> keyword, with
106
the semantic consequences explained below.<BR>
109
<H3 CLASS="subsection"><A NAME="htoc130">12.2.1</A> Header and trailer</H3>
77
the semantic consequences explained below.</P><H3 CLASS="subsection"><A NAME="htoc130">12.2.1</A>��Header and trailer</H3><P>
110
78
The <I>header</I> and <I>trailer</I> sections are arbitrary Caml
111
79
text enclosed in curly braces. Either or both can be omitted. If
112
80
present, the header text is copied as is at the beginning of the
113
81
output file and the trailer text at the end. Typically, the
114
82
header section contains the <CODE>open</CODE> directives required
115
83
by the actions, and possibly some auxiliary functions used in the
119
<H3 CLASS="subsection"><A NAME="htoc131">12.2.2</A> Naming regular expressions</H3>
120
Between the header and the entry points, one can give names to
84
actions.</P><H3 CLASS="subsection"><A NAME="htoc131">12.2.2</A>��Naming regular expressions</H3><P>Between the header and the entry points, one can give names to
121
85
frequently-occurring regular expressions. This is written
122
<FONT COLOR=blue><TT>let</TT></FONT> <FONT COLOR=maroon><TT><I>ident</I></TT> <FONT COLOR=blue><TT>=</TT></FONT> <TT><I>regexp</I></TT></FONT>.
86
<FONT COLOR=blue><TT>let</TT></FONT> <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> <FONT COLOR=blue><TT>=</TT></FONT> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I>.
123
87
In regular expressions that follow this declaration, the identifier
124
<I>ident</I> can be used as shorthand for <I>regexp</I>.<BR>
127
<H3 CLASS="subsection"><A NAME="htoc132">12.2.3</A> Entry points</H3>
128
The names of the entry points must be valid identifiers for Caml
88
<I>ident</I> can be used as shorthand for <I>regexp</I>.</P><H3 CLASS="subsection"><A NAME="htoc132">12.2.3</A>��Entry points</H3><P>The names of the entry points must be valid identifiers for Caml
129
89
values (starting with a lowercase letter).
130
Similarily, the arguments <TT><I>arg<SUB>1</SUB></I>...
90
Similarily, the arguments <TT><I>arg<SUB>1</SUB></I>…
131
91
<I>arg<SUB>n</SUB></I></TT> must be valid identifiers for Caml.
132
92
Each entry point becomes a
133
93
Caml function that takes <I>n</I>+1 arguments,
135
95
Characters are read from the <TT>Lexing.lexbuf</TT> argument and matched
136
96
against the regular expressions provided in the rule, until a prefix
137
97
of the input matches one of the rule. The corresponding action is
138
then evaluated and returned as the result of the function.<BR>
140
If several regular expressions match a prefix of the input, the
141
“longest match” rule applies: the regular expression that matches
98
then evaluated and returned as the result of the function.</P><P>If several regular expressions match a prefix of the input, the
99
“longest match” rule applies: the regular expression that matches
142
100
the longest prefix of the input is selected. In case of tie, the
143
regular expression that occurs earlier in the rule is selected.<BR>
145
However, if lexer rules are introduced with the <TT>shortest</TT> keyword in
146
place of the <TT>parse</TT> keyword, then the “shortest match” rule applies:
101
regular expression that occurs earlier in the rule is selected.</P><P>However, if lexer rules are introduced with the <TT>shortest</TT> keyword in
102
place of the <TT>parse</TT> keyword, then the “shortest match” rule applies:
147
103
the shortest prefix of the input is selected. In case of tie, the
148
104
regular expression that occurs earlier in the rule is still selected.
149
105
This feature is not intended for use in ordinary lexical analyzers, it
150
may facilitate the use of <TT>ocamllex</TT> as a simple text processing tool.<BR>
153
<H3 CLASS="subsection"><A NAME="htoc133">12.2.4</A> Regular expressions</H3>
154
The regular expressions are in the style of <TT>lex</TT>, with a more
106
may facilitate the use of <TT>ocamllex</TT> as a simple text processing tool.</P><H3 CLASS="subsection"><A NAME="htoc133">12.2.4</A>��Regular expressions</H3><P>The regular expressions are in the style of <TT>lex</TT>, with a more
155
107
Caml-like syntax.
156
<DL CLASS="description" COMPACT=compact><DT CLASS="dt-description"><FONT COLOR=blue><B><TT>'</TT> <FONT COLOR=maroon><TT><I>char</I></TT></FONT> <TT>'</TT></B></FONT><DD CLASS="dd-description">
108
</P><TABLE CLASS="display dcenter"><TR VALIGN="middle"><TD CLASS="dcell"><TABLE CELLSPACING=6 CELLPADDING=0><TR><TD ALIGN=right NOWRAP>
109
<I><A NAME="regexp"><FONT COLOR=maroon>regexp</FONT></A></I></TD><TD ALIGN=center NOWRAP>::=</TD><TD ALIGN=left NOWRAP>
112
</TABLE><DL CLASS="description"><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>'</TT></FONT> <FONT COLOR=maroon><I>regular-char</I></FONT> ∣ <I><A HREF="lex.html#escape-sequence"><FONT COLOR=maroon>escape-sequence</FONT></A></I> <FONT COLOR=blue><TT>'</TT></FONT></B></DT><DD CLASS="dd-description">
157
113
A character constant, with the same syntax as Objective Caml character
158
constants. Match the denoted character.<BR>
160
<DT CLASS="dt-description"><TT><B>_</B></TT><DD CLASS="dd-description">
161
(Underscore.) Match any character.<BR>
163
<DT CLASS="dt-description"><TT><B>eof</B></TT><DD CLASS="dd-description">
114
constants. Match the denoted character.</DD><DT CLASS="dt-description"><TT><B>_</B></TT></DT><DD CLASS="dd-description">
115
(Underscore.) Match any character.</DD><DT CLASS="dt-description"><FONT COLOR=blue><TT><B>eof</B></TT></FONT></DT><DD CLASS="dd-description">
164
116
Match the end of the lexer input.<BR>
165
117
<B>Note:</B> On some systems, with interactive input, an end-of-file
166
118
may be followed by more characters. However, <TT>ocamllex</TT> will not
167
119
correctly handle regular expressions that contain <TT>eof</TT> followed by
170
<DT CLASS="dt-description"><FONT COLOR=blue><B><TT>"</TT> <FONT COLOR=maroon><TT><I>string</I></TT></FONT> <TT>"</TT></B></FONT><DD CLASS="dd-description">
120
something else.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>"</TT></FONT> { <I><A HREF="lex.html#string-character"><FONT COLOR=maroon>string-character</FONT></A></I> } <FONT COLOR=blue><TT>"</TT></FONT></B></DT><DD CLASS="dd-description">
171
121
A string constant, with the same syntax as Objective Caml string
172
constants. Match the corresponding sequence of characters.<BR>
174
<DT CLASS="dt-description"><FONT COLOR=blue><B><TT>[</TT> <FONT COLOR=maroon><TT><I>character-set</I></TT></FONT> <TT>]</TT></B></FONT><DD CLASS="dd-description">
122
constants. Match the corresponding sequence of characters.</DD><DT CLASS="dt-description"><FONT COLOR=blue><B><TT>[</TT> <FONT COLOR=maroon><I>character-set</I></FONT> <TT>]</TT></B></FONT></DT><DD CLASS="dd-description">
175
123
Match any single character belonging to the given
176
124
character set. Valid character sets are: single
177
character constants <FONT COLOR=blue><TT>'</TT> <FONT COLOR=maroon><TT><I>c</I></TT></FONT> <TT>'</TT></FONT>; ranges of characters
178
<FONT COLOR=blue><TT>'</TT></FONT> <FONT COLOR=maroon><I><TT>c</TT></I></FONT><SUB>1</SUB> <FONT COLOR=blue><TT>'</TT> <TT>-</TT> <TT>'</TT></FONT> <FONT COLOR=maroon><I><TT>c</TT></I></FONT><SUB>2</SUB> <FONT COLOR=blue><TT>'</TT></FONT> (all characters between <I>c</I><SUB>1</SUB> and <I>c</I><SUB>2</SUB>,
125
character constants <FONT COLOR=blue><TT>'</TT> <FONT COLOR=maroon><I>c</I></FONT> <TT>'</TT></FONT>; ranges of characters
126
<FONT COLOR=blue><TT>'</TT></FONT> <FONT COLOR=maroon><I>c</I></FONT><SUB>1</SUB> <FONT COLOR=blue><TT>'</TT> <TT>-</TT> <TT>'</TT></FONT> <FONT COLOR=maroon><I>c</I></FONT><SUB>2</SUB> <FONT COLOR=blue><TT>'</TT></FONT> (all characters between <I>c</I><SUB>1</SUB> and <I>c</I><SUB>2</SUB>,
179
127
inclusive); and the union of two or more character sets, denoted by
182
<DT CLASS="dt-description"><FONT COLOR=blue><B><TT>[</TT> <TT>^</TT> <FONT COLOR=maroon><TT><I>character-set</I></TT></FONT> <TT>]</TT></B></FONT><DD CLASS="dd-description">
183
Match any single character not belonging to the given character set.<BR>
185
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>1</SUB> <FONT COLOR=blue><TT>#</TT></FONT> <FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>2</SUB></B><DD CLASS="dd-description">
128
concatenation.</DD><DT CLASS="dt-description"><FONT COLOR=blue><B><TT>[</TT> <TT>^</TT> <FONT COLOR=maroon><I>character-set</I></FONT> <TT>]</TT></B></FONT></DT><DD CLASS="dd-description">
129
Match any single character not belonging to the given character set.</DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> <FONT COLOR=blue><TT>#</TT></FONT> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB></B></DT><DD CLASS="dd-description">
186
130
(Difference of character sets).
187
Regular expressions <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>1</SUB> and <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>2</SUB> must be character sets
188
defined with <FONT COLOR=blue><TT>[</TT></FONT>… <FONT COLOR=blue><TT>]</TT></FONT> (or a a single character expression or
131
Regular expressions <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> and <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB> must be character sets
132
defined with <FONT COLOR=blue><TT>[</TT></FONT>… <FONT COLOR=blue><TT>]</TT></FONT> (or a a single character expression or
189
133
underscore <TT>_</TT>).
190
Match the difference of the two specified character sets.<BR>
192
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT> <FONT COLOR=blue><TT>*</TT></FONT></B><DD CLASS="dd-description">
134
Match the difference of the two specified character sets.</DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>*</TT></FONT></B></DT><DD CLASS="dd-description">
193
135
(Repetition.) Match the concatenation of zero or more
194
strings that match <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT>. <BR>
196
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT> <FONT COLOR=blue><TT>+</TT></FONT></B><DD CLASS="dd-description">
136
strings that match <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I>. </DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>+</TT></FONT></B></DT><DD CLASS="dd-description">
197
137
(Strict repetition.) Match the concatenation of one or more
198
strings that match <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT>.<BR>
200
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT> <FONT COLOR=blue><TT>?</TT></FONT></B><DD CLASS="dd-description">
201
(Option.) Match either the empty string, or a string matching <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT>.<BR>
203
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>1</SUB> <FONT COLOR=blue><TT>|</TT></FONT> <FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>2</SUB></B><DD CLASS="dd-description">
204
(Alternative.) Match any string that matches either <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>1</SUB> or <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>2</SUB><BR>
206
<DT CLASS="dt-description"><B><FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>1</SUB> <FONT COLOR=maroon><TT><I>regexp</I></TT></FONT><SUB>2</SUB></B><DD CLASS="dd-description">
138
strings that match <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I>.</DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>?</TT></FONT></B></DT><DD CLASS="dd-description">
139
(Option.) Match either the empty string, or a string matching <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I>.</DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> <FONT COLOR=blue><TT>|</TT></FONT> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB></B></DT><DD CLASS="dd-description">
140
(Alternative.) Match any string that matches either <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> or <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB></DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB></B></DT><DD CLASS="dd-description">
207
141
(Concatenation.) Match the concatenation of two strings, the first
208
matching <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>1</SUB>, the second matching <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>2</SUB>.<BR>
210
<DT CLASS="dt-description"><FONT COLOR=blue><B><TT>(</TT> <FONT COLOR=maroon><TT><I>regexp</I></TT></FONT> <TT>)</TT></B></FONT><DD CLASS="dd-description">
211
Match the same strings as <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT>.<BR>
213
<DT CLASS="dt-description"><FONT COLOR=maroon><I><TT><B>ident</B></TT></I></FONT><DD CLASS="dd-description">
214
Reference the regular expression bound to <FONT COLOR=maroon><I><TT>ident</TT></I></FONT> by an earlier
215
<FONT COLOR=blue><TT>let</TT></FONT> <FONT COLOR=maroon><TT><I>ident</I></TT> <FONT COLOR=blue><TT>=</TT></FONT> <TT><I>regexp</I></TT></FONT> definition.<BR>
217
<DT CLASS="dt-description"><FONT COLOR=maroon><B><TT><I>regexp</I></TT> <FONT COLOR=blue><TT>as</TT></FONT> <TT><I>ident</I></TT></B></FONT><DD CLASS="dd-description">
218
Bind the substring matched by <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT> to identifier <FONT COLOR=maroon><I><TT>ident</TT></I></FONT>.
220
Concerning the precedences of operators, <TT>*</TT> and <TT>+</TT> have
142
matching <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB>, the second matching <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB>.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>(</TT></FONT> <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>)</TT></FONT></B></DT><DD CLASS="dd-description">
143
Match the same strings as <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I>.</DD><DT CLASS="dt-description"><I><B><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></B></I></DT><DD CLASS="dd-description">
144
Reference the regular expression bound to <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> by an earlier
145
<FONT COLOR=blue><TT>let</TT></FONT> <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> <FONT COLOR=blue><TT>=</TT></FONT> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> definition.</DD><DT CLASS="dt-description"><B><I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>as</TT></FONT> �<I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I></B></DT><DD CLASS="dd-description">
146
Bind the substring matched by <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> to identifier <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I>.
147
</DD></DL><P>Concerning the precedences of operators, <TT>*</TT> and <TT>+</TT> have
221
148
highest precedence, followed by <TT>?</TT>, then concatenation, then
222
<TT>|</TT> (alternation), then <TT>as</TT>.<BR>
225
<H3 CLASS="subsection"><A NAME="htoc134">12.2.5</A> Actions</H3>
226
The actions are arbitrary Caml expressions. They are evaluated in
149
<TT>|</TT> (alternation), then <TT>as</TT>.</P><H3 CLASS="subsection"><A NAME="htoc134">12.2.5</A>��Actions</H3><P>The actions are arbitrary Caml expressions. They are evaluated in
227
150
a context where the identifiers defined by using the <TT>as</TT> construct
228
151
are bound to subparts of the matched string.
229
152
Additionally, <TT>lexbuf</TT> is bound to the current lexer
230
153
buffer. Some typical uses for <TT>lexbuf</TT>, in conjunction with the
231
154
operations on lexer buffers provided by the <TT>Lexing</TT> standard library
232
module, are listed below.
233
<DL CLASS="description" COMPACT=compact><DT CLASS="dt-description">
234
<TT><B>Lexing.lexeme lexbuf</B></TT><DD CLASS="dd-description">
235
Return the matched string.<BR>
237
<DT CLASS="dt-description"><B><TT>Lexing.lexeme_char lexbuf </TT><I>n</I></B><DD CLASS="dd-description">
155
module, are listed below.</P><DL CLASS="description"><DT CLASS="dt-description">
156
<TT><B>Lexing.lexeme lexbuf</B></TT></DT><DD CLASS="dd-description">
157
Return the matched string.</DD><DT CLASS="dt-description"><B><TT>Lexing.lexeme_char lexbuf </TT><I>n</I></B></DT><DD CLASS="dd-description">
238
158
Return the <I>n</I><FONT SIZE=2><SUP>th</SUP></FONT>
239
character in the matched string. The first character corresponds to <I>n</I> = 0.<BR>
241
<DT CLASS="dt-description"><TT><B>Lexing.lexeme_start lexbuf</B></TT><DD CLASS="dd-description">
159
character in the matched string. The first character corresponds to <I>n</I> = 0.</DD><DT CLASS="dt-description"><B><TT>Lexing.lexeme_start lexbuf</TT></B></DT><DD CLASS="dd-description">
242
160
Return the absolute position in the input text of the beginning of the
243
161
matched string. The first character read from the input text has
246
<DT CLASS="dt-description"><TT><B>Lexing.lexeme_end lexbuf</B></TT><DD CLASS="dd-description">
162
position 0.</DD><DT CLASS="dt-description"><B><TT>Lexing.lexeme_end lexbuf</TT></B></DT><DD CLASS="dd-description">
247
163
Return the absolute position in the input text of the end of the
248
164
matched string. The first character read from the input text has
251
<DT CLASS="dt-description"><B><I>entrypoint</I> [<I>exp<SUB>1</SUB></I>... <I>exp<SUB>n</SUB></I>] <TT>lexbuf</TT></B><DD CLASS="dd-description">
165
position 0.</DD><DT CLASS="dt-description"><B><I>entrypoint</I> [<I>exp<SUB>1</SUB></I>… <I>exp<SUB>n</SUB></I>] <TT>lexbuf</TT></B></DT><DD CLASS="dd-description">
252
166
(Where <I>entrypoint</I> is the name of another entry point in the same
253
167
lexer definition.) Recursively call the lexer on the given entry point.
254
168
Notice that <TT>lexbuf</TT> is the last argument.
255
Useful for lexing nested comments, for example.</DL>
257
<H3 CLASS="subsection"><A NAME="htoc135">12.2.6</A> Variables in regular expressions</H3>
258
The <TT>as</TT> construct is similar to “<EM>groups</EM>” as provided by
169
Useful for lexing nested comments, for example.</DD></DL><H3 CLASS="subsection"><A NAME="htoc135">12.2.6</A>��Variables in regular expressions</H3><P>
170
The <TT>as</TT> construct is similar to “<EM>groups</EM>” as provided by
259
171
numerous regular expression packages.
260
172
The type of these variables can be <TT>string</TT>, <TT>char</TT>, <TT>string option</TT>
261
or <TT>char option</TT>.<BR>
263
We first consider the case of linear patterns, that is the case when
173
or <TT>char option</TT>.</P><P>We first consider the case of linear patterns, that is the case when
264
174
all <TT>as</TT> bound variables are distinct.
265
In <FONT COLOR=maroon><TT><I>regexp</I></TT> <FONT COLOR=blue><TT>as</TT></FONT> <TT><I>ident</I></TT></FONT>, the type of <FONT COLOR=maroon><I><TT>ident</TT></I></FONT> normally is <TT>string</TT> (or
175
In <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>as</TT></FONT> �<I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I>, the type of <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> normally is <TT>string</TT> (or
266
176
<TT>string option</TT>) except
267
when <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT> is a character constant, an underscore, a string
177
when <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> is a character constant, an underscore, a string
268
178
constant of length one, a character set specification, or an
269
alternation of those. Then, the type of <FONT COLOR=maroon><I><TT>ident</TT></I></FONT> is <TT>char</TT> (or <TT>char option</TT>).
179
alternation of those. Then, the type of <I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> is <TT>char</TT> (or <TT>char option</TT>).
270
180
Option types are introduced when overall rule matching does not
271
181
imply matching of the bound sub-pattern. This is in particular the
272
case of <FONT COLOR=blue><TT>(</TT> <FONT COLOR=maroon><TT><I>regexp</I></TT></FONT> <TT>as</TT> <TT><FONT COLOR=maroon><I>indent</I></FONT>)</TT> <TT>?</TT></FONT> and of
273
<FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>1</SUB> <FONT COLOR=blue><TT>|</TT> <TT>(</TT></FONT> <FONT COLOR=maroon><I><TT>regexp</TT></I></FONT><SUB>2</SUB> <FONT COLOR=blue><TT>as</TT> <FONT COLOR=maroon><TT><I>ident</I></TT></FONT> <TT>)</TT></FONT>.<BR>
275
There is no linearity restriction over <TT>as</TT> bound variables.
182
case of <FONT COLOR=blue><TT>(</TT></FONT> <I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I> <FONT COLOR=blue><TT>as</TT></FONT> �<I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> <FONT COLOR=blue><TT>)</TT> <TT>?</TT></FONT> and of
183
<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>1</SUB> <FONT COLOR=blue><TT>|</TT> <TT>(</TT></FONT> �<I><A HREF="#regexp"><FONT COLOR=maroon>regexp</FONT></A></I><SUB>2</SUB> <FONT COLOR=blue><TT>as</TT></FONT> �<I><A HREF="lex.html#ident"><FONT COLOR=maroon>ident</FONT></A></I> <FONT COLOR=blue><TT>)</TT></FONT>.</P><P>There is no linearity restriction over <TT>as</TT> bound variables.
276
184
When a variable is bound more than once, the previous rules are to be
277
185
extended as follows:
278
<UL CLASS="itemize"><LI CLASS="li-itemize">
186
</P><UL CLASS="itemize"><LI CLASS="li-itemize">
279
187
A variable is a <TT>char</TT> variable when all its occurrences bind
280
188
<TT>char</TT> occurrences in the previous sense.
281
<LI CLASS="li-itemize">A variable is an <TT>option</TT> variable when the overall expression
189
</LI><LI CLASS="li-itemize">A variable is an <TT>option</TT> variable when the overall expression
282
190
can be matched without binding this variable.
285
193
<CODE>('a' as x) | ( 'a' (_ as x) )</CODE> the variable <CODE>x</CODE> is of type
286
194
<TT>char</TT>, whereas in
287
195
<CODE>("ab" as x) | ( 'a' (_ as x) ? )</CODE> the variable <CODE>x</CODE> is of type
288
<TT>string option</TT>.<BR>
290
In some cases, a sucessful match may not yield a unique set of bindings.
196
<TT>string option</TT>.</P><P>In some cases, a sucessful match may not yield a unique set of bindings.
291
197
For instance the matching of <CODE>aba</CODE> by the regular expression
292
198
<CODE>(('a'|"ab") as x) (("ba"|'a') as y)</CODE> may result in binding
341
Comments are enclosed between <CODE>/*</CODE> and <CODE>*/</CODE> (as in C) in the
342
“declarations” and “rules” sections, and between <CODE>(*</CODE> and
343
<CODE>*)</CODE> (as in Caml) in the “header” and “trailer” sections.<BR>
346
<H3 CLASS="subsection"><A NAME="htoc139">12.4.1</A> Header and trailer</H3>
347
The header and the trailer sections are Caml code that is copied
232
</PRE><P>Comments are enclosed between <CODE>/*</CODE> and <CODE>*/</CODE> (as in C) in the
233
“declarations” and “rules” sections, and between <CODE>(*</CODE> and
234
<CODE>*)</CODE> (as in Caml) in the “header” and “trailer” sections.</P><H3 CLASS="subsection"><A NAME="htoc139">12.4.1</A>��Header and trailer</H3><P>The header and the trailer sections are Caml code that is copied
348
235
as is into file <I>grammar</I><TT>.ml</TT>. Both sections are optional. The header
349
236
goes at the beginning of the output file; it usually contains
350
237
<TT>open</TT> directives and auxiliary functions required by the semantic
351
actions of the rules. The trailer goes at the end of the output file.<BR>
354
<H3 CLASS="subsection"><A NAME="htoc140">12.4.2</A> Declarations</H3>
355
Declarations are given one per line. They all start with a <CODE>%</CODE> sign.
356
<DL CLASS="description" COMPACT=compact><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%token</TT></FONT> <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
357
Declare the given symbols as tokens (terminal symbols). These symbols
358
are added as constant constructors for the <TT>token</TT> concrete type.<BR>
360
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%token</TT></FONT> <FONT COLOR=blue><TT><</TT></FONT> <FONT COLOR=maroon><I><TT>type</TT></I> <FONT COLOR=blue><TT>></TT></FONT> <I><TT>symbol</TT></I></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
361
Declare the given symbols as tokens with an attached attribute of the
238
actions of the rules. The trailer goes at the end of the output file.</P><H3 CLASS="subsection"><A NAME="htoc140">12.4.2</A>��Declarations</H3><P>Declarations are given one per line. They all start with a <CODE>%</CODE> sign.</P><DL CLASS="description"><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%token</TT></FONT> <I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I> … �<I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I></B></DT><DD CLASS="dd-description">
239
Declare the given symbols <I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I> … �<I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I>
240
as tokens (terminal symbols). These symbols
241
are added as constant constructors for the <TT>token</TT> concrete type.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%token</TT> <TT><</TT></FONT> <I><A HREF="manual012.html#typexpr"><FONT COLOR=maroon>typexpr</FONT></A></I> <FONT COLOR=blue><TT>></TT></FONT> �<I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I> … �<I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I></B></DT><DD CLASS="dd-description">
242
Declare the given symbols <I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I> … �<I><A HREF="manual011.html#constr"><FONT COLOR=maroon>constr</FONT></A></I> as tokens with an
243
attached attribute of the
362
244
given type. These symbols are added as constructors with arguments of
363
the given type for the <TT>token</TT> concrete type. The <FONT COLOR=maroon><I><TT>type</TT></I></FONT> part is
245
the given type for the <TT>token</TT> concrete type. The <I><A HREF="manual012.html#typexpr"><FONT COLOR=maroon>typexpr</FONT></A></I> part is
364
246
an arbitrary Caml type expression, except that all type
365
247
constructor names must be fully qualified (e.g. <TT>Modname.typename</TT>)
366
248
for all types except standard built-in types, even if the proper
367
249
<CODE>open</CODE> directives (e.g. <CODE>open Modname</CODE>) were given in the
368
250
header section. That's because the header is copied only to the <TT>.ml</TT>
369
output file, but not to the <TT>.mli</TT> output file, while the <FONT COLOR=maroon><I><TT>type</TT></I></FONT> part
370
of a <CODE>%token</CODE> declaration is copied to both.<BR>
372
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%start</TT></FONT> <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
251
output file, but not to the <TT>.mli</TT> output file, while the <I><A HREF="manual012.html#typexpr"><FONT COLOR=maroon>typexpr</FONT></A></I> part
252
of a <CODE>%token</CODE> declaration is copied to both.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%start</TT></FONT> <FONT COLOR=maroon><I>symbol</I></FONT> … �<FONT COLOR=maroon><I>symbol</I></FONT></B></DT><DD CLASS="dd-description">
373
253
Declare the given symbols as entry points for the grammar. For each
374
254
entry point, a parsing function with the same name is defined in the
375
255
output module. Non-terminals that are not declared as entry points
376
256
have no such parsing function. Start symbols must be given a type with
377
the <CODE>%type</CODE> directive below.<BR>
379
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%type</TT></FONT> <FONT COLOR=blue><TT><</TT></FONT> <FONT COLOR=maroon><I><TT>type</TT></I> <FONT COLOR=blue><TT>></TT></FONT> <I><TT>symbol</TT></I></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
257
the <CODE>%type</CODE> directive below.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%type</TT> <TT><</TT></FONT> <I><A HREF="manual012.html#typexpr"><FONT COLOR=maroon>typexpr</FONT></A></I> <FONT COLOR=blue><TT>></TT></FONT> �<FONT COLOR=maroon><I>symbol</I></FONT> … �<FONT COLOR=maroon><I>symbol</I></FONT></B></DT><DD CLASS="dd-description">
380
258
Specify the type of the semantic attributes for the given symbols.
381
259
This is mandatory for start symbols only. Other nonterminal symbols
382
260
need not be given types by hand: these types will be inferred when
383
261
running the output files through the Objective Caml compiler (unless the
384
<CODE>-s</CODE> option is in effect). The <FONT COLOR=maroon><I><TT>type</TT></I></FONT> part is an arbitrary Caml
262
<CODE>-s</CODE> option is in effect). The <I><A HREF="manual012.html#typexpr"><FONT COLOR=maroon>typexpr</FONT></A></I> part is an arbitrary Caml
385
263
type expression, except that all type constructor names must be
386
fully qualified, as explained above for <TT>%token</TT>.<BR>
388
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%left</TT></FONT> <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
389
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%right</TT></FONT> <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description">
390
<DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%nonassoc</TT></FONT> <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT> … <FONT COLOR=maroon><TT><I>symbol</I></TT></FONT></B><DD CLASS="dd-description"><BR>
392
Associate precedences and associativities to the given symbols. All
264
fully qualified, as explained above for <TT>%token</TT>.</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%left</TT></FONT> <FONT COLOR=maroon><I>symbol</I></FONT> … �<FONT COLOR=maroon><I>symbol</I></FONT></B></DT><DD CLASS="dd-description">
265
</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%right</TT></FONT> <FONT COLOR=maroon><I>symbol</I></FONT> … �<FONT COLOR=maroon><I>symbol</I></FONT></B></DT><DD CLASS="dd-description">
266
</DD><DT CLASS="dt-description"><B><FONT COLOR=blue><TT>%nonassoc</TT></FONT> <FONT COLOR=maroon><I>symbol</I></FONT> … �<FONT COLOR=maroon><I>symbol</I></FONT></B></DT><DD CLASS="dd-description"><P>Associate precedences and associativities to the given symbols. All
393
267
symbols on the same line are given the same precedence. They have
394
268
higher precedence than symbols declared before in a <CODE>%left</CODE>,
395
269
<CODE>%right</CODE> or <CODE>%nonassoc</CODE> line. They have lower precedence
398
272
left (<CODE>%left</CODE>), to the right (<CODE>%right</CODE>), or to be
399
273
non-associative (<CODE>%nonassoc</CODE>). The symbols are usually tokens.
400
274
They can also be dummy nonterminals, for use with the <CODE>%prec</CODE>
401
directive inside the rules.<BR>
403
The precedence declarations are used in the following way to
275
directive inside the rules.</P><P>The precedence declarations are used in the following way to
404
276
resolve reduce/reduce and shift/reduce conflicts:
405
<UL CLASS="itemize"><LI CLASS="li-itemize">
277
</P><UL CLASS="itemize"><LI CLASS="li-itemize">
406
278
Tokens and rules have precedences. By default, the precedence
407
of a rule is the precedence of its rightmost terminal. You
408
can override this default by using the <FONT COLOR=blue><TT>%prec</TT></FONT> directive in the rule.
409
<LI CLASS="li-itemize">A reduce/reduce conflict
410
is resolved in favor of the first rule (in the order given by the
411
source file), and <TT>ocamlyacc</TT> outputs a warning.
412
<LI CLASS="li-itemize">A shift/reduce conflict
413
is resolved by comparing the precedence of the rule to be
414
reduced with the precedence of the token to be shifted. If the
415
precedence of the rule is higher, then the rule will be reduced;
416
if the precedence of the token is higher, then the token will
418
<LI CLASS="li-itemize">A shift/reduce conflict between a rule and a token with the
419
same precedence will be resolved using the associativity: if the
420
token is left-associative, then the parser will reduce; if the
421
token is right-associative, then the parser will shift. If the
422
token is non-associative, then the parser will declare a syntax
424
<LI CLASS="li-itemize">When a shift/reduce conflict cannot be resolved using the above
425
method, then <TT>ocamlyacc</TT> will output a warning and the parser will
429
<H3 CLASS="subsection"><A NAME="htoc141">12.4.3</A> Rules</H3>
430
The syntax for rules is as usual:
279
of a rule is the precedence of its rightmost terminal. You
280
can override this default by using the <FONT COLOR=blue><TT>%prec</TT></FONT> directive in the rule.
281
</LI><LI CLASS="li-itemize">A reduce/reduce conflict
282
is resolved in favor of the first rule (in the order given by the
283
source file), and <TT>ocamlyacc</TT> outputs a warning.
284
</LI><LI CLASS="li-itemize">A shift/reduce conflict
285
is resolved by comparing the precedence of the rule to be
286
reduced with the precedence of the token to be shifted. If the
287
precedence of the rule is higher, then the rule will be reduced;
288
if the precedence of the token is higher, then the token will
290
</LI><LI CLASS="li-itemize">A shift/reduce conflict between a rule and a token with the
291
same precedence will be resolved using the associativity: if the
292
token is left-associative, then the parser will reduce; if the
293
token is right-associative, then the parser will shift. If the
294
token is non-associative, then the parser will declare a syntax
296
</LI><LI CLASS="li-itemize">When a shift/reduce conflict cannot be resolved using the above
297
method, then <TT>ocamlyacc</TT> will output a warning and the parser will
299
</LI></UL></DD></DL><H3 CLASS="subsection"><A NAME="htoc141">12.4.3</A>��Rules</H3><P>The syntax for rules is as usual:
432
301
<I>nonterminal</I> :
433
<I>symbol</I> ... <I>symbol</I> { <I>semantic-action</I> }
435
| <I>symbol</I> ... <I>symbol</I> { <I>semantic-action</I> }
302
<I>symbol</I> … <I>symbol</I> { <I>semantic-action</I> }
304
| <I>symbol</I> … <I>symbol</I> { <I>semantic-action</I> }
438
307
Rules can also contain the <CODE>%prec </CODE><I>symbol</I> directive in the
439
308
right-hand side part, to override the default precedence and
440
309
associativity of the rule with the precedence and associativity of the
443
Semantic actions are arbitrary Caml expressions, that
310
given symbol.</P><P>Semantic actions are arbitrary Caml expressions, that
444
311
are evaluated to produce the semantic attribute attached to
445
312
the defined nonterminal. The semantic actions can access the
446
313
semantic attributes of the symbols in the right-hand side of
447
314
the rule with the <CODE>$</CODE> notation: <CODE>$1</CODE> is the attribute for the
448
315
first (leftmost) symbol, <CODE>$2</CODE> is the attribute for the second
451
The rules may contain the special symbol <TT>error</TT> to indicate
452
resynchronization points, as in <TT>yacc</TT>.<BR>
454
Actions occurring in the middle of rules are not supported.<BR>
456
Nonterminal symbols are like regular Caml symbols, except that they
457
cannot end with <TT>'</TT> (single quote).<BR>
460
<H3 CLASS="subsection"><A NAME="htoc142">12.4.4</A> Error handling</H3>
461
Error recovery is supported as follows: when the parser reaches an
316
symbol, etc.</P><P>The rules may contain the special symbol <TT>error</TT> to indicate
317
resynchronization points, as in <TT>yacc</TT>.</P><P>Actions occurring in the middle of rules are not supported.</P><P>Nonterminal symbols are like regular Caml symbols, except that they
318
cannot end with <TT>'</TT> (single quote).</P><H3 CLASS="subsection"><A NAME="htoc142">12.4.4</A>��Error handling</H3><P>Error recovery is supported as follows: when the parser reaches an
462
319
error state (no grammar rules can apply), it calls a function named
463
320
<TT>parse_error</TT> with the string <TT>"syntax error"</TT> as argument. The default
464
321
<TT>parse_error</TT> function does nothing and returns, thus initiating error
465
322
recovery (see below). The user can define a customized <TT>parse_error</TT>
466
function in the header section of the grammar file.<BR>
468
The parser also enters error recovery mode if one of the grammar
469
actions raises the <TT>Parsing.Parse_error</TT> exception.<BR>
471
In error recovery mode, the parser discards states from the
323
function in the header section of the grammar file.</P><P>The parser also enters error recovery mode if one of the grammar
324
actions raises the <TT>Parsing.Parse_error</TT> exception.</P><P>In error recovery mode, the parser discards states from the
472
325
stack until it reaches a place where the error token can be shifted.
473
326
It then discards tokens from the input until it finds three successive
474
327
tokens that can be accepted, and starts processing with the first of
475
328
these. If no state can be uncovered where the error token can be
476
329
shifted, then the parser aborts by raising the <TT>Parsing.Parse_error</TT>
479
Refer to documentation on <TT>yacc</TT> for more details and guidance in how
480
to use error recovery.<BR>
483
<H2 CLASS="section"><A NAME="htoc143">12.5</A> Options</H2>
484
The <TT>ocamlyacc</TT> command recognizes the following options:
485
<DL CLASS="description" COMPACT=compact><DT CLASS="dt-description"><B><TT>-b</TT><I>prefix</I></B><DD CLASS="dd-description">
330
exception.</P><P>Refer to documentation on <TT>yacc</TT> for more details and guidance in how
331
to use error recovery.</P><H2 CLASS="section"><A NAME="htoc143">12.5</A>��Options</H2><P>The <TT>ocamlyacc</TT> command recognizes the following options:</P><DL CLASS="description"><DT CLASS="dt-description"><B><TT>-b</TT><I>prefix</I></B></DT><DD CLASS="dd-description">
486
332
Name the output files <I>prefix</I><TT>.ml</TT>, <I>prefix</I><TT>.mli</TT>,
487
<I>prefix</I><TT>.output</TT>, instead of the default naming convention.<BR>
489
<DT CLASS="dt-description"><TT><B>-v</B></TT><DD CLASS="dd-description">
333
<I>prefix</I><TT>.output</TT>, instead of the default naming convention.</DD><DT CLASS="dt-description"><TT><B>-v</B></TT></DT><DD CLASS="dd-description">
490
334
Generate a description of the parsing tables and a report on conflicts
491
335
resulting from ambiguities in the grammar. The description is put in
492
file <I>grammar</I><TT>.output</TT>.<BR>
494
<DT CLASS="dt-description" style="background-color:yellow; color:red"><TT><B>-version</B></TT><DD CLASS="dd-description" style="background-color:yellow; color:red">
495
Print version and exit.</DL>
496
At run-time, the <TT>ocamlyacc</TT>-generated parser can be debugged by
336
file <I>grammar</I><TT>.output</TT>.</DD><DT CLASS="dt-description"><TT><B>-version</B></TT></DT><DD CLASS="dd-description">
337
Print version and exit.</DD></DL><P>At run-time, the <TT>ocamlyacc</TT>-generated parser can be debugged by
497
338
setting the <TT>p</TT> option in the <TT>OCAMLRUNPARAM</TT> environment variable
498
(see section <A HREF="manual024.html#ocamlrun-options">10.2</A>). This causes the pushdown
339
(see section�<A HREF="manual024.html#ocamlrun-options">10.2</A>). This causes the pushdown
499
340
automaton executing the parser to print a trace of its action (tokens
500
341
shifted, rules reduced, etc). The trace mentions rule numbers and
501
342
state numbers that can be interpreted by looking at the file
502
<I>grammar</I><TT>.output</TT> generated by <TT>ocamlyacc -v</TT>.<BR>
505
<H2 CLASS="section"><A NAME="htoc144">12.6</A> A complete example</H2>
506
The all-time favorite: a desk calculator. This program reads
343
<I>grammar</I><TT>.output</TT> generated by <TT>ocamlyacc -v</TT>.</P><H2 CLASS="section"><A NAME="htoc144">12.6</A>��A complete example</H2><P>The all-time favorite: a desk calculator. This program reads
507
344
arithmetic expressions on standard input, one per line, and prints
508
345
their values. Here is the grammar definition:
509
<PRE CLASS="verbatim">
510
/* File parser.mly */
346
</P><PRE CLASS="verbatim"> /* File parser.mly */
511
347
%token <int> INT
512
348
%token PLUS MINUS TIMES DIV
513
349
%token LPAREN RPAREN