<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<META name="GENERATOR" content="hevea 1.06">
<BODY TEXT=black BGCOLOR=white>
<A HREF="tutorial004.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
<A HREF="tutorial006.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#2de52d"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc39"><B><FONT SIZE=6>Chapter 5</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=6>The revised syntax</FONT></B></TD>
</TR></TABLE></DIV></TD>
<A NAME="c:tutrevis"></A>
The revised syntax is an alternative syntax for OCaml. Its purposes
are: 1/ to fix some problems of the normal syntax (unclosed constructions
that sometimes introduce ambiguities, constructor arity, the end of
toplevel phrases and structure items, etc.); 2/ to avoid unjustified duplicate
constructions (<CODE>":="</CODE> vs ``<CODE><-</CODE>'', ``fun'' vs ``function'',
``begin..end'' vs parentheses) or concepts (types and type
declarations); 3/ to bring in some new ideas (lists, types). In a word, it proposes a
syntax that is more logical, simpler, more consistent, and easier to
parse and to pretty print.<BR>
The revised syntax, being little used, is less constrained by
history than the normal one, and can try to answer the question ``how
should things be done'' instead of ``how to remain compatible with old
Other motivations are: 1/ to show that syntax is just a ``shell'' around the
language: you can change it without modifying what lies underneath; 2/
to test to the limit the ability of Camlp4 to do syntax
It is a syntax for the complete language, so it can be used for
all OCaml programs; indeed, Camlp4 itself is written entirely in
that syntax. Note that this is not a constraint: it is always possible
to convert from and to the normal syntax, using the pretty-printing
facilities of Camlp4.<BR>
Remark: syntax in programming languages is largely a matter of personal
taste. This syntax reflects mine, with some ideas taken from here and
there. Some choices may seem arbitrary (other solutions are possible),
but I tried to keep some consistency without straying too far from
the normal syntax: I guess that it is possible to understand a program
written in revised syntax even without having read this chapter.<BR>
Most of the constructions in revised syntax are therefore the same
as in the normal syntax. This chapter presents only the differences
and the motivations for them.<BR>

The quotations for OCaml syntax trees, which we shall see in the next
chapter, use the revised syntax.<BR>
<A NAME="toc33"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc40"><B><FONT SIZE=5>5.1</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Practical points</FONT></B></TD>
</TR></TABLE></DIV></TD>
To compile the file <CODE>foo.ml</CODE> written in revised syntax, use:
$ ocamlc -pp camlp4r foo.ml
To use the revised syntax in the toplevel, do:
<A NAME="toc34"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc41"><B><FONT SIZE=5>5.2</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Phrases</FONT></B></TD>
</TR></TABLE></DIV></TD>
<UL><LI>In revised syntax, single semicolons end the items of structures,
signatures and objects. These semicolons are <EM>mandatory</EM>. The
double semicolon is no longer a token. There is no ambiguity with the
sequence, which has its own construction (see below).<BR>
<LI>The declaration of a global variable is introduced by the keyword
``<CODE>value</CODE>'', ``<CODE>let</CODE>'' being reserved for the construction
``<CODE>let..in</CODE>'':<BR>
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>let x = 23;;</tt></td><td><tt>value x = 23;</tt></td></tr>
<tr><td><tt>let x = 23 in x + 7;;</tt></td><td><tt>let x = 23 in x + 7;</tt></td></tr>
<LI>In interfaces, one must also use ``<CODE>value</CODE>'' instead of
``<CODE>val</CODE>''.<BR>
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>val x : int;;</tt></td><td><tt>value x : int;</tt></td></tr>
<FONT SIZE=2><B>Motivation of the single semicolon</B><BR>

The double semicolon in OCaml exists for historical reasons: the first
parsers were driven by tokens, not by rules, so every construction
needed a specific token.<BR>

But with the introduction of modules in OCaml, the double
semicolon, which was mandatory in Caml Light to end phrases, became
optional: the reason is that in OCaml a ``phrase'' and a
``structure item'' are actually the same notion. The problem is that
the double semicolon is associated with the idea of ``terminating''
something: this is accurate for a phrase, but not for a structure item
inside a structure, since other structure items and the keyword
The choice of making the double semicolon optional in normal
syntax has introduced several problems:
</FONT><UL><LI><FONT SIZE=2>A structure item is actually ended by the beginning of the next
structure item; this means that all structure items must start with a
keyword, otherwise there is an ambiguity. For example, you cannot write:
</FONT><PRE><FONT SIZE=2>
print_string "hello, world"
<FONT SIZE=2>because it is interpreted as a call to </FONT><CODE><FONT SIZE=2>print_string</FONT></CODE><FONT SIZE=2> with 3
parameters (and a typing error). The advocated solution is to write:
</FONT><PRE><FONT SIZE=2>
let _ = print_string "hello, world"
let _ = print_newline ()
<FONT SIZE=2>Mmm....</FONT><BR>
<LI><FONT SIZE=2>But this solution does not work interactively: in the toplevel, you
cannot ask people to type the beginning of the next phrase to see
the result of the current one. Therefore the double semicolon still
remains! The property that one writes the same thing in the toplevel
as in source files has been lost.</FONT><BR>
<LI><FONT SIZE=2>In structures and objects, the fact that structure items and
object items are not terminated makes programs more difficult to
read. If you write a short object or structure item on a single line,
it is very difficult to see where the items start and end.</FONT></UL>
My opinion is that structure items should end with a token, in a
context where there is never a need to read another token. This ensures
correct behavior in the interactive toplevel. The fact that the
sequence is closed in the revised syntax frees the single semicolon,
and a single semicolon is perfectly acceptable inside structures and
objects to end their items, the same way it closes a record field. In
the revised syntax, this ending semicolon is mandatory.<BR>

It is easier to process a language in which all phrases end with a token:
at the end of each phrase, the character stream and the token stream are
synchronized (no need to read an extra token to be sure that the
phrase has ended). This property can simplify other kinds of
processing (extraction of comments or code for documentation,
indentation, editor modes, interactive tools).<BR>
<B>Motivation of ``value''</B><BR>

The reason for using a different keyword </FONT><CODE><FONT SIZE=2>value</FONT></CODE><FONT SIZE=2> instead of
</FONT><CODE><FONT SIZE=2>let</FONT></CODE><FONT SIZE=2> for a toplevel value definition is to mark the difference
from the </FONT><CODE><FONT SIZE=2>let..in</FONT></CODE><FONT SIZE=2> construct. In normal syntax, at toplevel, to see whether it is a
</FONT><CODE><FONT SIZE=2>let</FONT></CODE><FONT SIZE=2> or a </FONT><CODE><FONT SIZE=2>let..in</FONT></CODE><FONT SIZE=2>, we have to look at the end of the let binding.<BR>

In the abstract syntax tree, </FONT><CODE><FONT SIZE=2>let</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>let...in</FONT></CODE><FONT SIZE=2> are very
different: they do not even have the same type: </FONT><CODE><FONT SIZE=2>let</FONT></CODE><FONT SIZE=2> is a
structure item, while </FONT><CODE><FONT SIZE=2>let...in</FONT></CODE><FONT SIZE=2> is an expression. This deserves
to be more visible in the concrete syntax.<BR>

Why not </FONT><CODE><FONT SIZE=2>val</FONT></CODE><FONT SIZE=2> instead of </FONT><CODE><FONT SIZE=2>value</FONT></CODE><FONT SIZE=2>? It is to be consistent with
the other declarations </FONT><CODE><FONT SIZE=2>type</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>exception</FONT></CODE><FONT SIZE=2>, which are not
abbreviations: we don't write </FONT><CODE><FONT SIZE=2>typ</FONT></CODE><FONT SIZE=2> for type declarations, nor
</FONT><CODE><FONT SIZE=2>exc</FONT></CODE><FONT SIZE=2> for exception declarations.</FONT><BR>
<A NAME="toc35"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc42"><B><FONT SIZE=5>5.3</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Imperative constructions</FONT></B></TD>
</TR></TABLE></DIV></TD>
The sequence is introduced by the keyword
``<CODE>do</CODE>'' followed by ``<CODE>{</CODE>'' and terminated by ``<CODE>}</CODE>''
(it is possible to put a semicolon after the last expression):
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>e1; e2; e3; e4</tt></td><td><tt>do { e1; e2; e3; e4 }</tt></td></tr>
<LI>The body of ``<CODE>for</CODE>'' and ``<CODE>while</CODE>'' has the same
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>while e1 do</tt></td><td><tt>while e1 do {</tt></td></tr>
<tr><td><tt> e2; e3; e4</tt></td><td><tt> e2; e3; e4</tt></td></tr>
<tr><td><tt>done</tt></td><td><tt>}</tt></td></tr>
<LI>The ``lets'' apply up to the end of the sequences.</UL><FONT SIZE=2>
<B>Motivation of ``do'' and braces</B><BR>
First, the sequence needed to be closed, for the reason given in the
previous section (toplevel phrases), but also because there are too
many ambiguities with other constructions. For example, in the list:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>We know that it is the list of </FONT><CODE><FONT SIZE=2>a</FONT></CODE><FONT SIZE=2>, </FONT><CODE><FONT SIZE=2>b</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>c</FONT></CODE><FONT SIZE=2>. But it
could be interpreted as a list of one element, the sequence
</FONT><CODE><FONT SIZE=2>"a; b; c"</FONT></CODE><FONT SIZE=2>. In the grammar, this requires that list items not be
``top'' expressions (expressions of the first level of the ``expr''
grammar entry): it is mandatory to use things like ``expression-1'' or
``simple expression'' in the grammar.<BR>

In revised syntax, this case never occurs: when a rule needs an
expression, it always uses the top level of the ``expr'' entry. The
grammar is then simpler and easier to read and understand.<BR>
The choice of </FONT><CODE><FONT SIZE=2>"do"</FONT></CODE><FONT SIZE=2> followed by braces is somewhat
arbitrary. However, the keyword </FONT><CODE><FONT SIZE=2>"do"</FONT></CODE><FONT SIZE=2> readily suggests
something imperative (not functional), and the braces recall the
sequence in the C language.<BR>

Why not </FONT><CODE><FONT SIZE=2>do..done</FONT></CODE><FONT SIZE=2>? It is a question of taste. It could have been
</FONT><CODE><FONT SIZE=2>do..done</FONT></CODE><FONT SIZE=2>. The idea is to remain relatively discreet, and the
proposed construction saves a keyword.<BR>
Note that a </FONT><CODE><FONT SIZE=2>let...in</FONT></CODE><FONT SIZE=2> in the sequence applies up to the end of
the sequence, as in normal syntax. However, in normal syntax,
because the sequence is an open construction, you
can obtain strange results. In the example:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>Let us suppose that you need to add a let binding for the ``simple
statement'': if you just add it, this is what you see:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>But what you get is actually:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>The </FONT><CODE><FONT SIZE=2>let</FONT></CODE><FONT SIZE=2> has ``absorbed'' the rest of the sequence, which is now
included in the conditional branch of the if. To be correct, you need to add an
enclosing </FONT><CODE><FONT SIZE=2>begin..end</FONT></CODE><FONT SIZE=2> or parentheses.</FONT><BR>
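The absorption effect can be reproduced in normal syntax with a stock OCaml compiler. The following is only a minimal sketch: the names trace, surprising and fixed are invented for this illustration and are not taken from the example of this chapter.

```ocaml
(* A shared buffer records which statements actually ran. *)
let trace = Buffer.create 16

(* The layout suggests that only the first add_string is conditional,
   but the let..in extends to the end of the sequence, so both calls
   belong to the "then" branch. *)
let surprising flag =
  Buffer.clear trace;
  if flag then
    let msg = "simple;" in
    Buffer.add_string trace msg;
  Buffer.add_string trace "rest;"

(* begin..end closes the let, so the last statement always runs. *)
let fixed flag =
  Buffer.clear trace;
  if flag then begin
    let msg = "simple;" in
    Buffer.add_string trace msg
  end;
  Buffer.add_string trace "rest;"
```

With flag set to false, surprising leaves the buffer empty, while fixed still records "rest;".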
<A NAME="toc36"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc43"><B><FONT SIZE=5>5.4</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Tuples and lists</FONT></B></TD>
</TR></TABLE></DIV></TD>
Parentheses are mandatory in tuples:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>1, "hello", World</tt></td><td><tt>(1, "hello", World)</tt></td></tr>
<LI>Lists are always enclosed with ``<CODE>[</CODE>'' and ``<CODE>]</CODE>''.
<TABLE CELLSPACING=2 CELLPADDING=0>
<TR><TD ALIGN=left NOWRAP><EM>list</EM></TD>
<TD ALIGN=right NOWRAP>::=</TD>
<TD ALIGN=left NOWRAP><CODE>[</CODE> <EM>elem-list opt-cons</EM> <CODE>]</CODE></TD>
<TR><TD ALIGN=left NOWRAP><EM>elem-list</EM></TD>
<TD ALIGN=right NOWRAP>::=</TD>
<TD ALIGN=left NOWRAP><EM>expression</EM> <CODE>;</CODE> <EM>elem-list</EM> |
<EM>expression</EM></TD>
<TR><TD ALIGN=left NOWRAP><EM>opt-cons</EM></TD>
<TD ALIGN=right NOWRAP>::=</TD>
<TD ALIGN=left NOWRAP><CODE>::</CODE> <EM>expression</EM> | <EM>(*empty*)</EM></TD>
A list is a sequence of expressions separated by semicolons, optionally
ended by a ``<CODE>::</CODE>'' and an expression, the whole being always enclosed
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>x::y</tt></td><td><tt>[x::y]</tt></td></tr>
<tr><td><tt>[x; y; z]</tt></td><td><tt>[x; y; z]</tt></td></tr>
<tr><td><tt>x::y::z::t</tt></td><td><tt>[x::[y::[z::t]]]</tt></td></tr>
<tr><td><tt>x::y::z::t</tt></td><td><tt>[x; y; z :: t]</tt></td></tr>
Note the two ways to write the last case.</UL><FONT SIZE=2>
<B>Motivation to close the tuples by parentheses</B><BR>
In mathematics, tuples are always between parentheses.<BR>

Moreover, this follows a general policy of the revised syntax: close more
constructions. The result is easier to read, and there is no need to learn
certain subtle precedence levels.<BR>

<B>Motivation for the syntax of lists</B><BR>

In revised syntax, lists are always closed. Whether it is a ``cons''
</FONT><CODE><FONT SIZE=2>[a :: b]</FONT></CODE><FONT SIZE=2> or an enumeration of all items
</FONT><CODE><FONT SIZE=2>[a; b; c]</FONT></CODE><FONT SIZE=2>, we always know syntactically where a list starts
and where it ends.<BR>

This syntax has some similarity with lists in Lisp: the brackets
are like the parentheses, the semicolons are like the spaces and the
double colon is like the dot.<BR>
Moreover, the syntax:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>is more understandable and more logical than the equivalent in normal
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>Indeed, reading it in normal syntax, the types are not clear:
</FONT><CODE><FONT SIZE=2>x</FONT></CODE><FONT SIZE=2>, </FONT><CODE><FONT SIZE=2>y</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>z</FONT></CODE><FONT SIZE=2> do not have the same type as </FONT><CODE><FONT SIZE=2>t</FONT></CODE><FONT SIZE=2>, and we
have to remember that the double colon is right associative, which is
generally not natural. In revised syntax, </FONT><CODE><FONT SIZE=2>x</FONT></CODE><FONT SIZE=2>, </FONT><CODE><FONT SIZE=2>y</FONT></CODE><FONT SIZE=2>, and
</FONT><CODE><FONT SIZE=2>z</FONT></CODE><FONT SIZE=2> are at the same level (separated by semicolons), which differs
from that of </FONT><CODE><FONT SIZE=2>t</FONT></CODE><FONT SIZE=2> (separated from the rest by the double
In revised syntax, it is clear that </FONT><CODE><FONT SIZE=2>x</FONT></CODE><FONT SIZE=2>, </FONT><CODE><FONT SIZE=2>y</FONT></CODE><FONT SIZE=2> and
</FONT><CODE><FONT SIZE=2>z</FONT></CODE><FONT SIZE=2> are the first items of the list, because the syntax is
identical whether or not the list ends with a ``cons'',
which is not the case in normal syntax.</FONT><BR>
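The right associativity of the double colon in normal syntax can be checked directly. The sketch below is written in normal syntax and uses integer lists chosen only for illustration; the revised forms appear in comments and are not executed.

```ocaml
(* The tail of the list; it has type int list, unlike the items. *)
let t = [4; 5]

(* Normal syntax: the double colon chains without brackets... *)
let chained = 1 :: 2 :: 3 :: t

(* ...and is implicitly grouped to the right, like this.
   Revised syntax would write [1 :: [2 :: [3 :: t]]]. *)
let nested = 1 :: (2 :: (3 :: t))

(* The closest normal-syntax form that keeps the items at one level.
   Revised syntax would write [1; 2; 3 :: t]. *)
let appended = [1; 2; 3] @ t
```

All three bindings denote the same five-element list.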
<A NAME="toc37"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc44"><B><FONT SIZE=5>5.5</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Irrefutable patterns</FONT></B></TD>
</TR></TABLE></DIV></TD>
There is a notion of ``irrefutable patterns'' used by some syntactic
constructions (see the next sections). Matching against these patterns never
fails. An ``irrefutable pattern'' is either:
<LI>The wildcard ``<CODE>_</CODE>''.
<LI>The constructor ``<CODE>()</CODE>''.
<LI>A tuple with irrefutable patterns.
<LI>A record with irrefutable patterns.
<LI>An irrefutable pattern with a type constraint.
Note that the term ``irrefutable'' does not apply to all patterns
that never fail: constructors that are alone in their type definition,
except ``<CODE>()</CODE>'', are not called ``irrefutable'' (whether
they are alone or not cannot be determined at parsing time).<BR>
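To make the distinction concrete, here is a short sketch in normal syntax. The types point and box and all function names are hypothetical, introduced only for this illustration.

```ocaml
type point = { x : int; y : int }
type box = Box of int  (* a constructor alone in its type definition *)

(* Patterns that are irrefutable by their syntactic shape alone:
   tuple, record, unit constructor, type constraint. *)
let first = fun (a, _) -> a
let abscissa = fun { x; _ } -> x
let unit_to_zero = fun () -> 0
let typed = fun (n : int) -> n + 1

(* This match never fails either, but only the type definition tells us
   so; a parser cannot know that Box is the only constructor. *)
let unbox = fun (Box n) -> n

(* A refutable pattern: matching can fail, so several cases are needed. *)
let head = function x :: _ -> x | [] -> failwith "empty"
```

Following the rules of this chapter, only the first four functions could keep the bracket-free form in revised syntax; unbox and head would need the bracketed form.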
<A NAME="toc38"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc45"><B><FONT SIZE=5>5.6</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Constructions with matching</FONT></B></TD>
</TR></TABLE></DIV></TD>
The keyword ``<CODE>function</CODE>'' no longer exists. One must use
only ``<CODE>fun</CODE>''.<BR>
<LI>The pattern matchings, in constructions with ``<CODE>fun</CODE>'',
``<CODE>match</CODE>'' and ``<CODE>try</CODE>'', are closed by brackets: an opening
bracket ``<CODE>[</CODE>'' before the first case, and a closing bracket
``<CODE>]</CODE>'' after the last case:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>match e with</tt></td><td><tt>match e with</tt></td></tr>
<tr><td><tt> p1 -> e1</tt></td><td><tt>[ p1 -> e1</tt></td></tr>
<tr><td><tt>| p2 -> e2;;</tt></td><td><tt>| p2 -> e2 ];</tt></td></tr>
<tr><td><tt>fun x -> x;;</tt></td><td><tt>fun [x -> x];</tt></td></tr>
But if there is only one case and if the pattern is <EM>irrefutable</EM>, the brackets are not mandatory. These examples work
identically in normal and revised syntaxes:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>fun x -> x</tt></td><td><tt>fun x -> x</tt></td></tr>
<tr><td><tt>fun {foo=(y, _)} -> y</tt></td><td><tt>fun {foo=(y, _)} -> y</tt></td></tr>
Notice that in revised syntax, both <CODE>fun [ x -> x ]</CODE> and
<CODE>fun x -> x</CODE> are correct.
Curried pattern matching can be done with ``<CODE>fun</CODE>'' without
brackets, but only with <EM>irrefutable</EM> patterns:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>fun x (y, z) -> t</tt></td><td><tt>fun x (y, z) -> t</tt></td></tr>
<tr><td><tt>fun x y (C z) -> t</tt></td><td><tt>fun x y -> fun [C z -> t]</tt></td></tr>
<LI>It is possible to write the empty function,
which raises the exception ``<CODE>Match_failure</CODE>'' whatever argument
it is applied to; the empty ``match'', which raises ``<CODE>Match_failure</CODE>'' after
evaluating its expression; and the empty ``try'', which is equivalent to
its expression without <CODE>try</CODE>:
<LI>The patterns after ``<CODE>let</CODE>'' and ``<CODE>value</CODE>'' must be
irrefutable. The following OCaml expression:
</PRE>must be written in revised syntax:
let f = fun [ [x::y] -> ...
<LI>It is possible to use a ``<CODE>where</CODE>'' construction, which is a
reversed ``<CODE>let</CODE>'', but one can write only one binding:
</PRE></UL><FONT SIZE=2>
<B>Motivation for a single keyword ``fun''</B><BR>
The presence of both </FONT><CODE><FONT SIZE=2>fun</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>function</FONT></CODE><FONT SIZE=2> is somewhat strange,
since they have the same semantics.<BR>

In revised syntax, by adding this notion of ``irrefutable patterns'',
there is no ambiguity: since a list is not an irrefutable pattern, the
construction with brackets poses no parsing problem. When an
irrefutable pattern is used, there can be only one case, and therefore no
closing construction is necessary, which lets us keep the simple and
frequent form: </FONT><CODE><FONT SIZE=2>fun x -> x</FONT></CODE><FONT SIZE=2>.<BR>
<B>Motivation to close the constructions</B><BR>

It is to avoid the problem of the ``dangling bar'' (the same as the
``dangling else'' in the ``if'' construct). In normal syntax, this
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>is not interpreted as intended: to obtain what you want, you need to use
parentheses or </FONT><CODE><FONT SIZE=2>begin..end</FONT></CODE><FONT SIZE=2> to close the internal </FONT><CODE><FONT SIZE=2>match</FONT></CODE><FONT SIZE=2>
construct. There is the same problem with the </FONT><CODE><FONT SIZE=2>if</FONT></CODE><FONT SIZE=2> construct,
because of the optional </FONT><CODE><FONT SIZE=2>else</FONT></CODE><FONT SIZE=2> (see below).<BR>
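The ``dangling bar'' can be observed with a stock OCaml compiler. The sketch below is written in normal syntax; the functions misleading and intended are invented for this illustration, and the warning attribute is there only to silence the non-exhaustiveness warning that this demonstration deliberately provokes.

```ocaml
[@@@warning "-8"]  (* both functions contain a non-exhaustive match *)

(* The layout suggests that the last case belongs to the outer match,
   but it is parsed as a third case of the inner match on b. *)
let misleading a b =
  match a with
  | 0 ->
    match b with
    | 0 -> "a = 0, b = 0"
    | 1 -> "a = 0, b = 1"
  | _ -> "a <> 0"

(* Parentheses close the inner match, giving the intended grouping. *)
let intended a b =
  match a with
  | 0 ->
    (match b with
     | 0 -> "a = 0, b = 0"
     | 1 -> "a = 0, b = 1")
  | _ -> "a <> 0"
```

As a consequence, misleading 0 5 returns "a <> 0" even though a is 0, and misleading 1 0 raises Match_failure because the outer match has lost its last case.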
I admit that the fact that not all cases start with the same token
(the first starting with a left bracket, the others with a vertical
bar) is not practical when editing programs: it is indeed awkward to
swap the first case with another one. However, readability and
absence of ambiguity are more important than ease of use and
conciseness: when something is easy to edit but risks introducing
bugs or irregularities, it is not certain that it is better.<BR>

Why not close the construction with a keyword, </FONT><CODE><FONT SIZE=2>end</FONT></CODE><FONT SIZE=2> for example,
as the </FONT><CODE><FONT SIZE=2>Ada</FONT></CODE><FONT SIZE=2> language does? Because an ending keyword
suggests something imperative; it does not suggest that
something is returned, which is nevertheless the case for the </FONT><CODE><FONT SIZE=2>match</FONT></CODE><FONT SIZE=2>
construct, as for most </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> constructs.<BR>
<B>Motivation for the empty forms</B><BR>

The empty function is useful for initial cases of iterations or initial
values of references. It is not absolutely essential since it is possible
</FONT><PRE><FONT SIZE=2>
fun _ -> assert False
</FONT></PRE><FONT SIZE=2>The empty </FONT><CODE><FONT SIZE=2>match</FONT></CODE><FONT SIZE=2> existed before the introduction of the
</FONT><CODE><FONT SIZE=2>assert</FONT></CODE><FONT SIZE=2> construction in </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2>. Like the assert, it
indicates the position of the error in the file.<BR>

These constructions exist because they are the limiting case when the
number of matching cases reaches zero.<BR>
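As an illustration of the ``initial reference value'' use case, here is a small sketch in normal syntax (where the constant is written false, in lowercase). The names handler, install and call are hypothetical.

```ocaml
(* A reference to a function that is not known yet: the placeholder
   fails on any argument and reports its position in the source. *)
let handler : (int -> string) ref = ref (fun _ -> assert false)

(* Install the real function later. *)
let install f = handler := f

(* Call whatever function is currently installed. *)
let call n = !handler n
```

Calling call before install raises Assert_failure; after install string_of_int, call 42 returns "42".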
<B>Motivation for irrefutable patterns in ``let''</B><BR>

In normal syntax, if you use a ``let'' binding with a pattern that is
not irrefutable, you get the message ``pattern matching is not
exhaustive'' from the type checker. If you want to be clean and add the
missing cases, you have to contort your source code. Indeed, for example, the
</FONT><PRE><FONT SIZE=2>
</FONT></PRE><FONT SIZE=2>must be changed into:
</FONT><PRE><FONT SIZE=2>
match a with x :: y -> b | ...
<FONT SIZE=2>In revised syntax, since it is forbidden, you are never in this situation.<BR>
<B>Motivation for the ``where'' construct</B><BR>

This construction existed in the old ``Caml'' V3.1 (whose development
stopped at the beginning of the 1990s) and I liked it very much. There
was a problem with this construct, because it was possible to add
several bindings separated by ``and'', which could sometimes
conflict (another ``dangling'' case) with a possible ``and'' in an
</FONT><PRE><FONT SIZE=2>
</FONT></PRE><FONT SIZE=2>In this situation, the ``where'' construct used to ``absorb'' the
``and'' of the ``let'' binding. The program was interpreted as:
</FONT><PRE><FONT SIZE=2>
b where c = d and e = f
<FONT SIZE=2>Because of that, in </FONT><CODE><FONT SIZE=2>Caml Light</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2>, the ``where''
construction was removed. But a ``where'' with only one binding can
work. In any case, having several bindings in this construction is
neither interesting, nor useful, nor readable.<BR>
I personally use this construction when the ``let''
binding is a function definition and the expression is a call to this
function. I generally prefer to write:
</FONT><PRE><FONT SIZE=2>
loop 0 where rec loop i = ...
<FONT SIZE=2>rather than the equivalent form:
</FONT><PRE><FONT SIZE=2>
let rec loop i = ... in loop 0
<FONT SIZE=2>I consider the form with </FONT><CODE><FONT SIZE=2>where</FONT></CODE><FONT SIZE=2> more readable in this situation.</FONT><BR>
<A NAME="toc39"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc46"><B><FONT SIZE=5>5.7</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Mutables and assignment</FONT></B></TD>
</TR></TABLE></DIV></TD>
The statement ``<CODE><-</CODE>'' is written ``<CODE>:=</CODE>'':
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>x.f <- y</tt></td><td><tt>x.f := y</tt></td></tr>
<LI>The ``<CODE>ref</CODE>'' type is used as if its field label were
named ``<CODE>val</CODE>'' instead of ``<CODE>contents</CODE>''. The operator
``<CODE>!</CODE>'' no longer exists, and references are assigned like
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>x := !x + y</tt></td><td><tt>x.val := x.val + y</tt></td></tr>
<FONT SIZE=2><B>Motivation</B><BR>

Having two constructions for assignment is abnormal. In normal
syntax, </FONT><CODE><FONT SIZE=2>":="</FONT></CODE><FONT SIZE=2>, specific to the </FONT><CODE><FONT SIZE=2>ref</FONT></CODE><FONT SIZE=2> type, is a
relic of the time when references were implemented with a constructor
(there were mutable constructors then), and the code to extract a
reference value and to change it was complicated:
</FONT><PRE><FONT SIZE=2>
match x with Ref x -> x
match x with Ref x -> x <- y
<FONT SIZE=2>It was then justified to have specific constructions </FONT><CODE><FONT SIZE=2>"!x"</FONT></CODE><FONT SIZE=2> and
</FONT><CODE><FONT SIZE=2>"x := y"</FONT></CODE><FONT SIZE=2> for these cases. Now, references are implemented with
a record type, and these constructions can be written:
</FONT><PRE><FONT SIZE=2>
<FONT SIZE=2>In normal syntax, there are 2 ways to access and assign references,
although the method using the label ``contents'' is rarely used. In
revised syntax, it is the only method. However, I consider
``contents'' too long an identifier, which is why I changed it to
``val''. This is actually not a change in the definition of </FONT><CODE><FONT SIZE=2>ref</FONT></CODE><FONT SIZE=2>
(since </FONT><CODE><FONT SIZE=2>Camlp4</FONT></CODE><FONT SIZE=2> deals only with syntax): the label is translated in the syntax
trees, the real name of the field remaining ``contents''.<BR>

As </FONT><CODE><FONT SIZE=2>":="</FONT></CODE><FONT SIZE=2> is no longer needed with the meaning of assigning a
reference value, it can be used in place of </FONT><CODE><FONT SIZE=2>"<-"</FONT></CODE><FONT SIZE=2>, a token
that is less natural and that is easily confused (when reading) with the
</FONT><CODE><FONT SIZE=2>"->"</FONT></CODE><FONT SIZE=2> of functions and pattern matchings.<BR>

The construction </FONT><CODE><FONT SIZE=2>!x</FONT></CODE><FONT SIZE=2> is no longer necessary either, since we can
write </FONT><CODE><FONT SIZE=2>x.val</FONT></CODE><FONT SIZE=2>. We thus save two tokens that were used only for
the reference type.</FONT><BR>
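The two ways of handling references in normal syntax can be compared side by side. This is a minimal sketch in normal syntax; the function names are invented for the illustration, and the revised form is given only in a comment.

```ocaml
(* The usual way, with the two reference-specific operators. *)
let add_with_operators r y = r := !r + y

(* The record way, using the real field name of the ref type.
   Revised syntax would write: r.val := r.val + y *)
let add_with_field r y = r.contents <- r.contents + y
```

Both functions have the same effect on the reference they receive.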
<A NAME="toc40"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
<TR><TD><A NAME="htoc47"><B><FONT SIZE=5>5.8</FONT></B></A></TD>
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Types</FONT></B></TD>
</TR></TABLE></DIV></TD>
Type constructors come before their type parameters, which
are written in curried form:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>int list</tt></td><td><tt>list int</tt></td></tr>
<tr><td><tt>('a, bool) Hashtbl.t</tt></td><td><tt>Hashtbl.t 'a bool</tt></td></tr>
<tr><td><tt>type 'a foo =</tt></td><td><tt>type foo 'a =</tt></td></tr>
<tr><td><tt> 'a list list;;</tt></td><td><tt> list (list 'a);</tt></td></tr>
<LI>Abstract types are represented by an unbound type variable:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>type 'a foo;;</tt></td><td><tt>type foo 'a = 'b;</tt></td></tr>
<tr><td><tt>type bar;;</tt></td><td><tt>type bar = 'a;</tt></td></tr>
<LI>Parentheses are mandatory in tuples of types:
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
<tr><td><tt>int * bool</tt></td><td><tt>(int * bool)</tt></td></tr>
735
<LI>In declaration of a concrete type, brackets must enclose
736
the constructors declarations:
738
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
739
<tr><td><tt>type t = A of i | B;;</tt></td><td><tt>type t = [ A of i | B ];</tt></td></tr>
743
<LI>It is possible to make the empty type, without constructor:
748
<LI>There is a syntax difference between data constructors with
749
several parameters and data constructors with one parameter of type
752
The declaration of a data constructor with several parameters is
753
done by separating the types with ``<CODE>and</CODE>''. In expressions and
754
patterns, this constructor parameters must be currified:
756
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
757
<tr><td><tt>type t = C of t1 * t2;;</tt></td><td><tt>type t = [ C of t1 and t2 ];</tt></td></tr>
758
<tr><td><tt>C (x, y);;</tt></td><td><tt>C x y;</tt></td></tr>
762
A data constructor with a single parameter of tuple type is
declared with a tuple type. In expressions and patterns,
the parameter need not be curried, since it is the only one:
766
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
767
<tr><td><tt>type t = D of (t1 * t2);;</tt></td><td><tt>type t = [ D of (t1 * t2) ];</tt></td></tr>
768
<tr><td><tt>D (x, y);;</tt></td><td><tt>D (x, y);</tt></td></tr>
772
<LI>The predefined constructors ``<CODE>True</CODE>'' and ``<CODE>False</CODE>''
773
start with an uppercase letter.<BR>
775
<LI>In record types, the keyword ``<CODE>mutable</CODE>'' must appear
778
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
779
<tr><td><tt>type t = {mutable x : t1};;</tt></td><td><tt>type t = {x : mutable t1};</tt></td></tr>
784
<FONT SIZE=2><B>Motivation for the application order of type constructors</B><BR>
788
The order mirrors that of value constructors: you can then read
values in the same order as their types. The curried style
is also used for value constructors.<BR>
794
<B>Motivation for the abstract types syntax</B><BR>
798
The aim was to resemble existential types, because abstract types are
in fact a kind of existential type. This may become meaningful if
existential types are one day included in </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2>.<BR>
804
<B>Motivation for the parentheses around tuple types</B><BR>
808
The goal is to close more constructions: tuple types are closed just as
tuples are. Moreover, in constructor declarations it makes the difference
between two parameters and a single tuple parameter more visible.<BR>
814
<B>Motivation for the constructor declaration type</B><BR>
818
The revised syntax tries to be as general as possible, to leave room
for possible future extensions of the language.<BR>
823
Record types are closed by braces (no change). Symmetrically, sum
types (which declare constructors) are closed by brackets. This is also a
way to consider them just as ``types''. One could imagine that they might
one day be allowed outside type declarations, for example like this:
827
</FONT><PRE><FONT SIZE=2>
fun (x : [ A | B ]) -> ...
type t = { lab : [ A | B ] }
type u = [ C of { lab : ...} ]
832
<FONT SIZE=2>The form of the last line is, by the way, the method used in the
833
language </FONT><CODE><FONT SIZE=2>SML</FONT></CODE><FONT SIZE=2>, where record types are always anonymous.<BR>
837
In </FONT><CODE><FONT SIZE=2>Camlp4</FONT></CODE><FONT SIZE=2> abstract syntax, there is no notion of ``type
838
declaration'': a type declaration is just a type. The restriction that sum
types and record types are accepted only in type declarations is enforced
when converting into the abstract syntax that </FONT><CODE><FONT SIZE=2>ocamlc</FONT></CODE><FONT SIZE=2> uses.<BR>
844
<B>Motivation for the empty type</B><BR>
848
Since the type constructor definition is closed, the empty type becomes
expressible. It is not very useful, but it comes at no
cost: a type inhabited by nothing (the empty set).<BR>
854
<B>Motivation for the curried syntax for constructors</B><BR>
858
This reflects the actual semantics. There are indeed two cases, and
the values in the two cases are implemented differently. The arity of
constructors is clearer.<BR>
864
In normal syntax, it is difficult to understand (and to explain) why,
if C is a constructor with two parameters, this is accepted:
866
</FONT><PRE><FONT SIZE=2>
867
fun C (x, y) -> (x, y)
868
</FONT></PRE><FONT SIZE=2>but not that:
869
</FONT><PRE><FONT SIZE=2>
872
<FONT SIZE=2>In revised syntax you have to write:
873
</FONT><PRE><FONT SIZE=2>
fun [ C x y -> (x, y) ]
876
<FONT SIZE=2>The revised syntax reflects the fact that the two parameters of the
877
constructor </FONT><CODE><FONT SIZE=2>C</FONT></CODE><FONT SIZE=2> cannot be considered as a tuple.<BR>
881
This does not mean that ``partial application'' of constructors is
accepted: whether to accept it is a semantic issue, decided at
</FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> typing time.<BR>
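The two declaration forms also exist in normal syntax, where only the parentheses in the type tell them apart. The following is a minimal sketch in normal syntax, for illustration only: the type ``t'', its ``int'' parameters and the functions ``pair_of_d'' and ``sum'' are hypothetical names, not part of the original text.<BR>
</FONT><PRE><FONT SIZE=2>
(* C has two parameters; D has one parameter, which is a tuple. *)
type t = C of int * int | D of (int * int)

(* The parameter of D is a real tuple, so it can be bound to one variable.
   For C, the tuple has to be rebuilt from the two parameters. *)
let pair_of_d = function D p -> p | C (x, y) -> (x, y)

(* Both constructors accept the same pattern shape in normal syntax. *)
let sum = function C (x, y) -> x + y | D (x, y) -> x + y

let () =
  assert (pair_of_d (D (1, 2)) = (1, 2));
  assert (sum (C (3, 4)) = 7)
</FONT></PRE><FONT SIZE=2>
Replacing the second case of ``pair_of_d'' by ``C p -> p'' is rejected by the normal-syntax compiler, because ``C'' expects two arguments. This is the asymmetry that the curried notation of the revised syntax makes visible.<BR>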
887
<B>Motivation for the uppercase for True and False</B><BR>
891
In normal syntax, </FONT><CODE><FONT SIZE=2>true</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>false</FONT></CODE><FONT SIZE=2> are the only
constructors that start with a lowercase letter. This is due to
historical reasons: in </FONT><CODE><FONT SIZE=2>Caml Light</FONT></CODE><FONT SIZE=2>, no constructor (of any
type) needs to be capitalized. When </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> was created, this was
changed, but strangely, </FONT><CODE><FONT SIZE=2>true</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>false</FONT></CODE><FONT SIZE=2> escaped this
rule. They are even now treated as keywords, which they should not
be, since they are not syntactic constructs or part of syntactic
902
In revised syntax, they must be written </FONT><CODE><FONT SIZE=2>True</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>False</FONT></CODE><FONT SIZE=2>
903
and are not keywords.<BR>
907
<B>Motivation for mutable syntax in records</B><BR>
911
The aim is simply to read ``the label x is a mutable integer'' instead of
``the mutable label x is an integer'', which is less clear.</FONT><BR>
914
<A NAME="toc41"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
915
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
916
<TR><TD><A NAME="htoc48"><B><FONT SIZE=5>5.9</FONT></B></A></TD>
917
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Modules</FONT></B></TD>
918
</TR></TABLE></DIV></TD>
920
Module application uses the curried form:
922
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
923
<tr><td><tt>type t = Set.Make(M).t;;</tt></td><td><tt>type t = (Set.Make M).t;</tt></td></tr>
927
<B>Motivation</B><BR>
931
Curried syntax is more natural in functional languages. There is
no reason to have two different syntaxes for application (whatever is
applied): one with parentheses, one curried.</FONT><BR>
935
<A NAME="toc42"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
936
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
937
<TR><TD><A NAME="htoc49"><B><FONT SIZE=5>5.10</FONT></B></A></TD>
938
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Classes and objects</FONT></B></TD>
939
</TR></TABLE></DIV></TD>
941
Classes and objects also have a revised syntax. The simplest way to
see it is to write examples in normal syntax and convert them
to revised syntax with the command:
945
camlp4o pr_r.cmo file.ml
947
<FONT SIZE=2>(documentation to be updated)</FONT><BR>
949
<A NAME="toc43"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
950
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
951
<TR><TD><A NAME="htoc50"><B><FONT SIZE=5>5.11</FONT></B></A></TD>
952
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Miscellaneous</FONT></B></TD>
953
</TR></TABLE></DIV></TD>
956
The ``<CODE>else</CODE>'' is mandatory in the ``<CODE>if</CODE>'' statement:
958
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
959
<tr><td><tt>if a then b</tt></td><td><tt>if a then b else ()</tt></td></tr>
963
<LI>The boolean operations ``or'' and ``and'' must be written only
964
with ``<CODE>||</CODE>'' and ``<CODE>&&</CODE>'':
966
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
967
<tr><td><tt>a or b & c</tt></td><td><tt>a || b && c</tt></td></tr>
968
<tr><td><tt>a || b && c</tt></td><td><tt>a || b && c</tt></td></tr>
972
<LI>No more ``<CODE>begin end</CODE>'' construction. One must use
973
parentheses when needed.<BR>
975
<LI>Operators used as functions are written with a backslash:
977
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
978
<tr><td><tt>(+)</tt></td><td><tt>\+</tt></td></tr>
979
<tr><td><tt>(mod)</tt></td><td><tt>\mod</tt></td></tr>
983
<LI>Operators made of special characters are not automatically
infix. To define infix operators, use syntax extensions.<BR>
986
<LI>It is possible to group together several declarations either in
987
an interface or in an implementation by enclosing them between
988
``<CODE>declare</CODE>'' and ``<CODE>end</CODE>''. Example in an interface:
991
type foo = [ Foo of int | Bar ];
992
value f : foo -> int;
994
</PRE></UL><FONT SIZE=2>
995
<B>Motivation for the ``else''</B><BR>
999
The </FONT><CODE><FONT SIZE=2>else</FONT></CODE><FONT SIZE=2> is mandatory to avoid the ``dangling else''
1000
problem. In normal syntax, you can write:
1001
</FONT><PRE><FONT SIZE=2>
1006
<FONT SIZE=2>In the above program, the ``else d'' actually corresponds to the
``if b'', not to the ``if a''. In revised syntax, the ``else'' being
mandatory, the problem does not exist.<BR>
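The binding of a dangling ``else'' to the nearest ``if'' can be observed directly in normal syntax. The following is a minimal sketch, for illustration only: the names ``trace'', ``note'' and ``run'' are hypothetical helpers, not part of the original text.<BR>
</FONT><PRE><FONT SIZE=2>
(* Records which branch was taken, most recent first. *)
let trace : string list ref = ref []
let note s = trace := s :: !trace

(* The "else" attaches to the nearest "if", that is "if b". *)
let run a b = if a then if b then note "c" else note "d"

let () =
  run false false;          (* outer condition false: nothing is recorded *)
  assert (!trace = []);
  run true false;           (* inner condition false: "d" is recorded *)
  assert (!trace = ["d"])
</FONT></PRE><FONT SIZE=2>
If the ``else'' belonged to the outer ``if'', the first call would have recorded ``d''.<BR>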
1012
Since </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> is a functional language, it is natural for the ``else'' case
to be mandatory: if the condition is false, it is not clear in normal
syntax what the statement returns.<BR>
1018
All these ``dangling'' problems also cause problems in pretty
printing: it is not easy to know whether a construction has to be
parenthesized or not. In revised syntax, there are no dangling
problems and no problem in pretty printing. To pretty print in normal
syntax, a workaround was needed: an extra parameter
passed through all functions.<BR>
1027
Note that in revised syntax, the </FONT><CODE><FONT SIZE=2>if</FONT></CODE><FONT SIZE=2> construct is not
closed; it does not need to be.<BR>
1032
<B>Motivation for the ``or'' and ``and'' operators</B><BR>
1036
There is no reason to accept two syntaxes for the ``or'' operator and
1037
two for the ``and'' operator. The syntaxes </FONT><CODE><FONT SIZE=2>or</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>&</FONT></CODE><FONT SIZE=2> are
1038
actually old constructions, kept for backward compatibility.<BR>
1042
<B>Motivation for the suppression of begin..end</B><BR>
1046
In normal syntax, the construction with </FONT><CODE><FONT SIZE=2>begin</FONT></CODE><FONT SIZE=2> and </FONT><CODE><FONT SIZE=2>end</FONT></CODE><FONT SIZE=2> is
actually the same as parentheses; the choice is often a matter of personal
taste. When parentheses are necessary, some
programmers prefer </FONT><CODE><FONT SIZE=2>"begin match...end"</FONT></CODE><FONT SIZE=2>, others </FONT><CODE><FONT SIZE=2>"(match...)"</FONT></CODE><FONT SIZE=2>.<BR>
1053
In revised syntax, the cases where such parenthesization is necessary
are much less frequent, since most constructions are already
parenthesized. Two constructions for the same purpose are not necessary.<BR>
1059
<B>Motivation for the syntax of standalone operators</B><BR>
1063
This avoids the special case of the </FONT><CODE><FONT SIZE=2>*</FONT></CODE><FONT SIZE=2> operator, which must be
written with spaces around it, since </FONT><CODE><FONT SIZE=2>(*)</FONT></CODE><FONT SIZE=2> is lexically
interpreted as the beginning of a comment.<BR>
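In normal syntax the special case looks as follows. This is a minimal sketch for illustration only; the name ``product'' is hypothetical.<BR>
</FONT><PRE><FONT SIZE=2>
(* The spaces keep the lexer from reading the parenthesis and the star
   as a comment opener; without them the rest of the file is a comment. *)
let product = List.fold_left ( * ) 1 [2; 3; 4]

(* Other operators need no such care. *)
let total = List.fold_left (+) 0 [2; 3; 4]

let () =
  assert (product = 24);
  assert (total = 9)
</FONT></PRE><FONT SIZE=2>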
1069
<B>Motivation for the fact that there are no automatic infixes</B><BR>
1073
Since we are under Camlp4, we can use Camlp4 features.<BR>
1077
<B>Motivation for the ``declare'' construction</B><BR>
1081
It is essential when a syntax extension of an </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> structure item
generates several structure items: for example, when a syntax
change makes a type declaration generate 1/ the type
declaration itself and 2/ functions to be applied to this type.<BR>
1088
When converted into </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> normal syntax tree, this construct is
1091
<A NAME="toc44"></A><TABLE CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
1092
<TR><TD BGCOLOR="#66ff66"><DIV ALIGN=center><TABLE>
1093
<TR><TD><A NAME="htoc51"><B><FONT SIZE=5>5.12</FONT></B></A></TD>
1094
<TD WIDTH="100%" ALIGN=center><B><FONT SIZE=5>Streams and parsers</FONT></B></TD>
1095
</TR></TABLE></DIV></TD>
1098
The streams and the stream patterns are bracketed with
1099
``<CODE>[:</CODE>'' and ``<CODE>:]</CODE>'' instead of ``<CODE>[<</CODE>'' and
1100
``<CODE>>]</CODE>''.<BR>
1102
<LI>The stream component ``terminal'' is written with a backquote
1105
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
1106
<tr><td><tt>[< '1; '2; s; '3 >]</tt></td><td><tt>[: `1; `2; s; `3 :]</tt></td></tr>
1110
<LI>The cases of parsers are bracketed with ``<CODE>[</CODE>'' and
``<CODE>]</CODE>'', as for ``<CODE>fun</CODE>'', ``<CODE>match</CODE>'' and
``<CODE>try</CODE>''. If there is only one case, the brackets are not mandatory:
1114
<center><table border=0 width="75%"><tr><th align=left width="50%">OCaml</th><th align=left width="50%">Revised</th></tr>
1115
<tr><td><tt>parser</tt></td><td><tt>parser</tt></td></tr>
1116
<tr><td><tt> [< 'Foo >] -> e</tt></td><td><tt>[ [: `Foo :] -> e</tt></td></tr>
1117
<tr><td><tt>| [< p = f >] -> f</tt></td><td><tt>| [: p = f :] -> f ]</tt></td></tr>
1119
<tr><td><tt>parser [< 'x >] -> x</tt></td><td><tt>parser [ [: `x :] -> x ]</tt></td></tr>
1121
<tr><td><tt>parser [< 'x >] -> x</tt></td><td><tt>parser [: `x :] -> x</tt></td></tr>
1125
<LI>It is possible to write the empty parser, which
raises the exception ``<CODE>Stream.</CODE><CODE>Failure</CODE>''
whatever argument it is applied to, and the empty stream matching, which always
raises ``<CODE>Stream.</CODE><CODE>Failure</CODE>'':
1131
match e with parser []
1132
</PRE></UL><FONT SIZE=2>
1133
<B>Motivation for the keyword </B></FONT><CODE><FONT SIZE=2><B>"parser"</B></FONT></CODE><FONT SIZE=2><B>, rather than
1134
</B></FONT><CODE><FONT SIZE=2><B>"parse"</B></FONT></CODE><BR>
1138
Actually, it is not different from the choice of the normal syntax,
1139
since the same keyword is used.<BR>
1143
The keyword ``parser'' is like ``function'', not like ``match''. The
``match'' and ``try'' statements are direct actions, applied at once to
their parameters. Parsers and functions, on the other hand, are
just ``concepts'': they are not immediately applied to their
parameters. One must read: ``this is a parser'' just like ``this is
1152
The word ``parse'' might have been used if the construction were
``parse xxx with''. It is written ``match xxx with parser'' in order
to save a keyword.<BR>
1158
<B>Motivation for </B></FONT><CODE><FONT SIZE=2><B>[:</B></FONT></CODE><FONT SIZE=2><B> instead of </B></FONT><CODE><FONT SIZE=2><B>[<</B></FONT></CODE><BR>
1162
It is a question of readability, because of the presence of quotations
in our extended language, whose syntax uses many ``less than'' and
``greater than'' characters. And it is a problem for a list of quoted
1166
</FONT><PRE><FONT SIZE=2>
[<:expr< xx >>; <:expr< yy >>]
</FONT></PRE><FONT SIZE=2>
1169
<B>Motivation for quotes and backquotes</B><BR>
1173
Actually, this should have been done in </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2> normal syntax,
1174
since from </FONT><CODE><FONT SIZE=2>Caml Light</FONT></CODE><FONT SIZE=2> to </FONT><CODE><FONT SIZE=2>OCaml</FONT></CODE><FONT SIZE=2>, the character used to
1175
enclose characters changed from backquote to right quote. It would
then have been natural to make the inverse change for stream terminals, but
this was forgotten.<BR>
1181
In normal syntax, this sometimes creates problems in character streams:
1182
</FONT><PRE><FONT SIZE=2>
parser [< '('a' | 'b') >] -> ...
</FONT></PRE><FONT SIZE=2>The lexer interprets the first parenthesis as a character literal, which
thus causes a parsing error. You must add a space before the left parenthesis:
1186
</FONT><PRE><FONT SIZE=2>
parser [< ' ('a' | 'b') >] -> ...
1189
<FONT SIZE=2>In revised syntax, which uses backquotes, this problem does not appear.<BR>
1193
<B>Motivation for closing the syntax of parsers</B><BR>
1197
To solve the same ``dangling bar'' problem as for functions,
matches and tries. This syntax is closed in the same way.<BR>
1202
<B>Motivation for the empty parser</B><BR>
1206
Useful as the initial case in iterations or as the initial value of references.
1210
<I><FONT COLOR=maroon>
1212
For remarks about Camlp4, write to:
1213
<img src="http://cristal.inria.fr/~ddr/images/email.jpg" alt=email align=top>
1215
<A HREF="tutorial004.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
1216
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A>
1217
<A HREF="tutorial006.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>