1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4
<title>PyPy[rlib] </title>
5
<meta content="text/html;charset=ISO-8859-1" name="Content-Type"/>
6
<link href="style.css" media="screen" rel="stylesheet" type="text/css"/></head>
8
<div><a><img alt="PyPy" height="110" id="pyimg" src="http://codespeak.net/pypy/img/py-web1.png" width="149"/></a></div>
10
<div class="project_title">PyPy[rlib] </div>
11
<div id="menubar"><a class="menu" href="http://codespeak.net/pypy/dist/pypy/doc/news.html">news</a> <a class="menu" href="getting-started.html">getting-started</a> <a class="menu" href="index.html">documentation</a> <a class="menu" href="https://codespeak.net/viewvc/pypy/dist/">svn</a> <a class="menu" href="https://codespeak.net/issue/pypy-dev/">issues</a><a class="menu" href="contact.html">contact</a> <a class="menu" href="http://pypy.org/">EU/project</a> </div></div>
12
<div id="contentspace">
13
<div id="docinfoline">
14
<div style="float: right; font-style: italic;"> </div></div>
15
<div class="document" id="generally-useful-rpython-modules">
16
<h1 class="title">Generally Useful RPython Modules</h1>
18
<div class="contents topic">
19
<p class="topic-title first"><a id="contents" name="contents">Contents</a></p>
21
<li><a class="reference" href="#listsort" id="id12" name="id12"><tt class="docutils literal"><span class="pre">listsort</span></tt></a></li>
22
<li><a class="reference" href="#nonconst" id="id13" name="id13"><tt class="docutils literal"><span class="pre">nonconst</span></tt></a></li>
23
<li><a class="reference" href="#objectmodel" id="id14" name="id14"><tt class="docutils literal"><span class="pre">objectmodel</span></tt></a></li>
24
<li><a class="reference" href="#rarithmetic" id="id15" name="id15"><tt class="docutils literal"><span class="pre">rarithmetic</span></tt></a></li>
25
<li><a class="reference" href="#rbigint" id="id16" name="id16"><tt class="docutils literal"><span class="pre">rbigint</span></tt></a></li>
26
<li><a class="reference" href="#rrandom" id="id17" name="id17"><tt class="docutils literal"><span class="pre">rrandom</span></tt></a></li>
27
<li><a class="reference" href="#rsocket" id="id18" name="id18"><tt class="docutils literal"><span class="pre">rsocket</span></tt></a></li>
28
<li><a class="reference" href="#rstack" id="id19" name="id19"><tt class="docutils literal"><span class="pre">rstack</span></tt></a></li>
29
<li><a class="reference" href="#streamio" id="id20" name="id20"><tt class="docutils literal"><span class="pre">streamio</span></tt></a></li>
30
<li><a class="reference" href="#unroll" id="id21" name="id21"><tt class="docutils literal"><span class="pre">unroll</span></tt></a></li>
31
<li><a class="reference" href="#parsing" id="id22" name="id22"><tt class="docutils literal"><span class="pre">parsing</span></tt></a><ul>
32
<li><a class="reference" href="#regular-expressions" id="id23" name="id23">Regular Expressions</a></li>
33
<li><a class="reference" href="#ebnf" id="id24" name="id24">EBNF</a></li>
34
<li><a class="reference" href="#parse-trees" id="id25" name="id25">Parse Trees</a><ul>
35
<li><a class="reference" href="#visitors" id="id26" name="id26">Visitors</a></li>
38
<li><a class="reference" href="#tree-transformations" id="id27" name="id27">Tree Transformations</a><ul>
39
<li><a class="reference" href="#symbol-1-symbol-2-symbol-n" id="id28" name="id28">[symbol_1 symbol_2 ... symbol_n]</a></li>
40
<li><a class="reference" href="#symbol" id="id29" name="id29"><symbol></a></li>
41
<li><a class="reference" href="#nonterminal-1-nonterminal-2-nonterminal-n" id="id30" name="id30">>nonterminal_1 nonterminal_2 ... nonterminal_n<</a></li>
44
<li><a class="reference" href="#extensions-to-the-ebnf-grammar-format" id="id31" name="id31">Extensions to the EBNF grammar format</a></li>
45
<li><a class="reference" href="#full-example" id="id32" name="id32">Full Example</a></li>
50
<p>This page lists some of the modules in <a class="reference" href="../../pypy/rlib">pypy/rlib</a> together with some hints
51
for what they can be used for. The modules here will make up some general
52
library useful for RPython programs (since most of the standard library modules
53
are not RPython). Most of these modules are somewhat rough still and are likely
54
to change at some point. Usually it is useful to look at the tests in
55
<a class="reference" href="../../pypy/rlib/test">pypy/rlib/test</a> to get an impression of how to use a module.</p>
57
<h1><a class="toc-backref" href="#id12" id="listsort" name="listsort"><tt class="docutils literal"><span class="pre">listsort</span></tt></a></h1>
58
<p>The <a class="reference" href="../../pypy/rlib/listsort.py">listsort</a> module contains an implementation of the timsort sorting algorithm
59
(the sort method of lists is not RPython). To use it, subclass from the
60
<tt class="docutils literal"><span class="pre">listsort.TimSort</span></tt> class and override the <tt class="docutils literal"><span class="pre">lt</span></tt> method to change the
61
comparison behaviour. The constructor of <tt class="docutils literal"><span class="pre">TimSort</span></tt> takes a list as an
62
argument, which will be sorted in place when the <tt class="docutils literal"><span class="pre">sort</span></tt> method of the
63
<tt class="docutils literal"><span class="pre">TimSort</span></tt> instance is called. <strong>Warning:</strong> currently only one type of list can
64
be sorted using the <tt class="docutils literal"><span class="pre">listsort</span></tt> module in one program, otherwise the annotator
68
<h1><a class="toc-backref" href="#id13" id="nonconst" name="nonconst"><tt class="docutils literal"><span class="pre">nonconst</span></tt></a></h1>
69
<p>The <a class="reference" href="../../pypy/rlib/nonconst.py">nonconst</a> module is useful mostly for tests. The <a class="reference" href="objspace.html#the-flow-object-space">flow object space</a> and
70
the <a class="reference" href="translation.html#the-annotation-pass">annotator</a> do quite some constant folding, which is sometimes not desired
71
in a test. To prevent constant folding on a certain value, use the <tt class="docutils literal"><span class="pre">NonConst</span></tt>
72
class. The constructor of <tt class="docutils literal"><span class="pre">NonConst</span></tt> takes an arbitrary value. The instance of
73
<tt class="docutils literal"><span class="pre">NonConst</span></tt> will behave during annotation like that value, but no constant
74
folding will happen.</p>
77
<h1><a class="toc-backref" href="#id14" id="objectmodel" name="objectmodel"><tt class="docutils literal"><span class="pre">objectmodel</span></tt></a></h1>
78
<p>The <a class="reference" href="../../pypy/rlib/objectmodel.py">objectmodel</a> module is a mixed bag of various functionality. Some of the
79
more useful ones are:</p>
81
<dt><tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt>:</dt>
82
<dd>Instances of <tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt> are treated like integers of unknown
83
value by the annotator. The value is determined by a no-argument function
84
(which needs to be passed into the constructor of the class). When the
85
backend emits code, the function is called to determine the value.</dd>
86
<dt><tt class="docutils literal"><span class="pre">CDefinedIntSymbolic</span></tt>:</dt>
87
<dd>Instances of <tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt> are also treated like integers of
88
unknown value by the annotator. When C code is emitted they will be
89
represented by the attribute <tt class="docutils literal"><span class="pre">expr</span></tt> of the symbolic (which is also the
90
first argument of the constructor).</dd>
91
<dt><tt class="docutils literal"><span class="pre">r_dict</span></tt>:</dt>
92
<dd>An RPython dict-like object. The constructor of r_dict takes two functions:
93
<tt class="docutils literal"><span class="pre">key_eq</span></tt> and <tt class="docutils literal"><span class="pre">key_hash</span></tt> which are used for comparing and hashing the
94
entries in the dictionary.</dd>
95
<dt><tt class="docutils literal"><span class="pre">instantiate(cls)</span></tt>:</dt>
96
<dd>Instantiate class <tt class="docutils literal"><span class="pre">cls</span></tt> without calling <tt class="docutils literal"><span class="pre">__init__</span></tt>.</dd>
97
<dt><tt class="docutils literal"><span class="pre">we_are_translated()</span></tt>:</dt>
98
<dd>This function returns <tt class="docutils literal"><span class="pre">False</span></tt> when run on top of CPython, but the
99
annotator thinks its return value is <tt class="docutils literal"><span class="pre">True</span></tt>. Therefore it can be used to
100
do different things on top of CPython than after translation. This should be
101
used extremely sparingly (mostly for optimizations or debug code).</dd>
102
<dt><tt class="docutils literal"><span class="pre">cast_object_to_weakaddress(obj)</span></tt>:</dt>
103
<dd>Returns a sort of "weak reference" to obj, just without any convenience. The
104
weak address that it returns is not invalidated if the object dies, so you
105
need to take care yourself to know when the object dies. Use with extreme
107
<dt><tt class="docutils literal"><span class="pre">cast_weakadress_to_object(obj)</span></tt>:</dt>
108
<dd>Inverse of the previous function. If the object died then a segfault will
110
<dt><tt class="docutils literal"><span class="pre">UnboxedValue</span></tt>:</dt>
111
<dd>This is a class which should be used as a base class for a class which
112
carries exactly one integer field. The class should have <tt class="docutils literal"><span class="pre">__slots__</span></tt>
113
with exactly one entry defined. After translation, instances of this class
114
won't be allocated but represented by <em>tagged pointers*</em>, that is pointers
115
that have the lowest bit set.</dd>
118
<div class="section">
119
<h1><a class="toc-backref" href="#id15" id="rarithmetic" name="rarithmetic"><tt class="docutils literal"><span class="pre">rarithmetic</span></tt></a></h1>
120
<p>The <a class="reference" href="../../pypy/rlib/rarithmetic.py">rarithmetic</a> module contains functionality to handle the small differences
121
in the behaviour of arithmetic code in regular Python and RPython code. Most of
122
them are already described in the <a class="reference" href="coding-guide.html">coding guide</a></p>
124
<div class="section">
125
<h1><a class="toc-backref" href="#id16" id="rbigint" name="rbigint"><tt class="docutils literal"><span class="pre">rbigint</span></tt></a></h1>
126
<p>The rbigint module contains a full RPython implementation of the Python <tt class="docutils literal"><span class="pre">long</span></tt>
127
type (which itself is not supported in RPython). The <tt class="docutils literal"><span class="pre">rbigint</span></tt> class contains
128
that implementation. To construct <tt class="docutils literal"><span class="pre">rbigint</span></tt> instances use the static methods
129
<tt class="docutils literal"><span class="pre">fromint</span></tt>, <tt class="docutils literal"><span class="pre">frombool</span></tt>, <tt class="docutils literal"><span class="pre">fromfloat</span></tt> and <tt class="docutils literal"><span class="pre">fromdecimalstr</span></tt>. To convert back
130
to other types use the methods <tt class="docutils literal"><span class="pre">toint</span></tt>, <tt class="docutils literal"><span class="pre">tobool</span></tt>, <tt class="docutils literal"><span class="pre">touint</span></tt> and
131
<tt class="docutils literal"><span class="pre">tofloat</span></tt>. Since RPython does not support operator overloading, all the
132
special methods of <tt class="docutils literal"><span class="pre">rbigint</span></tt> that would normally start and end with "__" have
133
these underscores left out for better readability (so <tt class="docutils literal"><span class="pre">a.add(b)</span></tt> can be used
134
to add two rbigint instances).</p>
136
<div class="section">
137
<h1><a class="toc-backref" href="#id17" id="rrandom" name="rrandom"><tt class="docutils literal"><span class="pre">rrandom</span></tt></a></h1>
138
<p>The <a class="reference" href="../../pypy/rlib/rrandom.py">rrandom</a> module contains an implementation of the mersenne twister random
139
number generator. It contains one class <tt class="docutils literal"><span class="pre">Random</span></tt> which most importantly has a
140
<tt class="docutils literal"><span class="pre">random</span></tt> method which returns a pseudo-random floating point number between
143
<div class="section">
144
<h1><a class="toc-backref" href="#id18" id="rsocket" name="rsocket"><tt class="docutils literal"><span class="pre">rsocket</span></tt></a></h1>
145
<p>The <a class="reference" href="../../pypy/rlib/rsocket.py">rsocket</a> module contains an RPython implementation of the functionality of
146
the socket standard library with a slightly different interface. The
147
difficulty with the Python socket API is that addresses are not "well-typed"
148
objects: dependending on the address family they are tuples, or strings, and
149
so on, which is not suitable for RPython. Instead, <tt class="docutils literal"><span class="pre">rsocket</span></tt> contains
150
a hierarchy of Address classes, in a typical static-OO-programming style.</p>
152
<div class="section">
153
<h1><a class="toc-backref" href="#id19" id="rstack" name="rstack"><tt class="docutils literal"><span class="pre">rstack</span></tt></a></h1>
154
<p>The <a class="reference" href="../../pypy/rlib/rstack.py">rstack</a> module allows an RPython progam to control its own execution stack.
155
This is only useful if the program is translated using stackless. An old
156
description of the exposed functions is below.</p>
157
<p>We introduce an RPython type <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> and a built-in function
158
<tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> that work as follows (see example below):</p>
160
<li>The built-in function <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> causes the current
161
function's state to be captured in a new <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object that is
162
returned to the parent. Only one frame, the current one, is captured this
163
way. The current frame is suspended and the caller continues to run. Note
164
that the caller is only resumed once: when
165
<tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> is called. See below.</li>
166
<li>A <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object can be jumped to by calling its <tt class="docutils literal"><span class="pre">switch()</span></tt>
167
method with no argument.</li>
168
<li><tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> and <tt class="docutils literal"><span class="pre">switch()</span></tt> themselves return a new
169
<tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object: the freshly captured state of the caller of the
170
source <tt class="docutils literal"><span class="pre">switch()</span></tt> that was just executed, or None in the case described
172
<li>the function that called <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> also has a
173
normal return statement, like all functions. This statement must return
174
another <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object. The latter is <em>not</em> returned to the
175
original caller; there is no way to return several times to the caller.
176
Instead, it designates the place to which the execution must jump, as if by
177
a <tt class="docutils literal"><span class="pre">switch()</span></tt>. The place to which we jump this way will see a None as the
178
source frame stack top.</li>
179
<li>every frame stack top must be resumed once and only once. Not resuming
180
it at all causes a leak. Resuming it several times causes a crash.</li>
181
<li>a function that called <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> should not raise.
182
It would have no implicit parent frame to propagate the exception to. That
183
would be a crashingly bad idea.</li>
185
<p>The following example would print the numbers from 1 to 7 in order:</p>
186
<pre class="literal-block">
189
frametop_before_5 = yield_current_frame_to_caller()
191
frametop_before_7 = frametop_before_5.switch()
193
return frametop_before_7
197
frametop_before_4 = g()
199
frametop_before_6 = frametop_before_4.switch()
201
frametop_after_return = frametop_before_6.switch()
203
assert frametop_after_return is None
208
<div class="section">
209
<h1><a class="toc-backref" href="#id20" id="streamio" name="streamio"><tt class="docutils literal"><span class="pre">streamio</span></tt></a></h1>
210
<p>The <a class="reference" href="../../pypy/rlib/streamio.py">streamio</a> contains an RPython stream I/O implementation (which was started
211
by Guido van Rossum as <a class="reference" href="http://svn.python.org/view/sandbox/trunk/sio/sio.py">sio.py</a> in the CPython sandbox as a prototype for the
212
upcoming new file implementation in Python 3000).</p>
214
<div class="section">
215
<h1><a class="toc-backref" href="#id21" id="unroll" name="unroll"><tt class="docutils literal"><span class="pre">unroll</span></tt></a></h1>
216
<p>The <a class="reference" href="../../pypy/rlib/unroll.py">unroll</a> module most importantly contains the function <tt class="docutils literal"><span class="pre">unrolling_iterable</span></tt>
217
which wraps an iterator. Looping over the iterator in RPython code will not
218
produce a loop in the resulting flow graph but will unroll the loop instead.</p>
220
<div class="section">
221
<h1><a class="toc-backref" href="#id22" id="parsing" name="parsing"><tt class="docutils literal"><span class="pre">parsing</span></tt></a></h1>
222
<p>The <a class="reference" href="../../pypy/rlib/parsing/">parsing</a> module is a still in-development module to generate tokenizers and
223
parsers in RPython. It is still highly experimental and only really used by the
224
<a class="reference" href="../../pypy/lang/prolog/">Prolog interpreter</a> (although in slightly non-standard ways). The easiest way
225
to specify a tokenizer/grammar is to write it down using regular expressions and
226
simple EBNF format.</p>
227
<p>The regular expressions are implemented using finite automatons. The parsing
228
engine uses <a class="reference" href="http://pdos.csail.mit.edu/~baford/packrat/">packrat parsing</a>, which has O(n) parsing time but is more
229
powerful than LL(n) and LR(n) grammars.</p>
230
<div class="section">
231
<h2><a class="toc-backref" href="#id23" id="regular-expressions" name="regular-expressions">Regular Expressions</a></h2>
232
<p>The regular expression syntax is mostly a subset of the syntax of the <a class="reference" href="http://docs.python.org/lib/module-re.html">re</a>
233
module. By default, non-special characters match themselves. If you concatenate
234
regular expressions the result will match the concatenation of strings matched
235
by the single regular expressions.</p>
236
<dl class="docutils">
237
<dt><tt class="docutils literal"><span class="pre">|</span></tt></dt>
238
<dd><tt class="docutils literal"><span class="pre">R|S</span></tt> matches any string that <em>either</em> matches R or matches S.</dd>
239
<dt><tt class="docutils literal"><span class="pre">*</span></tt></dt>
240
<dd><tt class="docutils literal"><span class="pre">R*</span></tt> matches 0 or more repetitions of R.</dd>
241
<dt><tt class="docutils literal"><span class="pre">+</span></tt></dt>
242
<dd><tt class="docutils literal"><span class="pre">R+</span></tt> matches 1 or more repetitions of R.</dd>
243
<dt><tt class="docutils literal"><span class="pre">?</span></tt></dt>
244
<dd><tt class="docutils literal"><span class="pre">R?</span></tt> matches 0 or 1 repetition of R.</dd>
245
<dt><tt class="docutils literal"><span class="pre">(...)</span></tt></dt>
246
<dd>Parenthesis can be used to group regular expressions (note that in constrast
247
to Python's re module you cannot later match the content of this group).</dd>
248
<dt><tt class="docutils literal"><span class="pre">{m}</span></tt></dt>
249
<dd><tt class="docutils literal"><span class="pre">R{m}</span></tt> matches exactly m repetitions of R.</dd>
250
<dt><tt class="docutils literal"><span class="pre">{m,</span> <span class="pre">n}</span></tt></dt>
251
<dd><tt class="docutils literal"><span class="pre">R{m,</span> <span class="pre">n}</span></tt> matches between m and n repetitions of R (including m and n).</dd>
252
<dt><tt class="docutils literal"><span class="pre">[]</span></tt></dt>
253
<dd>Matches a set of characters. The characters to be matched can be listed
254
sequentially. A range of characters can be specified using <tt class="docutils literal"><span class="pre">-</span></tt>. For
255
examples <tt class="docutils literal"><span class="pre">[ac-eg]</span></tt> matches the characters a, c, d, e and g.
256
The whole set can be inverted by starting it with <tt class="docutils literal"><span class="pre">^</span></tt>. So [^a] matches
257
anything except a.</dd>
259
<p>To parse a regular expression and to get a matcher for it, you can use the
260
function <tt class="docutils literal"><span class="pre">make_runner(s)</span></tt> in the <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.parseregex</span></tt> module. It
261
returns a object with a <tt class="docutils literal"><span class="pre">recognize(input)</span></tt> method that returns True or False
262
depending on whether <tt class="docutils literal"><span class="pre">input</span></tt> matches the string or not.</p>
264
<div class="section">
265
<h2><a class="toc-backref" href="#id24" id="ebnf" name="ebnf">EBNF</a></h2>
266
<p>To describe a tokenizer and a grammar the <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.ebnfparse</span></tt>
267
defines a syntax for doing that.</p>
268
<p>The syntax file contains a sequence or rules. Every rule either describes a
269
regular expression or a grammar rule.</p>
270
<p>Regular expressions rules have the form:</p>
271
<pre class="literal-block">
272
NAME: "regex";
274
<p>NAME is the name of the token that the regular expression
275
produces (it has to consist of upper-case letters), <tt class="docutils literal"><span class="pre">regex</span></tt> is a regular
276
expression with the syntax described above. One token name is special-cased: a
277
token called <tt class="docutils literal"><span class="pre">IGNORE</span></tt> will be filtered out of the token stream before being
278
passed on to the parser and can thus be used to match comments or
279
non-significant whitespace.</p>
280
<p>Grammar rules have the form:</p>
281
<pre class="literal-block">
282
name: expansion_1 | expansion_2 | ... | expansion_n;
284
<p>Where <tt class="docutils literal"><span class="pre">expansion_i</span></tt> is a sequence of nonterminal or token names:</p>
285
<pre class="literal-block">
286
symbol_1 symbol_2 symbol_3 ... symbol_n
288
<p>This means that the nonterminal symbol <tt class="docutils literal"><span class="pre">name</span></tt> (which has to consist of
289
lower-case letters) can be expanded into any of the expansions. The expansions
290
can consist of a sequence of token names, nonterminal names or literals, which
291
are strings in quotes that are matched literally.</p>
292
<p>An example to make this clearer:</p>
293
<pre class="literal-block">
294
IGNORE: " ";
295
DECIMAL: "0|[1-9][0-9]*";
296
additive: multitive "+" additive |
298
multitive: primary "*" multitive |
300
primary: "(" additive ")" | DECIMAL;
302
<p>This grammar describes the syntax of arithmetic impressions involving addition
303
and multiplication. The tokenizer
304
produces a stream of either DECIMAL tokens or tokens that have matched one of
305
the literals "+", "*", "(" or ")". Any space will be ignored. The grammar
306
produces a syntax tree that follows the precedence of the operators. For example
307
the expression <tt class="docutils literal"><span class="pre">12</span> <span class="pre">+</span> <span class="pre">4</span> <span class="pre">*</span> <span class="pre">5</span></tt> is parsed into the following tree:</p>
308
<img alt="image/parsing_example1.png" src="image/parsing_example1.png" />
310
<div class="section">
311
<h2><a class="toc-backref" href="#id25" id="parse-trees" name="parse-trees">Parse Trees</a></h2>
312
<p>The parsing process builds up a tree consisting of intances of <tt class="docutils literal"><span class="pre">Symbol</span></tt> and
313
<tt class="docutils literal"><span class="pre">Nonterminal</span></tt>, the former corresponding to tokens, the latter to nonterminal
314
symbols. Both classes live in the <a class="reference" href="../../pypy/rlib/parsing/tree.py">pypy.rlib.parsing.tree</a> module. You can use
315
the <tt class="docutils literal"><span class="pre">view()</span></tt> method <tt class="docutils literal"><span class="pre">Nonterminal</span></tt> instances to get a pygame view of the
317
<p><tt class="docutils literal"><span class="pre">Symbol</span></tt> instances have the following attributes: <tt class="docutils literal"><span class="pre">symbol</span></tt>, which is the
318
name of the token and <tt class="docutils literal"><span class="pre">additional_info</span></tt> which is the matched source.</p>
319
<p><tt class="docutils literal"><span class="pre">Nonterminal</span></tt> instances have the following attributes: <tt class="docutils literal"><span class="pre">symbol</span></tt> is the name
320
of the nonterminal and <tt class="docutils literal"><span class="pre">children</span></tt> which is a list of the children attributes.</p>
321
<div class="section">
322
<h3><a class="toc-backref" href="#id26" id="visitors" name="visitors">Visitors</a></h3>
323
<p>To write tree visitors for the parse trees that are RPython, there is a special
324
baseclass <tt class="docutils literal"><span class="pre">RPythonVisitor</span></tt> in <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.tree``_</span> <span class="pre">to</span> <span class="pre">use.</span> <span class="pre">If</span> <span class="pre">your</span>
325
<span class="pre">class</span> <span class="pre">uses</span> <span class="pre">this,</span> <span class="pre">it</span> <span class="pre">will</span> <span class="pre">grow</span> <span class="pre">a</span> <span class="pre">``dispatch(node)</span></tt> method, that calls an
326
appropriate <tt class="docutils literal"><span class="pre">visit_<symbol></span></tt> method, depending on the <tt class="docutils literal"><span class="pre">node</span></tt> argument. Here
327
the <symbol> is replaced by the <tt class="docutils literal"><span class="pre">symbol</span></tt> attribute of the visited node.</p>
328
<p>For the visitor to be RPython, the return values of all the visit methods need
329
to be of the same type.</p>
332
<div class="section">
333
<h2><a class="toc-backref" href="#id27" id="tree-transformations" name="tree-transformations">Tree Transformations</a></h2>
334
<p>As the tree of arithmetic example above shows, by default the parse tree
335
contains a lot of nodes that are not really conveying useful information.
336
To get rid of some of them, there is some support in the grammar format to
337
automatically create a visitor that transforms the tree to remove the additional
338
nodes. The simplest such transformation just removes nodes, but there are
339
more complex ones.</p>
340
<p>The syntax for these transformations is to enclose symbols in expansions of a
341
nonterminal by [...], <...> or >...<.</p>
342
<div class="section">
343
<h3><a class="toc-backref" href="#id28" id="symbol-1-symbol-2-symbol-n" name="symbol-1-symbol-2-symbol-n">[symbol_1 symbol_2 ... symbol_n]</a></h3>
344
<p>This will produce a transformer that completely removes the enclosed symbols
347
<pre class="literal-block">
348
IGNORE: " ";
349
n: "A" [","] n | "A";
351
<p>Parsing the string "A, A, A" gives the tree:</p>
352
<img alt="image/parsing_example2.png" src="image/parsing_example2.png" />
353
<p>After transformation the tree has the "," nodes removed:</p>
354
<img alt="image/parsing_example3.png" src="image/parsing_example3.png" />
356
<div class="section">
357
<h3><a class="toc-backref" href="#id29" id="symbol" name="symbol"><symbol></a></h3>
358
<p>This will replace the parent with symbol. Every expansion can contain at most
359
one symbol that is enclosed by <...>, because the parent can only be replaced
362
<pre class="literal-block">
363
IGNORE: " ";
364
n: "a" "b" "c" m;
365
m: "(" <n> ")" | "d";
367
<p>Parsing the string "a b c (a b c d)" gives the tree:</p>
368
<img alt="image/parsing_example4.png" src="image/parsing_example4.png" />
369
<p>After transformation the tree looks like this:</p>
370
<img alt="image/parsing_example5.png" src="image/parsing_example5.png" />
372
<div class="section">
373
<h3><a class="toc-backref" href="#id30" id="nonterminal-1-nonterminal-2-nonterminal-n" name="nonterminal-1-nonterminal-2-nonterminal-n">>nonterminal_1 nonterminal_2 ... nonterminal_n<</a></h3>
374
<p>This replaces the nodes nonterminal_1 to nonterminal_n by their children.</p>
376
<pre class="literal-block">
377
IGNORE: " ";
378
DECIMAL: "0|[1-9][0-9]*";
379
list: DECIMAL >list< | DECIMAL;
381
<p>Parsing the string "1 2" gives the tree:</p>
382
<img alt="image/parsing_example6.png" src="image/parsing_example6.png" />
383
<p>after the transformation the tree looks like:</p>
384
<img alt="image/parsing_example7.png" src="image/parsing_example7.png" />
385
<p>Note that the transformation works recursively. That means that the following
386
also works: if the string "1 2 3 4 5" is parsed the tree at first looks like
388
<img alt="image/parsing_example8.png" src="image/parsing_example8.png" />
389
<p>But after transformation the whole thing collapses to one node with a lot of
391
<img alt="image/parsing_example9.png" src="image/parsing_example9.png" />
394
<div class="section">
395
<h2><a class="toc-backref" href="#id31" id="extensions-to-the-ebnf-grammar-format" name="extensions-to-the-ebnf-grammar-format">Extensions to the EBNF grammar format</a></h2>
396
<p>There are some extensions to the EBNF grammar format that are really only
397
syntactic sugar but make writing grammars less tedious. These are:</p>
398
<dl class="docutils">
399
<dt><tt class="docutils literal"><span class="pre">symbol?</span></tt>:</dt>
400
<dd>matches 0 or 1 repetitions of symbol</dd>
401
<dt><tt class="docutils literal"><span class="pre">symbol*</span></tt>:</dt>
402
<dd>matches 0 or more repetitions of symbol. After the tree transformation all
403
these repetitions are children of the current symbol.</dd>
404
<dt><tt class="docutils literal"><span class="pre">symbol+</span></tt>:</dt>
405
<dd>matches 1 or more repetitions of symbol. After the tree transformation all
406
these repetitions are children of the current symbol.</dd>
408
<p>These are implemented by adding some more rules to the grammar in the correct
409
way. Examples: the grammar:</p>
410
<pre class="literal-block">
413
<p>is transformed to look like this:</p>
414
<pre class="literal-block">
415
s: a >_maybe_symbol_0_< c | a c;
419
<pre class="literal-block">
422
<p>is transformed to look like this:</p>
423
<pre class="literal-block">
424
s: a >_star_symbol_0< c | a c;
425
_star_symbol_0: b >_symbol_star_0< | b;
428
<pre class="literal-block">
431
<p>is transformed to look like this:</p>
432
<pre class="literal-block">
433
s: a >_plus_symbol_0< c;
434
_plus_symbol_0: b >_plus_symbol_0< | b;
437
<div class="section">
438
<h2><a class="toc-backref" href="#id32" id="full-example" name="full-example">Full Example</a></h2>
439
<p>A semi-complete parser for the <a class="reference" href="http://www.json.org">json format</a>:</p>
440
<pre class="literal-block">
441
STRING: "\\"[^\\\\"]*\\"";
442
NUMBER: "\-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][\+\-]?[0-9]+)?";
443
IGNORE: " |\n";
444
value: <STRING> | <NUMBER> | <object> | <array> | <"null"> |
445
<"true"> | <"false">;
446
object: ["{"] (entry [","])* entry ["}"];
447
array: ["["] (value [","])* value ["]"];
448
entry: STRING [":"] value;
450
<p>The resulting tree for parsing the string:</p>
451
<pre class="literal-block">
452
{"a": "5", "b": [1, null, 3, true, {"f": "g", "h": 6}]}
454
<p>looks like this:</p>
455
<img alt="image/parsing_example10.png" src="image/parsing_example10.png" />
b'\\ No newline at end of file'