~ubuntu-branches/ubuntu/karmic/pypy/karmic

<div id="menubar"><a class="menu" href="http://codespeak.net/pypy/dist/pypy/doc/news.html">news</a> <a class="menu" href="getting-started.html">getting-started</a> <a class="menu" href="index.html">documentation</a> <a class="menu" href="https://codespeak.net/viewvc/pypy/dist/">svn</a> <a class="menu" href="https://codespeak.net/issue/pypy-dev/">issues</a><a class="menu" href="contact.html">contact</a> <a class="menu" href="http://pypy.org/">EU/project</a> </div></div>

<h1 class="title">Generally Useful RPython Modules</h1>

<p class="topic-title first"><a id="contents" name="contents">Contents</a></p>

<li><a class="reference" href="#listsort" id="id12" name="id12"><tt class="docutils literal"><span class="pre">listsort</span></tt></a></li>

<li><a class="reference" href="#nonconst" id="id13" name="id13"><tt class="docutils literal"><span class="pre">nonconst</span></tt></a></li>

<li><a class="reference" href="#objectmodel" id="id14" name="id14"><tt class="docutils literal"><span class="pre">objectmodel</span></tt></a></li>

<li><a class="reference" href="#rarithmetic" id="id15" name="id15"><tt class="docutils literal"><span class="pre">rarithmetic</span></tt></a></li>

<li><a class="reference" href="#rbigint" id="id16" name="id16"><tt class="docutils literal"><span class="pre">rbigint</span></tt></a></li>

<li><a class="reference" href="#rrandom" id="id17" name="id17"><tt class="docutils literal"><span class="pre">rrandom</span></tt></a></li>

<li><a class="reference" href="#rsocket" id="id18" name="id18"><tt class="docutils literal"><span class="pre">rsocket</span></tt></a></li>

<li><a class="reference" href="#rstack" id="id19" name="id19"><tt class="docutils literal"><span class="pre">rstack</span></tt></a></li>

<li><a class="reference" href="#streamio" id="id20" name="id20"><tt class="docutils literal"><span class="pre">streamio</span></tt></a></li>

<li><a class="reference" href="#unroll" id="id21" name="id21"><tt class="docutils literal"><span class="pre">unroll</span></tt></a></li>

<li><a class="reference" href="#parsing" id="id22" name="id22"><tt class="docutils literal"><span class="pre">parsing</span></tt></a><ul>

<li><a class="reference" href="#regular-expressions" id="id23" name="id23">Regular Expressions</a></li>

<li><a class="reference" href="#parse-trees" id="id25" name="id25">Parse Trees</a><ul>

<li><a class="reference" href="#visitors" id="id26" name="id26">Visitors</a></li>

</ul>

</li>

<li><a class="reference" href="#tree-transformations" id="id27" name="id27">Tree Transformations</a><ul>

<li><a class="reference" href="#symbol-1-symbol-2-symbol-n" id="id28" name="id28">[symbol_1 symbol_2 ... symbol_n]</a></li>

<li><a class="reference" href="#nonterminal-1-nonterminal-2-nonterminal-n" id="id30" name="id30">>nonterminal_1 nonterminal_2 ... nonterminal_n<</a></li>

</ul>

</li>

<li><a class="reference" href="#extensions-to-the-ebnf-grammar-format" id="id31" name="id31">Extensions to the EBNF grammar format</a></li>

<li><a class="reference" href="#full-example" id="id32" name="id32">Full Example</a></li>

</ul>

</li>

</ul>

</div>

<p>This page lists some of the modules in <a class="reference" href="../../pypy/rlib">pypy/rlib</a> together with some hints

for what they can be used for. The modules here will make up some general

library useful for RPython programs (since most of the standard library modules

are not RPython). Most of these modules are somewhat rough still and are likely

to change at some point. Usually it is useful to look at the tests in

<a class="reference" href="../../pypy/rlib/test">pypy/rlib/test</a> to get an impression of how to use a module.</p>

<h1><a class="toc-backref" href="#id12" id="listsort" name="listsort"><tt class="docutils literal"><span class="pre">listsort</span></tt></a></h1>

<p>The <a class="reference" href="../../pypy/rlib/listsort.py">listsort</a> module contains an implementation of the timsort sorting algorithm

(the sort method of lists is not RPython). To use it, subclass from the

<tt class="docutils literal"><span class="pre">listsort.TimSort</span></tt> class and override the <tt class="docutils literal"><span class="pre">lt</span></tt> method to change the

comparison behaviour. The constructor of <tt class="docutils literal"><span class="pre">TimSort</span></tt> takes a list as an

argument, which will be sorted in place when the <tt class="docutils literal"><span class="pre">sort</span></tt> method of the

<tt class="docutils literal"><span class="pre">TimSort</span></tt> instance is called. <strong>Warning:</strong> currently only one type of list can

be sorted using the <tt class="docutils literal"><span class="pre">listsort</span></tt> module in one program, otherwise the annotator

will be confused.</p>

</div>

<h1><a class="toc-backref" href="#id13" id="nonconst" name="nonconst"><tt class="docutils literal"><span class="pre">nonconst</span></tt></a></h1>

<p>The <a class="reference" href="../../pypy/rlib/nonconst.py">nonconst</a> module is useful mostly for tests. The <a class="reference" href="objspace.html#the-flow-object-space">flow object space</a> and

the <a class="reference" href="translation.html#the-annotation-pass">annotator</a> do quite some constant folding, which is sometimes not desired

in a test. To prevent constant folding on a certain value, use the <tt class="docutils literal"><span class="pre">NonConst</span></tt>

class. The constructor of <tt class="docutils literal"><span class="pre">NonConst</span></tt> takes an arbitrary value. The instance of

<tt class="docutils literal"><span class="pre">NonConst</span></tt> will behave during annotation like that value, but no constant

folding will happen.</p>

</div>

<h1><a class="toc-backref" href="#id14" id="objectmodel" name="objectmodel"><tt class="docutils literal"><span class="pre">objectmodel</span></tt></a></h1>

<p>The <a class="reference" href="../../pypy/rlib/objectmodel.py">objectmodel</a> module is a mixed bag of various functionality. Some of the

more useful ones are:</p>

<dt><tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt>:</dt>

<dd>Instances of <tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt> are treated like integers of unknown

value by the annotator. The value is determined by a no-argument function

(which needs to be passed into the constructor of the class). When the

backend emits code, the function is called to determine the value.</dd>

<dt><tt class="docutils literal"><span class="pre">CDefinedIntSymbolic</span></tt>:</dt>

<dd>Instances of <tt class="docutils literal"><span class="pre">ComputedIntSymbolic</span></tt> are also treated like integers of

unknown value by the annotator. When C code is emitted they will be

represented by the attribute <tt class="docutils literal"><span class="pre">expr</span></tt> of the symbolic (which is also the

first argument of the constructor).</dd>

<dd>An RPython dict-like object. The constructor of r_dict takes two functions:

<tt class="docutils literal"><span class="pre">key_eq</span></tt> and <tt class="docutils literal"><span class="pre">key_hash</span></tt> which are used for comparing and hashing the

entries in the dictionary.</dd>

<dt><tt class="docutils literal"><span class="pre">instantiate(cls)</span></tt>:</dt>

<dd>Instantiate class <tt class="docutils literal"><span class="pre">cls</span></tt> without calling <tt class="docutils literal"><span class="pre">__init__</span></tt>.</dd>

<dt><tt class="docutils literal"><span class="pre">we_are_translated()</span></tt>:</dt>

<dd>This function returns <tt class="docutils literal"><span class="pre">False</span></tt> when run on top of CPython, but the

annotator thinks its return value is <tt class="docutils literal"><span class="pre">True</span></tt>. Therefore it can be used to

100

do different things on top of CPython than after translation. This should be

101

used extremely sparingly (mostly for optimizations or debug code).</dd>

102

<dt><tt class="docutils literal"><span class="pre">cast_object_to_weakaddress(obj)</span></tt>:</dt>

103

<dd>Returns a sort of "weak reference" to obj, just without any convenience. The

104

weak address that it returns is not invalidated if the object dies, so you

105

need to take care yourself to know when the object dies. Use with extreme

106

care.</dd>

107

<dt><tt class="docutils literal"><span class="pre">cast_weakadress_to_object(obj)</span></tt>:</dt>

108

<dd>Inverse of the previous function. If the object died then a segfault will

109

ensue.</dd>

110

<dt><tt class="docutils literal"><span class="pre">UnboxedValue</span></tt>:</dt>

111

<dd>This is a class which should be used as a base class for a class which

112

carries exactly one integer field. The class should have <tt class="docutils literal"><span class="pre">__slots__</span></tt>

113

with exactly one entry defined. After translation, instances of this class

114

won't be allocated but represented by <em>tagged pointers*</em>, that is pointers

115

that have the lowest bit set.</dd>

116

</dl>

117

</div>

118

119

<h1><a class="toc-backref" href="#id15" id="rarithmetic" name="rarithmetic"><tt class="docutils literal"><span class="pre">rarithmetic</span></tt></a></h1>

120

<p>The <a class="reference" href="../../pypy/rlib/rarithmetic.py">rarithmetic</a> module contains functionality to handle the small differences

121

in the behaviour of arithmetic code in regular Python and RPython code. Most of

122

them are already described in the <a class="reference" href="coding-guide.html">coding guide</a></p>

123

</div>

124

125

<h1><a class="toc-backref" href="#id16" id="rbigint" name="rbigint"><tt class="docutils literal"><span class="pre">rbigint</span></tt></a></h1>

126

<p>The rbigint module contains a full RPython implementation of the Python <tt class="docutils literal"><span class="pre">long</span></tt>

127

type (which itself is not supported in RPython). The <tt class="docutils literal"><span class="pre">rbigint</span></tt> class contains

128

that implementation. To construct <tt class="docutils literal"><span class="pre">rbigint</span></tt> instances use the static methods

129

<tt class="docutils literal"><span class="pre">fromint</span></tt>, <tt class="docutils literal"><span class="pre">frombool</span></tt>, <tt class="docutils literal"><span class="pre">fromfloat</span></tt> and <tt class="docutils literal"><span class="pre">fromdecimalstr</span></tt>. To convert back

130

to other types use the methods <tt class="docutils literal"><span class="pre">toint</span></tt>, <tt class="docutils literal"><span class="pre">tobool</span></tt>, <tt class="docutils literal"><span class="pre">touint</span></tt> and

131

<tt class="docutils literal"><span class="pre">tofloat</span></tt>. Since RPython does not support operator overloading, all the

132

special methods of <tt class="docutils literal"><span class="pre">rbigint</span></tt> that would normally start and end with "__" have

133

these underscores left out for better readability (so <tt class="docutils literal"><span class="pre">a.add(b)</span></tt> can be used

134

to add two rbigint instances).</p>

135

</div>

136

137

<h1><a class="toc-backref" href="#id17" id="rrandom" name="rrandom"><tt class="docutils literal"><span class="pre">rrandom</span></tt></a></h1>

138

<p>The <a class="reference" href="../../pypy/rlib/rrandom.py">rrandom</a> module contains an implementation of the mersenne twister random

139

number generator. It contains one class <tt class="docutils literal"><span class="pre">Random</span></tt> which most importantly has a

140

<tt class="docutils literal"><span class="pre">random</span></tt> method which returns a pseudo-random floating point number between

141

0.0 and 1.0.</p>

142

</div>

143

144

<h1><a class="toc-backref" href="#id18" id="rsocket" name="rsocket"><tt class="docutils literal"><span class="pre">rsocket</span></tt></a></h1>

145

<p>The <a class="reference" href="../../pypy/rlib/rsocket.py">rsocket</a> module contains an RPython implementation of the functionality of

146

the socket standard library with a slightly different interface. The

147

difficulty with the Python socket API is that addresses are not "well-typed"

148

objects: dependending on the address family they are tuples, or strings, and

149

so on, which is not suitable for RPython. Instead, <tt class="docutils literal"><span class="pre">rsocket</span></tt> contains

150

a hierarchy of Address classes, in a typical static-OO-programming style.</p>

151

</div>

152

153

<h1><a class="toc-backref" href="#id19" id="rstack" name="rstack"><tt class="docutils literal"><span class="pre">rstack</span></tt></a></h1>

154

<p>The <a class="reference" href="../../pypy/rlib/rstack.py">rstack</a> module allows an RPython progam to control its own execution stack.

155

This is only useful if the program is translated using stackless. An old

156

description of the exposed functions is below.</p>

157

<p>We introduce an RPython type <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> and a built-in function

158

<tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> that work as follows (see example below):</p>

159

160

<li>The built-in function <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> causes the current

161

function's state to be captured in a new <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object that is

162

returned to the parent. Only one frame, the current one, is captured this

163

way. The current frame is suspended and the caller continues to run. Note

164

that the caller is only resumed once: when

165

<tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> is called. See below.</li>

166

<li>A <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object can be jumped to by calling its <tt class="docutils literal"><span class="pre">switch()</span></tt>

167

method with no argument.</li>

168

<li><tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> and <tt class="docutils literal"><span class="pre">switch()</span></tt> themselves return a new

169

<tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object: the freshly captured state of the caller of the

170

source <tt class="docutils literal"><span class="pre">switch()</span></tt> that was just executed, or None in the case described

171

below.</li>

172

<li>the function that called <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> also has a

173

normal return statement, like all functions. This statement must return

174

another <tt class="docutils literal"><span class="pre">frame_stack_top</span></tt> object. The latter is <em>not</em> returned to the

175

original caller; there is no way to return several times to the caller.

176

Instead, it designates the place to which the execution must jump, as if by

177

a <tt class="docutils literal"><span class="pre">switch()</span></tt>. The place to which we jump this way will see a None as the

178

source frame stack top.</li>

179

<li>every frame stack top must be resumed once and only once. Not resuming

180

it at all causes a leak. Resuming it several times causes a crash.</li>

181

<li>a function that called <tt class="docutils literal"><span class="pre">yield_current_frame_to_caller()</span></tt> should not raise.

182

It would have no implicit parent frame to propagate the exception to. That

183

would be a crashingly bad idea.</li>

184

</ul>

185

<p>The following example would print the numbers from 1 to 7 in order:</p>

186

187

def g():

188

print 2

189

frametop_before_5 = yield_current_frame_to_caller()

190

print 4

191

frametop_before_7 = frametop_before_5.switch()

192

print 6

193

return frametop_before_7

194

195

def f():

196

print 1

197

frametop_before_4 = g()

198

print 3

199

frametop_before_6 = frametop_before_4.switch()

200

print 5

201

frametop_after_return = frametop_before_6.switch()

202

print 7

203

assert frametop_after_return is None

204

205

f()

206

</pre>

207

</div>

208

209

<h1><a class="toc-backref" href="#id20" id="streamio" name="streamio"><tt class="docutils literal"><span class="pre">streamio</span></tt></a></h1>

210

<p>The <a class="reference" href="../../pypy/rlib/streamio.py">streamio</a> contains an RPython stream I/O implementation (which was started

211

by Guido van Rossum as <a class="reference" href="http://svn.python.org/view/sandbox/trunk/sio/sio.py">sio.py</a> in the CPython sandbox as a prototype for the

212

upcoming new file implementation in Python 3000).</p>

213

</div>

214

215

<h1><a class="toc-backref" href="#id21" id="unroll" name="unroll"><tt class="docutils literal"><span class="pre">unroll</span></tt></a></h1>

216

<p>The <a class="reference" href="../../pypy/rlib/unroll.py">unroll</a> module most importantly contains the function <tt class="docutils literal"><span class="pre">unrolling_iterable</span></tt>

217

which wraps an iterator. Looping over the iterator in RPython code will not

218

produce a loop in the resulting flow graph but will unroll the loop instead.</p>

219

</div>

220

221

<h1><a class="toc-backref" href="#id22" id="parsing" name="parsing"><tt class="docutils literal"><span class="pre">parsing</span></tt></a></h1>

222

<p>The <a class="reference" href="../../pypy/rlib/parsing/">parsing</a> module is a still in-development module to generate tokenizers and

223

parsers in RPython. It is still highly experimental and only really used by the

224

<a class="reference" href="../../pypy/lang/prolog/">Prolog interpreter</a> (although in slightly non-standard ways). The easiest way

225

to specify a tokenizer/grammar is to write it down using regular expressions and

226

simple EBNF format.</p>

227

<p>The regular expressions are implemented using finite automatons. The parsing

228

engine uses <a class="reference" href="http://pdos.csail.mit.edu/~baford/packrat/">packrat parsing</a>, which has O(n) parsing time but is more

229

powerful than LL(n) and LR(n) grammars.</p>

230

231

<h2><a class="toc-backref" href="#id23" id="regular-expressions" name="regular-expressions">Regular Expressions</a></h2>

232

<p>The regular expression syntax is mostly a subset of the syntax of the <a class="reference" href="http://docs.python.org/lib/module-re.html">re</a>

233

module. By default, non-special characters match themselves. If you concatenate

234

regular expressions the result will match the concatenation of strings matched

235

by the single regular expressions.</p>

236

237

238

<dd><tt class="docutils literal"><span class="pre">R|S</span></tt> matches any string that <em>either</em> matches R or matches S.</dd>

239

240

<dd><tt class="docutils literal"><span class="pre">R*</span></tt> matches 0 or more repetitions of R.</dd>

241

242

<dd><tt class="docutils literal"><span class="pre">R+</span></tt> matches 1 or more repetitions of R.</dd>

243

244

<dd><tt class="docutils literal"><span class="pre">R?</span></tt> matches 0 or 1 repetition of R.</dd>

245

246

<dd>Parenthesis can be used to group regular expressions (note that in constrast

247

to Python's re module you cannot later match the content of this group).</dd>

248

249

<dd><tt class="docutils literal"><span class="pre">R{m}</span></tt> matches exactly m repetitions of R.</dd>

250

251

<dd><tt class="docutils literal"><span class="pre">R{m,</span> <span class="pre">n}</span></tt> matches between m and n repetitions of R (including m and n).</dd>

252

253

<dd>Matches a set of characters. The characters to be matched can be listed

254

sequentially. A range of characters can be specified using <tt class="docutils literal"><span class="pre">-</span></tt>. For

255

examples <tt class="docutils literal"><span class="pre">[ac-eg]</span></tt> matches the characters a, c, d, e and g.

256

The whole set can be inverted by starting it with <tt class="docutils literal"><span class="pre">^</span></tt>. So [^a] matches

257

anything except a.</dd>

258

</dl>

259

<p>To parse a regular expression and to get a matcher for it, you can use the

260

function <tt class="docutils literal"><span class="pre">make_runner(s)</span></tt> in the <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.parseregex</span></tt> module. It

261

returns a object with a <tt class="docutils literal"><span class="pre">recognize(input)</span></tt> method that returns True or False

262

depending on whether <tt class="docutils literal"><span class="pre">input</span></tt> matches the string or not.</p>

263

</div>

264

265

266

<p>To describe a tokenizer and a grammar the <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.ebnfparse</span></tt>

267

defines a syntax for doing that.</p>

268

<p>The syntax file contains a sequence or rules. Every rule either describes a

269

regular expression or a grammar rule.</p>

270

<p>Regular expressions rules have the form:</p>

271

272

NAME: "regex";

273

</pre>

274

<p>NAME is the name of the token that the regular expression

275

produces (it has to consist of upper-case letters), <tt class="docutils literal"><span class="pre">regex</span></tt> is a regular

276

expression with the syntax described above. One token name is special-cased: a

277

token called <tt class="docutils literal"><span class="pre">IGNORE</span></tt> will be filtered out of the token stream before being

278

passed on to the parser and can thus be used to match comments or

279

non-significant whitespace.</p>

280

<p>Grammar rules have the form:</p>

281

282

name: expansion_1 | expansion_2 | ... | expansion_n;

283

</pre>

284

<p>Where <tt class="docutils literal"><span class="pre">expansion_i</span></tt> is a sequence of nonterminal or token names:</p>

285

286

symbol_1 symbol_2 symbol_3 ... symbol_n

287

</pre>

288

<p>This means that the nonterminal symbol <tt class="docutils literal"><span class="pre">name</span></tt> (which has to consist of

289

lower-case letters) can be expanded into any of the expansions. The expansions

290

can consist of a sequence of token names, nonterminal names or literals, which

291

are strings in quotes that are matched literally.</p>

292

<p>An example to make this clearer:</p>

293

294

IGNORE: " ";

295

DECIMAL: "0|[1-9][0-9]*";

296

additive: multitive "+" additive |

297

multitive;

298

multitive: primary "*" multitive |

299

primary;

300

primary: "(" additive ")" | DECIMAL;

301

</pre>

302

<p>This grammar describes the syntax of arithmetic impressions involving addition

303

and multiplication. The tokenizer

304

produces a stream of either DECIMAL tokens or tokens that have matched one of

305

the literals "+", "*", "(" or ")". Any space will be ignored. The grammar

306

produces a syntax tree that follows the precedence of the operators. For example

307

the expression <tt class="docutils literal"><span class="pre">12</span> <span class="pre">+</span> <span class="pre">4</span> <span class="pre">*</span> <span class="pre">5</span></tt> is parsed into the following tree:</p>

308

309

</div>

310

311

<h2><a class="toc-backref" href="#id25" id="parse-trees" name="parse-trees">Parse Trees</a></h2>

312

<p>The parsing process builds up a tree consisting of intances of <tt class="docutils literal"><span class="pre">Symbol</span></tt> and

313

<tt class="docutils literal"><span class="pre">Nonterminal</span></tt>, the former corresponding to tokens, the latter to nonterminal

314

symbols. Both classes live in the <a class="reference" href="../../pypy/rlib/parsing/tree.py">pypy.rlib.parsing.tree</a> module. You can use

315

the <tt class="docutils literal"><span class="pre">view()</span></tt> method <tt class="docutils literal"><span class="pre">Nonterminal</span></tt> instances to get a pygame view of the

316

parse tree.</p>

317

<p><tt class="docutils literal"><span class="pre">Symbol</span></tt> instances have the following attributes: <tt class="docutils literal"><span class="pre">symbol</span></tt>, which is the

318

name of the token and <tt class="docutils literal"><span class="pre">additional_info</span></tt> which is the matched source.</p>

319

<p><tt class="docutils literal"><span class="pre">Nonterminal</span></tt> instances have the following attributes: <tt class="docutils literal"><span class="pre">symbol</span></tt> is the name

320

of the nonterminal and <tt class="docutils literal"><span class="pre">children</span></tt> which is a list of the children attributes.</p>

321

322

<h3><a class="toc-backref" href="#id26" id="visitors" name="visitors">Visitors</a></h3>

323

<p>To write tree visitors for the parse trees that are RPython, there is a special

324

baseclass <tt class="docutils literal"><span class="pre">RPythonVisitor</span></tt> in <tt class="docutils literal"><span class="pre">pypy.rlib.parsing.tree``_</span> <span class="pre">to</span> <span class="pre">use.</span> <span class="pre">If</span> <span class="pre">your</span>

325

<span class="pre">class</span> <span class="pre">uses</span> <span class="pre">this,</span> <span class="pre">it</span> <span class="pre">will</span> <span class="pre">grow</span> <span class="pre">a</span> <span class="pre">``dispatch(node)</span></tt> method, that calls an

326

appropriate <tt class="docutils literal"><span class="pre">visit_<symbol></span></tt> method, depending on the <tt class="docutils literal"><span class="pre">node</span></tt> argument. Here

327

the <symbol> is replaced by the <tt class="docutils literal"><span class="pre">symbol</span></tt> attribute of the visited node.</p>

328

<p>For the visitor to be RPython, the return values of all the visit methods need

329

to be of the same type.</p>

330

</div>

331

</div>

332

333

<h2><a class="toc-backref" href="#id27" id="tree-transformations" name="tree-transformations">Tree Transformations</a></h2>

334

<p>As the tree of arithmetic example above shows, by default the parse tree

335

contains a lot of nodes that are not really conveying useful information.

336

To get rid of some of them, there is some support in the grammar format to

337

automatically create a visitor that transforms the tree to remove the additional

338

nodes. The simplest such transformation just removes nodes, but there are

339

more complex ones.</p>

340

<p>The syntax for these transformations is to enclose symbols in expansions of a

341

nonterminal by [...], <...> or >...<.</p>

342

343

<h3><a class="toc-backref" href="#id28" id="symbol-1-symbol-2-symbol-n" name="symbol-1-symbol-2-symbol-n">[symbol_1 symbol_2 ... symbol_n]</a></h3>

344

<p>This will produce a transformer that completely removes the enclosed symbols

345

from the tree.</p>

346

<p>Example:</p>

347

348

IGNORE: " ";

349

n: "A" [","] n | "A";

350

</pre>

351

<p>Parsing the string "A, A, A" gives the tree:</p>

352

353

<p>After transformation the tree has the "," nodes removed:</p>

354

355

</div>

356

357

358

<p>This will replace the parent with symbol. Every expansion can contain at most

359

one symbol that is enclosed by <...>, because the parent can only be replaced

360

once, obviously.</p>

361

<p>Example:</p>

362

363

IGNORE: " ";

364

n: "a" "b" "c" m;

365

m: "(" <n> ")" | "d";

366

</pre>

367

<p>Parsing the string "a b c (a b c d)" gives the tree:</p>

368

369

<p>After transformation the tree looks like this:</p>

370

371

</div>

372

373

<h3><a class="toc-backref" href="#id30" id="nonterminal-1-nonterminal-2-nonterminal-n" name="nonterminal-1-nonterminal-2-nonterminal-n">>nonterminal_1 nonterminal_2 ... nonterminal_n<</a></h3>

374

<p>This replaces the nodes nonterminal_1 to nonterminal_n by their children.</p>

375

<p>Example:</p>

376

377

IGNORE: " ";

378

DECIMAL: "0|[1-9][0-9]*";

379

list: DECIMAL >list< | DECIMAL;

380

</pre>

381

<p>Parsing the string "1 2" gives the tree:</p>

382

383

<p>after the transformation the tree looks like:</p>

384

385

<p>Note that the transformation works recursively. That means that the following

386

also works: if the string "1 2 3 4 5" is parsed the tree at first looks like

387

this:</p>

388

389

<p>But after transformation the whole thing collapses to one node with a lot of

390

children:</p>

391

392

</div>

393

</div>

394

395

<h2><a class="toc-backref" href="#id31" id="extensions-to-the-ebnf-grammar-format" name="extensions-to-the-ebnf-grammar-format">Extensions to the EBNF grammar format</a></h2>

396

<p>There are some extensions to the EBNF grammar format that are really only

397

syntactic sugar but make writing grammars less tedious. These are:</p>

398

399

<dt><tt class="docutils literal"><span class="pre">symbol?</span></tt>:</dt>

400

<dd>matches 0 or 1 repetitions of symbol</dd>

401

<dt><tt class="docutils literal"><span class="pre">symbol*</span></tt>:</dt>

402

<dd>matches 0 or more repetitions of symbol. After the tree transformation all

403

these repetitions are children of the current symbol.</dd>

404

<dt><tt class="docutils literal"><span class="pre">symbol+</span></tt>:</dt>

405

<dd>matches 1 or more repetitions of symbol. After the tree transformation all

406

these repetitions are children of the current symbol.</dd>

407

</dl>

408

<p>These are implemented by adding some more rules to the grammar in the correct

409

way. Examples: the grammar:</p>

410

411

s: a b? c;

412

</pre>

413

<p>is transformed to look like this:</p>

414

415

s: a >_maybe_symbol_0_< c | a c;

416

_maybe_symbol_0_: b;

417

</pre>

418

<p>The grammar:</p>

419

420

s: a b* c;

421

</pre>

422

<p>is transformed to look like this:</p>

423

424

s: a >_star_symbol_0< c | a c;

425

_star_symbol_0: b >_symbol_star_0< | b;

426

</pre>

427

<p>The grammar:</p>

428

429

s: a b+ c;

430

</pre>

431

<p>is transformed to look like this:</p>

432

433

s: a >_plus_symbol_0< c;

434

_plus_symbol_0: b >_plus_symbol_0< | b;

435

</pre>

436

</div>

437

438

<h2><a class="toc-backref" href="#id32" id="full-example" name="full-example">Full Example</a></h2>

439

<p>A semi-complete parser for the <a class="reference" href="http://www.json.org">json format</a>:</p>

440

441

STRING: "\\"[^\\\\"]*\\"";

442

NUMBER: "\-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][\+\-]?[0-9]+)?";

443

IGNORE: " |\n";

444

445

<"true"> | <"false">;

446

object: ["{"] (entry [","])* entry ["}"];

447

array: ["["] (value [","])* value ["]"];

448

entry: STRING [":"] value;

449

</pre>

450

<p>The resulting tree for parsing the string:</p>

451

452

{"a": "5", "b": [1, null, 3, true, {"f": "g", "h": 6}]}

453

</pre>

454

<p>looks like this:</p>

455

456

</div>

457

</div>

458

</div>

459

</div></body></html>

b'\\ No newline at end of file'

Older »