=head1 Overview of the Rakudo Perl 6 compiler

This document describes the architecture and layout of the
Rakudo Perl 6 (a.k.a. Rakudo) compiler. See the F<README>
file for information about how to build and run the compiler.

The Rakudo compiler is constructed from four major components:

=head1 RAKUDO COMPILER OVERVIEW

=head2 How the Rakudo Perl 6 compiler works

This document describes the architecture and operation of the Rakudo
Perl 6 (or simply Rakudo) compiler. The F<README> describes how to

Rakudo has six main parts summarized below. Source code paths are
relative to Rakudo's F<src/> directory, and platform-specific filename
extensions such as F<.exe> are sometimes omitted for brevity.

The main compiler object (perl6.pir)

Not Quite Perl builds Perl 6 source code parts into Rakudo

The parse grammar (src/parser/grammar.pg, src/parser/*.pir)

A main program drives parsing, code generation and runtime execution
(F<Perl6/Compiler.pir>)

A set of action methods to transform the parse tree into an abstract syntax
tree (src/parser/actions.pm)

A grammar parses user programs (F<Perl6/Grammar.pm>)

Builtin functions and runtime support (src/setting/, src/builtins/,
src/classes/, src/pmc/)

Action methods build a Parrot Abstract Syntax Tree (F<Perl6/Actions.pm>)

Parrot extensions provide Perl 6 run time behavior (F<ops/perl6.ops>,
F<pmc/*.pmc>, F<binder/*>)

Libraries provide functions at run time (F<builtins/*.pir>, F<cheats/*>,
F<core/*.pm>, F<glue/*.pir>, F<metamodel/*>)

The F<Makefile> takes care of compiling all of the individual
components into compiled form and linking them together to
form the F<perl6.pbc> executable.

The Perl 6 compiler object itself, in F<perl6.pir>, drives the parsing and
action methods. The compiler is an instance of C<PCT::HLLCompiler>, which
provides a standard framework for parsing, optimization, and command line
argument handling for Parrot compilers. The C<onload> subroutine in
F<perl6.pir> simply creates a new C<PCT::HLLCompiler> object, registers it as
the C<Perl6> compiler, and sets it to use the C<Perl6::Grammar> and
C<Perl6::Grammar::Actions> classes defined above.

The C<main> subroutine in F<perl6.pir> is used when Rakudo is invoked
from the command line -- it simply passes control to the C<Perl6>
compiler object registered by the C<onload> subroutine.

Lastly, the F<perl6.pir> source uses PIR C<.include> directives
to pull in the PIR sources for the parse grammar, action methods,
and runtime builtin functions.

The parse grammar is written using a mix of Perl 6 regular
expressions, operator tokens, and special-purpose PIR
subroutines. The primary purpose of the parse grammar is
to parse Perl 6 source code into a parse tree.

Currently the parse grammar is spread across three files:

    src/parser/grammar.pg - the top-level grammar
    src/parser/grammar-oper.pg - operator tokens
    src/parser/quote_expression.pir - quote rule

The top-level portion of the grammar is written using Perl 6
rules (Synopsis 5) and is based on the STD.pm grammar in the
Pugs repository (L<http://svn.pugscode.org/pugs/src/perl6/STD.pm>).
There are a few places where this grammar deviates from STD.pm,
but the ultimate goal is for the two to converge. The grammar
inherits from C<PCT::Grammar>, which provides the C<< <.panic> >>
rule to throw exceptions for syntax errors.

The parse grammar is compiled into PIR (F<src/gen_grammar.pir>)
using the Perl6Grammar compiler that is part of PGE and the Parrot
Compiler Toolkit. Because PGE doesn't yet implement the
proto-regex or longest token matching semantics of S05, we
make use of PGE's built-in operator precedence parser and define
operator tokens in F<grammar-oper.pg>.

Lastly, the F<src/parser/quote_expression.pir> file implements
code to parse the various forms of Perl 6 quoting rules. It's
far easier to write this component using PIR instead of a
regular expression, but otherwise it acts just like any other

The action methods (in F<src/parser/actions.pm>) are used to convert the nodes
of the parse tree (produced by the parse grammar) into an equivalent Parrot
Abstract Syntax Tree (PAST) representation, which is then passed on to Parrot.

The action methods are where the Rakudo compiler does the bulk of the work of
creating an executable program. Action methods are written in Perl 6, but we
use NQP to compile them into PIR as F<src/gen_actions.pir>.

When Rakudo is compiling a Perl 6 program, action methods are invoked
by the C< {*} > symbols in the parse grammar. Each C< {*} > in a rule
causes the action method corresponding to the rule's name to be
invoked, passing the current match object as an argument. If the
rule source line containing C< {*} > also contains a comment
starting with C< #= >, any text after the comment is passed as a
separate key argument to the action method. (This is similar to
the approach that STD.pm uses to mark and distinguish actions.)

The F<Makefile> (generated from F<build/Makefile.in> by
F<../Configure.pl>) compiles all the parts to form the F<perl6.pbc>
executable and the F<perl6> or F<perl6.exe> "fake executable". We call
it fake because it contains only a small stub of code that starts the
Parrot virtual machine and passes itself as a chunk of bytecode for
Parrot to execute. The source code of the "fakecutable" is generated as
F<perl6.c> with the stub at the very end. The entire contents of
F<perl6.pbc> are represented as escaped octal characters in one huge
string called C<program_code>. What a hack!

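The octal-escaping trick can be sketched in a few lines of Python. This is only an illustration of the idea, not the generator that the build actually uses: the function names, the C<bytecode_size> variable and the sample bytes are all hypothetical, and only the name C<program_code> comes from the description above.

```python
def to_c_octal_literal(data: bytes) -> str:
    """Render raw bytes as a C string literal made of 3-digit octal escapes."""
    return '"' + "".join(f"\\{byte:03o}" for byte in data) + '"'


def make_stub_fragment(bytecode: bytes) -> str:
    """Build the C declarations that would embed the bytecode in a stub.

    'program_code' is the name mentioned in the text; 'bytecode_size'
    is a hypothetical companion variable invented for this sketch.
    """
    return (
        f"const unsigned char program_code[] = {to_c_octal_literal(bytecode)};\n"
        f"const unsigned long bytecode_size = {len(bytecode)};\n"
    )


# Five arbitrary sample bytes standing in for the contents of perl6.pbc.
sample = bytes([0xFE, 0x50, 0x42, 0x43, 0x0A])
literal = to_c_octal_literal(sample)
fragment = make_stub_fragment(sample)
print(fragment)
```

Three-digit octal escapes are used for every byte so that a following digit can never be absorbed into the preceding escape, which is a risk with shorter octal or with hexadecimal escapes in C string literals.
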
The source files of Rakudo are preferably and increasingly written in
Perl 6, the remainder in Parrot Intermediate Representation (PIR) or C.
Not Quite Perl (NQP) provides the bootstrap step of compiling compiler
code (yes!) written in a subset of Perl 6, into PIR.

The latest version of NQP is called B<nqp-rx> because it now also
includes a powerful Perl 6 regex engine. This gives a streamlined
compiler framework on which to build a very functional Perl 6

NQP-rx is a bootstrapped compiler: it is mostly written in NQP-rx.
The source code of NQP-rx is in a separate repository at
L<http://github.com/perl6/nqp-rx/>. A compiled version of NQP-rx is shipped
with Parrot in the F<../parrot/ext/nqp-rx/> directory, and the resulting
compiler is F<../parrot_install/bin/parrot-nqp>. Note that NQP-rx only
I<builds> the Rakudo compiler, and does not compile or run user programs.

NQP-rx compiles a very good compiler into F<gen/perl6.pbc>, referred
to as "stage-1", or C<S1_PERL6_PBC> in the F<Makefile>. This version
would be limited in production though, because libraries of classes and
methods available at run time (for example Complex) have not yet been

The "stage-1" compiler (note: not NQP) compiles all Rakudo's Perl 6 code
again, this time including all the library modules (F<gen/core.pm>), to
make F<perl6.pbc> which could be called "stage-2" (note: not in
F<gen/>). That F<gen/core.pm> file is generated by
F<build/gen_core_pm.pl> from a list called C<CORE_SOURCES> in the
F<Makefile>. Thanks to the staging process, a large and growing
proportion of Rakudo's source code is written in Perl 6.

We can conceivably use the Rakudo compiler to compile itself to PIR and
eliminate the need for NQP entirely. At some point as Rakudo matures we
will probably do this. However, for the time being it's slightly easier
to manage the process if we keep a distinction between the two tools,
and using NQP for this stage also helps us to limit ourselves to using a
regular, well-defined, and relatively easy-to-implement subset of Perl 6
for the core compiler. So, while it's possible for us to eliminate NQP
from the process, there are some good reasons not to do so just yet.
(If at some point we discover that we need something for the compiler
that NQP can't or won't support, then that will probably be a good point

=head2 2. Compiler main program

A subroutine called C<'main'>, in F<Perl6/Compiler.pir>, starts the
source parsing and bytecode generation work. It creates a
C<Perl6::Compiler> object for the C<'perl6'> source type. The
C<Perl6::Compiler> class inherits from the Parrot Compiler Toolkit's
C<HLLCompiler> class (see
F<../parrot/compilers/pct/src/PCT/HLLCompiler.pir>).

Before tracing Rakudo's execution further, a few words about Parrot
process and library initialization are in order.

Parrot execution does not simply begin with 'main'. When Parrot
executes a bytecode file, it first calls all subroutines in it that are
marked with the C<:init> modifier. Rakudo has over 50 such subroutines,
brought in by C<.include> directives in F<Perl6/Compiler.pir>, to create
classes and objects in Parrot's memory.

Similarly, when the executable loads libraries, Parrot automatically
calls subs having the C<:load> modifier. The Rakudo C<:init> subs are
usually also C<:load>, so that the same startup sequence occurs whether
Rakudo is run as an executable or loaded as a library.

So, Rakudo's 'main' subroutine has created a C<Perl6::Compiler>
object. Next, 'main' invokes the C<'command_line'> method on this
object, passing the command line arguments in a PMC called C<args_str>.
The C<'command_line'> method is inherited from the C<HLLCompiler> parent
class (part of the PCT, remember).

And that's it, apart from a C<'!fire_phasers'('END')> and an C<exit>.
Well, as far as C<'main'> is concerned. The remaining work is divided
between PCT, grammar and actions.

The C<make> target C<PERL6_G> uses F<parrot-nqp> to
compile F<Perl6/Grammar.pm> to F<gen/perl6-grammar.pir>.

The compiler works by calling the C<TOP> method in F<Perl6/Grammar.pm>.
After some initialization, TOP matches the user program to the comp_unit
(meaning compilation unit) token. That triggers a series of matches to
other tokens and rules (two kinds of regex) depending on the source in

For example, here's the parse rule for Rakudo's C<unless> statement
(in src/parser/grammar.pg):

    rule unless_statement {
        $<sym>=[unless] <EXPR> <block>
        {*}
    }

This rule says that an unless statement consists of the word "unless"
(captured into C<< $<sym> >>), followed by an expression and then a block.
If all of those match successfully, then the C< {*} > invokes the
corresponding action method for unless_statement. Here's the action
method for the unless statement (from src/parser/actions.pm):

    method unless_statement($/) {
        my $then := $( $<block> );
        $then.blocktype('immediate');
        my $past := PAST::Op.new( $( $<EXPR> ), $then,
                                  :pasttype('unless'), :node($/) );
        make $past;
    }

When this action method is invoked from the unless_statement rule,
the current match object containing the parsed statement is passed
into the method as C< $/ >. In Perl 6, this means that the
expressions C<< $<EXPR> >> and C<< $<block> >> will refer to
whatever was matched by the C<< <EXPR> >> and C<< <block> >>
subrules of the C<unless_statement> rule. ( C<< $<block> >>
is Perl 6 syntactic sugar for C< $/{'block'} >.)

Now then, the purpose of the action methods in our compiler is
to convert the parsed elements of the source program into their
abstract syntax tree (PAST) equivalents. The magic for this
occurs in the C< $(...) > and C<make> expressions in the method
body. The C< $(...) > operator is used to retrieve the PAST
representation of a parsed subtree. Thus, the first two statements
of C<unless_statement> retrieve the PAST representation of the
C<< <block> >> subtree into C<$then>, and set that block to
be an immediately executed block.

The third statement creates a new C<PAST::Op> node for the
unless statement, using the PAST representation of C<< <EXPR> >>
as the condition to be tested, the C<$then> block as the body,
and C<:pasttype('unless')> as the type of operation to be
performed. The C<:node($/)> argument is used to link this
PAST node back to the source code that generated it (e.g., for

Finally, the C<make> statement at the end of the method sets
the newly created PAST::Op node as the PAST representation of
the unless statement that was just parsed.

The Parrot Compiler Toolkit provides a wide variety of PAST
node types for representing the various components of a HLL
program -- for more details about the available node types,
see PDD 26 (L<http://svn.parrot.org/parrot/trunk/docs/pdds/pdd26_ast.pod>).

One important observation to make here is that NQP is used only for
I<building> the Rakudo compiler, and then only to convert the action methods
in F<src/parser/actions.pm> into equivalent PIR (F<src/gen_actions.pir>).
The F<src/gen_actions.pir> file is then used to build F<perl6.pbc>.
In particular, NQP is I<not> part of the Rakudo runtime -- i.e., when
Rakudo is running, NQP is not loaded or used. Yes, this does mean that
we can conceivably use the Rakudo compiler to compile F<actions.pm> to
PIR and eliminate the need for NQP entirely. At some point as Rakudo
matures we will probably do this. However, for the time being it's
slightly easier to manage the process if we keep a distinction between
the two tools, and using NQP for this stage also helps us to limit
ourselves to using a regular, well-defined, and relatively
easy-to-implement subset of Perl 6 for the core compiler.
So, while it's possible for us to eliminate NQP from the process,
there are some good reasons not to do so just yet. (If at some
point we discover that we need something for the compiler that
NQP can't or won't support, then that will probably be a good

=head2 How a program is executed by the compiler

This is a rough outline of how Rakudo executes a program.

The main compiler object (perl6.pir) looks at any parameters and slurps in your program.

The program passes through the parser, as defined in the parse grammar
(src/parser/grammar.pg, src/parser/*.pir). This outputs the parse tree.

Action methods transform the parse tree into a Parrot Abstract Syntax

The PAST is provided to Parrot, which does its thing.

The PAST includes references to builtin functions and runtime support. These
are also provided to Parrot.

The PAST representation is the
final stage of processing in Rakudo itself. The PAST data structure is then
passed on to Parrot directly. Parrot does the remainder of the work translating
from PAST to PIR and then to bytecode.

=head2 Builtin functions and runtime support

The last component of the compiler consists of the various builtin
functions and libraries that a Perl 6 program expects to
have available when it is running. These include functions
for the basic operations (C<< infix:<+> >>, C<< prefix:<abs> >>)
as well as common global functions such as C<say> and C<print>.

Currently, most of the builtins are written in PIR, either because
they represent very primitive operations (e.g., math primitives)
or because they are simply easier to write in PIR than in Perl 6
or some other language.

In the very near future we expect to be writing much of the
additional runtime as Perl 6 code instead of PIR. In other
words, we'll build just enough runtime to get a basic Rakudo
compiler running, and then use that to compile the remainder
of the runtime libraries (written in Perl 6) that a standard
Perl 6 program would expect to have available when it is run.

(in F<Perl6/Grammar.pm>):

    token statement_control:sym<unless> {
        [ <!before 'else'> ||
          <.panic: 'unless does not take "else", please rewrite using "if"'>

This token says that an C<unless> statement consists of the word
"unless" (captured into C<< $<sym> >>), and then an expression followed
by a block. The C<.panic:> is a typical "Awesome" error message and the
syntax is almost exactly the same as in F<STD.pm>, described below.

Remember that for a match, not only must the C<< <sym> >> match the word
C<unless>, the C<< <xblock> >> must also match the C<xblock> token. If
you read more of F<Perl6/Grammar.pm>, you will learn that C<xblock> in
turn tries to match an C<< <EXPR> >> and a C<< <pblock> >>, which in
turn tries to match .....

That is why this parsing algorithm is called Recursive Descent.

The top-level portion of the grammar is written using Perl 6 rules
(Synopsis 5) and is based on the STD.pm grammar in the C<perl6/std>
repository (L<https://github.com/perl6/std/>). There are a few
places where Rakudo's grammar deviates from STD.pm, but the ultimate
goal is for the two to converge. Rakudo's grammar inherits from PCT's
C<HLL::Grammar>, which provides the C<< <.panic> >> rule to throw
exceptions for syntax errors.

The F<Perl6/Actions.pm> file defines the code that the compiler
generates when it matches each token or rule. The output is a tree
hierarchy of objects representing language syntax elements, such as a
statement. The tree is called a Parrot Abstract Syntax Tree (PAST).

The C<Perl6::Actions> class inherits from C<HLL::Actions>, another part
of the Parrot Compiler Toolkit. Look in
F<../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir> for several instances of
C<.namespace ["HLL";"Actions"]>.

When the PCT calls the C<'parse'> method on a grammar, it passes not
only the program source code, but also a pointer to a parseactions class
such as our compiled C<Perl6::Actions>. Then, each time the parser
matches a named regex in the grammar, it automatically invokes the same
named method in the actions class.

Back to the C<unless> example, here's the action method for the
C<unless> statement (from F<Perl6/Actions.pm>):

    method statement_control:sym<unless>($/) {
        my $past := xblock_immediate( $<xblock>.ast );
        $past.pasttype('unless');
        make $past;
    }

When the parser invokes this action method, the current match object
containing the parsed statement is passed into the method as C<$/>.
In Perl 6, this means that the expression C<< $<xblock> >> refers to
whatever the parser matched to the C<xblock> token. Similarly there
are C<< $<EXPR> >> and C<< $<pblock> >> objects, and so on, until the
end of the recursive descent. By the way, C<< $<xblock> >> is Perl 6
syntactic sugar for C< $/{'xblock'} >.

The magic occurs in the C<< $<xblock>.ast >> and C<make> expressions in
the method body. The C<.ast> method retrieves the PAST made already for
the C<xblock> subtree. Thus C<$past> becomes a node object describing
code to conditionally execute the block in the subtree.

The C<make> statement at the end of the method sets the newly created
C<xblock_immediate> node as the PAST representation of the unless
statement that was just parsed.

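The interplay of recursive descent, same-named action methods and the C<make>/C<.ast> pair can be imitated in a small, self-contained Python sketch. It is not how PCT or NQP-rx is implemented; every class, method and regex pattern below is an assumption made for illustration, and the "AST" is just nested tuples rather than real PAST nodes.

```python
import re


class Match:
    """A tiny stand-in for a match object: matched text, named submatches, ast."""

    def __init__(self, text, children=None):
        self.text = text
        self.children = children or {}
        self.ast = None  # filled in by an action method (the 'make' step)

    def __getitem__(self, name):
        return self.children[name]


class Actions:
    """Hypothetical action methods, named after the rules they belong to."""

    def EXPR(self, m):
        m.ast = ("var", m.text)

    def block(self, m):
        m.ast = ("block", m.text)

    def unless_statement(self, m):
        # Reuse the ASTs already made for the subtrees (the '.ast' step).
        m.ast = ("unless", m["EXPR"].ast, m["block"].ast)


class Parser:
    """A recursive-descent parser with one method per grammar rule."""

    def __init__(self, source, actions):
        self.src, self.pos, self.actions = source, 0, actions

    def _token(self, pattern):
        found = re.compile(r"\s*(" + pattern + ")").match(self.src, self.pos)
        if found is None:
            raise SyntaxError(f"expected {pattern!r} at offset {self.pos}")
        self.pos = found.end()
        return found.group(1)

    def _matched(self, name, match):
        # After a named rule matches, invoke the same-named action method.
        action = getattr(self.actions, name, None)
        if action is not None:
            action(match)
        return match

    def EXPR(self):
        return self._matched("EXPR", Match(self._token(r"\$\w+")))

    def block(self):
        return self._matched("block", Match(self._token(r"\{[^{}]*\}")))

    def unless_statement(self):
        start = self.pos
        self._token("unless")
        subs = {"EXPR": self.EXPR(), "block": self.block()}
        return self._matched(
            "unless_statement", Match(self.src[start:self.pos], subs)
        )


tree = Parser('unless $done { say "more" }', Actions()).unless_statement()
print(tree.ast)
```

Because each action runs as soon as its rule has matched, the subtrees already carry their C<ast> by the time the enclosing rule's action runs, which is the property the C<unless> action method relies on.
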
The Parrot Compiler Toolkit provides a wide variety of PAST node types
for representing the various components of a HLL program -- for more
details about the available node types, see PDD 26

L<http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd26_ast.pod.html>

The PAST representation is the final stage of processing in Rakudo
itself, and is given to Parrot directly. Parrot does the remainder of
the work translating from PAST to PIR and then to bytecode.

=head2 5. Parrot extensions

Rakudo extends the Parrot virtual machine dynamically (i.e. at run
time), adding 14 dynamic opcodes ("dynops") which are additional virtual
machine code instructions, and 9 dynamic PMCs ("dynpmcs") (PolyMorphic
Container, remember?) which are Parrot's equivalent of class

The dynops source is in F<ops/perl6.ops>, which looks like C, apart from
some Perlish syntactic sugar.
The F<../parrot_install/bin/ops2c> tool desugars
that to F<build/perl6.c>, which your C compiler turns into a library.

For this overview, the opcode names and parameters might give a vague
idea what they're about:

    rebless_subclass(in PMC, in PMC)
    find_lex_skip_current(out PMC, in STR)
    x_is_uprop(out INT, in STR, in STR, in INT)
    get_next_candidate_info(out PMC, out PMC, out PMC)
    transform_to_p6opaque(inout PMC)
    deobjectref(out PMC, in PMC)
    descalarref(out PMC, in PMC)
    allocate_signature(out PMC, in INT)
    get_signature_size(out INT, in PMC)
    set_signature_elem(in PMC, in INT, in STR, in INT, inout PMC,
        inout PMC, inout PMC, inout PMC, inout PMC, inout PMC, in STR)
    get_signature_elem(in PMC, in INT, out STR, out INT, out PMC, out PMC,
        out PMC, out PMC, out PMC, out PMC, out STR)
    bind_signature(in PMC)
    x_setprophash(in PMC, in PMC)

The dynamic PMCs are in F<pmc/*.pmc>, one file per class. The language
is again almost C, but with other sugary differences this time, for
example definitions like C<group perl6_group> whose purpose will appear

The F<../parrot_install/lib/x.y.z-devel/tools/build/pmc2c.pl> script
converts the sugar to something your C compiler understands.

For a rough idea what these classes are for, here are the names:
P6Invocation P6LowLevelSig MutableVAR Perl6Scalar ObjectRef P6role
Perl6MultiSub Perl6Str and P6Opaque.

The dynops and the dynpmcs call a utility routine called a signature
binder, via a function pointer called C<bind_signature_func>. A binder
matches parameters passed by callers of subs, methods and other code
blocks, to the lexical names used internally. Parrot has a flexible set
of calling conventions, but the Perl 6 permutations of arity, multiple
dispatch, positional and named parameters, with constraints, defaults,
flattening and slurping need a higher level of operation. The answer
lies in F<binder/bind.c> which is compiled into the C<perl6_ops> and
C<perl6_group> libraries. Read
L<http://use.perl.org/~JonathanWorthington/journal/39772> for a more
detailed explanation of the binder.

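To make the binder's job more concrete, here is a rough Python sketch of binding arguments to lexical names. It bears no relation to the actual code in F<binder/bind.c>: the C<Param> class, the Python C<bind_signature> function and the example signature are hypothetical, and multiple dispatch and flattening are left out entirely.

```python
from dataclasses import dataclass
from typing import Any, Optional

_REQUIRED = object()  # sentinel meaning "this parameter has no default"


@dataclass
class Param:
    """One hypothetical signature element."""
    name: str
    named: bool = False
    default: Any = _REQUIRED
    slurpy: bool = False
    constraint: Optional[type] = None


def bind_signature(params, pos_args, named_args):
    """Match caller arguments to parameter names; return the lexical bindings."""
    lexpad = {}
    remaining = list(pos_args)
    named = dict(named_args)
    for p in params:
        if p.slurpy:
            # A slurpy parameter swallows every positional argument left over.
            value, remaining = remaining, []
        elif p.named and p.name in named:
            value = named.pop(p.name)
        elif not p.named and remaining:
            value = remaining.pop(0)
        elif p.default is not _REQUIRED:
            value = p.default
        else:
            raise TypeError(f"missing argument for parameter {p.name}")
        if p.constraint is not None and not isinstance(value, p.constraint):
            raise TypeError(f"type check failed for parameter {p.name}")
        lexpad[p.name] = value
    if remaining or named:
        raise TypeError("too many arguments")
    return lexpad


# Roughly the shape of: sub f(Int $x, $y = 10, *@rest, :$sep = ', ')
signature = [
    Param("x", constraint=int),
    Param("y", default=10),
    Param("rest", slurpy=True),
    Param("sep", named=True, default=", "),
]
full = bind_signature(signature, [1, 2, 3, 4], {"sep": "-"})
short = bind_signature(signature, [1], {})
print(full, short)
```

Even this reduced version shows why a single pass over the signature is convenient: positional, named, default and slurpy handling all end in the same step of storing a value under the parameter's lexical name.
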
F<Perl6/Compiler.pir> has three C<.loadlib> commands early on. The
C<perl6_group> library loads the 9 PMCs, C<perl6_ops> provides the 14
dynops, and C<math_ops> adds over 30 mathematical operators such as
C<add>, C<sub>, C<mul>, C<div>, C<sin>, C<cos>, C<sqrt>, C<log10> etc.
(source in F<parrot/src/ops/math.ops>).

=head2 6. Builtin functions and runtime support

The last component of the compiler consists of the various builtin
functions and libraries that a Perl 6 program expects to have available
when it is running. These include functions for the basic operations
(C<< infix:<+> >>, C<< prefix:<abs> >>) as well as common global
functions such as C<say> and C<print>.

The stage-1 compiler compiles these all and they become part of the
final F<perl6.pbc>. The source code is in F<builtins/*.pir>,
F<cheats/*>, F<core/*.pm>, F<glue/*.pir> and F<metamodel/*>.

=head2 Still to be documented