<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<link HREF="mailto:drh@microsoft.com" REV="made" TITLE="David R. Hanson">
<title>The lcc 4.1 Code-Generation Interface</title>
<h1>The lcc 4.1 Code-Generation Interface</h1>
<p ALIGN="LEFT"><strong><a HREF="http://www.research.microsoft.com/~cwfraser/">Christopher
W. Fraser</a> and <a HREF="http://www.research.microsoft.com/~drh/">David R. Hanson</a>, <a
HREF="http://www.research.microsoft.com/">Microsoft Research</a></strong></p>
<li><a HREF="#intro">Introduction</a></li>
<li><a HREF="#metrics">5.1 Type Metrics</a></li>
<li><a HREF="#symbols">5.3 Symbols</a></li>
<li><a HREF="#operators">5.5 Dag Operators</a></li>
<li><a HREF="#flags">5.6 Interface Flags</a></li>
<li><a HREF="#definitions">5.8 Definitions</a></li>
<li><a HREF="#constants">5.9 Constants</a></li>
<li><a HREF="#upcalls">5.12 Upcalls</a></li>
<h2><a NAME="intro">Introduction</a></h2>
<p>Version 4.1 is the latest release of <a
HREF="http://www.cs.princeton.edu/software/lcc/">lcc</a>, the ANSI C compiler described in
our book <cite>A Retargetable C Compiler: Design and Implementation</cite>
(Addison-Wesley, 1995, ISBN 0-8053-1670-1). This document summarizes the differences
between the 4.1 code-generation interface and the 3.x interface described in Chap. 5 of <cite>A
Retargetable C Compiler</cite>.</p>
<p>Previous versions of lcc supported only three sizes of integers and two sizes of floats,
and insisted that pointers fit in unsigned integers (see Sec. 5.1 of <cite>A Retargetable
C Compiler</cite>). These assumptions simplified the compiler and were suitable for
32-bit architectures. But on 64-bit architectures, such as the DEC ALPHA, it's natural to
have four sizes of integers and perhaps three sizes of floats, and on 16-bit
architectures, 32-bit pointers don't fit in unsigned integers. Also, the 3.x constraints
limited the use of lcc's back ends for other languages, such as Java.</p>
<p>Version 4.x removes all of these restrictions: it supports any number of sizes for
integers and floats, and the size of pointers need not be related to the size of any of
the integer types. The major changes in the code-generation interface are:
<li>The number of type suffixes has been reduced to 6.</li>
<li>Dag operators are composed of a generic operator, a type suffix, and a size.</li>
<li>Unsigned variants of several operators have been added.</li>
<li>Several interface functions have new signatures.</li>
<p>In addition, version 4.x is written in ANSI C and uses the standard I/O library and
other standard C functions.</p>
<p>The sections below parallel the subsections of Chap. 5 of <cite>A Retargetable C
Compiler</cite> and summarize the differences between the 3.x and 4.x code-generation
interfaces. Unaffected subsections are omitted. Page citations refer to pages in <cite>A
Retargetable C Compiler</cite>.</p>
<h2><a NAME="metrics">5.1 Type Metrics</a></h2>
<p>There are now 10 metrics in an interface record:</p>
<pre>Metrics charmetric;
Metrics longlongmetric;
Metrics longdoublemetric;
Metrics structmetric;</pre>
<p>Each of these specifies the size and alignment of the corresponding type. <code>ptrmetric</code>
describes all pointers.</p>
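<p>As a rough illustration of how a size-and-alignment pair might be used, the sketch
below rounds an offset up to a type's alignment. <code>ExampleMetrics</code>, the
<code>example_</code> names, and the numeric values are assumptions made for this
sketch (an imagined 32-bit target); they are not lcc's actual declarations.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a metrics record: only the size and
   alignment that the text above describes. */
typedef struct example_metrics {
    unsigned char size;   /* size of the type in bytes */
    unsigned char align;  /* required alignment in bytes */
} ExampleMetrics;

/* Assumed values for an illustrative 32-bit target. */
static const ExampleMetrics example_charmetric     = { 1, 1 };
static const ExampleMetrics example_longlongmetric = { 8, 8 };
static const ExampleMetrics example_ptrmetric      = { 4, 4 };

/* Round offset up to the next multiple of m.align. */
static size_t example_roundup(size_t offset, ExampleMetrics m)
{
    assert(m.align > 0);
    return (offset + m.align - 1) / m.align * m.align;
}
```

<p>With these assumed values, a pointer placed after a 5-byte prefix would start at
offset 8, while a character could start at offset 5.</p>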
<h2><a NAME="symbols">5.3 Symbols</a></h2>
<p>The actual value of a constant is stored in the <code>u.c.v</code> field of a symbol,
which holds a <code>Value</code>:</p>
<pre>typedef union value {
<p>The value is stored in the appropriate field according to its type, which is given by
the symbol's <code>type</code> field.</p>
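<p>The rule that the type selects the field can be sketched as follows. The field names
<code>i</code>, <code>u</code>, <code>d</code>, and <code>p</code> follow the table in
Sec. 5.9; the field types, the <code>EX_</code> suffix codes, and the
<code>example_format</code> helper are assumptions made for this sketch only.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Type-suffix codes; the numeric values are assumptions for this sketch. */
enum { EX_F = 1, EX_I = 5, EX_U = 6, EX_P = 7 };

/* Hypothetical stand-in for Value; the field types are assumed. */
typedef union example_value {
    long i;
    unsigned long u;
    long double d;
    void *p;
} ExampleValue;

/* Format the constant held in v, choosing the field from the suffix and
   narrowing 1-byte integers. Returns the result of snprintf, or -1 for
   an unknown suffix. */
static int example_format(char *buf, size_t n, int suffix, int size, ExampleValue v)
{
    assert(buf != NULL && n > 0);
    switch (suffix) {
    case EX_I:
        if (size == 1)
            return snprintf(buf, n, "%d", (int)(signed char)v.i);
        return snprintf(buf, n, "%ld", v.i);
    case EX_U:
        if (size == 1)
            return snprintf(buf, n, "%u", (unsigned)(unsigned char)v.u);
        return snprintf(buf, n, "%lu", v.u);
    case EX_F:
        return snprintf(buf, n, "%Lg", v.d);
    case EX_P:
        return snprintf(buf, n, "%p", v.p);
    default:
        return -1;
    }
}
```

<p>Reading any field other than the one selected by the type would reinterpret the
stored bits, which is why the sketch dispatches on the suffix before touching
<code>v</code>.</p>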
<h2><a NAME="operators">5.5 Dag Operators</a></h2>
<p>The <code>op</code> field of a <code>node</code> structure holds a dag operator, which
consists of a generic operator, a type suffix, and a size indicator. The type suffixes
#define sizeop(n) ((n)<<10)</pre>
<p>Given a generic operator <code>o</code>, a type suffix <code>t</code>, and a size <code>s</code>,
a type- and size-specific operator is formed by <code>o+t+sizeop(s)</code>. For example, <code>ADD+F+sizeop(4)</code>
forms the operator <code>ADDF4</code>, which denotes the sum of two 4-byte floats.
Similarly, <code>ADD+F+sizeop(8)</code> forms <code>ADDF8</code>, which denotes 8-byte
floating addition. In the 3.x code-generation interface, <code>ADDF</code> and <code>ADDD</code>
denoted these operations. There was no size indicator in the 3.x operators because the
type suffix supplied both a type and a size.</p>
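<p>A minimal sketch of this composition follows. It uses <code>MUL</code> and <code>CVI</code>
rather than <code>ADD</code> because their numeric values can be inferred from the <code>ops</code>
output quoted later in this document (for example <code>MULF4=4561</code> and <code>CVII1=1157</code>).
The <code>EX_</code> constants and <code>example_makeop</code> are names invented for this
sketch, and the inferred values should be treated as assumptions.</p>

```c
#include <assert.h>

#define sizeop(n) ((n)<<10)

/* Numeric codes inferred from the ops output quoted in this document;
   treat them as assumptions, not as definitions. */
enum { EX_F = 1, EX_I = 5, EX_U = 6 };   /* type suffixes */
enum { EX_CVI = 128, EX_MUL = 464 };     /* generic operators */

/* Form a type- and size-specific operator as o+t+sizeop(s). */
static int example_makeop(int o, int t, int s)
{
    assert(s > 0);
    return o + t + sizeop(s);
}
```

<p>Under these assumptions, <code>example_makeop(EX_MUL, EX_F, 4)</code> yields 4561, which
matches the <code>MULF4</code> value printed by <code>ops</code>.</p>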
<p>Table 5.1 lists each generic operator, its valid type suffixes, and the number of <code>kids</code>
and <code>syms</code> that it uses; multiple values for <code>kids</code> indicate
type-specific variants. The notations in the <strong>syms</strong> column give the number
of <code>syms</code> values and a one-letter code that suggests their uses: 1V indicates
that <code>syms[0]</code> points to a symbol for a variable, 1C indicates that <code>syms[0]</code>
is a constant, and 1L indicates that <code>syms[0]</code> is a label. For 1S, <code>syms[0]</code>
is a constant whose value is a size in bytes; 2S adds <code>syms[1]</code>, which is a
constant whose value is an alignment. For most operators, the type suffix and size
indicator denote the type and size of operation to perform and the type and size of the
<table WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<td COLSPAN="6" ALIGN="CENTER"><strong>Table 5.1<img SRC="/~drh/resources/dot_clear.gif"
ALT="|" WIDTH="18" HEIGHT="1">Node Operators.</strong></td>
<td><strong>syms</strong></td>
<td><strong>kids</strong></td>
<td><strong>Operator</strong></td>
<td><strong>Type Suffixes</strong></td>
<td><strong>Sizes</strong></td>
<td><strong>Operation</strong></td>
<td><code>ADDRF</code></td>
<td><code>...P..</code></td>
<td>address of a parameter</td>
<td><code>ADDRG</code></td>
<td><code>...P..</code></td>
<td>address of a global</td>
<td><code>ADDRL</code></td>
<td><code>...P..</code></td>
<td>address of a local</td>
<td><code>CNST</code></td>
<td><code>FIUP..</code></td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="1" HEIGHT="12"></td>
<td><code>BCOM</code></td>
<td><code>.IU...</code></td>
<td>bitwise complement</td>
<td><code>CVF</code></td>
<td><code>FI....</code></td>
<td>convert from float</td>
<td><code>CVI</code></td>
<td><code>FIU...</code></td>
<td>fdx csilh csilhp</td>
<td>convert from signed integer</td>
<td><code>CVP</code></td>
<td><code>..U...</code></td>
<td>convert from pointer</td>
<td><code>CVU</code></td>
<td><code>.IUP..</code></td>
<td>convert from unsigned integer</td>
<td><code>INDIR</code></td>
<td><code>FIUP.B</code></td>
<td><code>NEG</code></td>
<td><code>FI....</code></td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="1" HEIGHT="12"></td>
<td><code>ADD</code></td>
<td><code>FIUP..</code></td>
<td>fdx ilh ilhp p</td>
<td><code>BAND</code></td>
<td><code>.IU...</code></td>
<td><code>BOR</code></td>
<td><code>.IU...</code></td>
<td>bitwise inclusive OR</td>
<td><code>BXOR</code></td>
<td><code>.IU...</code></td>
<td>bitwise exclusive OR</td>
<td><code>DIV</code></td>
<td><code>FIU...</code></td>
<td><code>LSH</code></td>
<td><code>.IU...</code></td>
<td><code>MOD</code></td>
<td><code>.IU...</code></td>
<td><code>MUL</code></td>
<td><code>FIU...</code></td>
<td>multiplication</td>
<td><code>RSH</code></td>
<td><code>.IU...</code></td>
<td><code>SUB</code></td>
<td><code>FIUP..</code></td>
<td>fdx ilh ilhp p</td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="1" HEIGHT="12"></td>
<td><code>ASGN</code></td>
<td><code>FIUP.B</code></td>
<td><code>EQ</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if equal</td>
<td><code>GE</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if greater than or equal</td>
<td><code>GT</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if greater than</td>
<td><code>LE</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if less than or equal</td>
<td><code>LT</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if less than</td>
<td><code>NE</code></td>
<td><code>FIU...</code></td>
<td>fdx ilh ilhp</td>
<td>jump if not equal</td>
<td><code>ARG</code></td>
<td><code>FIUP.B</code></td>
<td><code>CALL</code></td>
<td><code>FIUPVB</code></td>
<td>function call</td>
<td><code>RET</code></td>
<td><code>FIUPV.</code></td>
<td>return from function</td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="1" HEIGHT="12"></td>
<td><code>JUMP</code></td>
<td><code>....V.</code></td>
<td>unconditional jump</td>
<td><code>LABEL</code></td>
<td><code>....V.</code></td>
<td>label definition</td>
<p>The entries in the <strong>Sizes</strong> column indicate sizes of the operators that
back ends must implement. Letters denote the size of float (f), double (d), long double
(x), character (c), short integer (s), integer (i), long integer (l), "long
long" integer (h), and pointer (p). These sizes are separated into sets for each
type suffix, except that a single set is used for both I and U when the set for I is
identical to the set for U.</p>
<p>The actual values for the size indicators, fdxcsilhp, depend on the target. A
specification like <code>ADDF</code>f denotes the operator <code>ADD+F+sizeop(</code>f<code>)</code>,
where "f" is replaced by a target-dependent value, e.g., <code>ADDF4</code> and <code>ADDF8</code>.
For example, back ends must implement the following <code>CVI</code> and <code>MUL</code>
<p><code>CVIF</code>f <code>CVIF</code>d <code>CVIF</code>x<br>
<code>CVII</code>c <code>CVII</code>s <code>CVII</code>i <code>CVII</code>l <code>CVII</code>h<br>
<code>CVIU</code>c <code>CVIU</code>s <code>CVIU</code>i <code>CVIU</code>l <code>CVIU</code>h
<code>CVIU</code>p<br>
<code>MULF</code>f <code>MULF</code>d <code>MULF</code>x<br>
<code>MULI</code>i <code>MULI</code>l <code>MULI</code>h<br>
<code>MULU</code>i <code>MULU</code>l <code>MULU</code>h</p>
<p>On most platforms, there are fewer than three sizes of floats and six sizes of
integers, and pointers are usually the same size as one of the integers. Also, lcc doesn't
support the "long long" type, so h is not currently used. The set of
platform-specific operators is therefore usually smaller than the list above suggests. For example,
the X86, SPARC, and MIPS back ends implement the following <code>CVI</code> and <code>MUL</code>
<p><code>CVIF</code>4 <code>CVIF</code>8<br>
<code>CVII</code>1 <code>CVII</code>2 <code>CVII</code>4<br>
<code>CVIU</code>1 <code>CVIU</code>2 <code>CVIU</code>4<br>
<code>MULF</code>4 <code>MULF</code>8<br>
<code>MULI</code>4<br>
<code>MULU</code>4</p>
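<p>One way to see why the platform-specific list shrinks is to collapse the size letters
into their distinct byte sizes. The sketch below does only that; <code>example_distinct</code>
is a hypothetical helper written for this illustration and is not part of lcc.</p>

```c
#include <assert.h>

/* Copy the distinct values of sizes[0..n-1] into out (which must hold
   at least n ints), preserving first-seen order; return how many. */
static int example_distinct(const int *sizes, int n, int *out)
{
    int i, j, count = 0;
    assert(sizes != 0 && out != 0 && n >= 0);
    for (i = 0; i < n; i++) {
        int seen = 0;
        for (j = 0; j < count; j++)
            if (out[j] == sizes[i])
                seen = 1;
        if (!seen)
            out[count++] = sizes[i];
    }
    return count;
}
```

<p>For sizes c=1, s=2, i=4, l=4, and h=4, the five letters collapse to the three sizes 1,
2, and 4, which corresponds to the three <code>CVII</code> variants listed above.</p>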
<p>The set of operators is thus target-dependent; for example, <code>ADDI8</code> appears
only if the target supports an 8-byte integer type. <a
HREF="ftp://ftp.cs.princeton.edu/pub/packages/lcc/contrib/ops.c"><code>ops.c</code></a> is
a program that, given a set of sizes, prints the required operators and their values,
<pre>% <em>ops c=1 s=2 i=4 l=4 h=4 f=4 d=8 x=8 p=4</em>
CVIF4=4225 CVIF8=8321
CVII1=1157 CVII2=2181 CVII4=4229
CVIU1=1158 CVIU2=2182 CVIU4=4230
MULF4=4561 MULF8=8657
<p>The type suffix for a conversion operator denotes the type of the result and the size
indicator gives the size of the result. For example, <code>CVUI4</code> converts an
unsigned (<code>U</code>) to a 4-byte signed integer (<code>I4</code>). The <code>syms[0]</code>
field points to a symbol-table entry for an integer constant that gives the size of the
source operand. For example, if <code>syms[0]</code> in a <code>CVUI4</code> points to a
symbol-table entry for 2, the conversion widens a 2-byte unsigned integer to a 4-byte
signed integer. Conversions that widen unsigned integers zero-extend; those that widen
signed integers sign-extend.</p>
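<p>The widening rule can be sketched in portable C as follows. <code>example_widen</code>
and its parameters are hypothetical; the <code>srcsize</code> argument merely mirrors the
role of the constant that <code>syms[0]</code> points to, and the sketch handles only the
2-byte-to-4-byte case discussed above.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Widen the 2-byte bit pattern in bits to a 4-byte signed integer.
   from_unsigned selects zero extension (as for CVUI4) or sign
   extension (as for CVII4). */
static int32_t example_widen(int from_unsigned, int srcsize, uint16_t bits)
{
    assert(srcsize == 2);  /* this sketch handles only 2-byte sources */
    if (from_unsigned)
        return (int32_t)bits;                  /* zero-extend */
    if (bits & 0x8000)
        return (int32_t)bits - 0x10000;        /* sign-extend a negative value */
    return (int32_t)bits;                      /* sign-extend a nonnegative value */
}
```

<p>The same bit pattern, 0xFFFF, therefore widens to 65535 when the source is unsigned and
to -1 when it is signed.</p>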
<p>The front end composes conversions between types <em>T</em><sub>1</sub> and <em>T</em><sub>2</sub>
by widening <em>T</em><sub>1</sub> to its "supertype", if necessary, converting
that result to <em>T</em><sub>2</sub>'s supertype, then narrowing the result to <em>T</em><sub>2</sub>,
if necessary. The following table lists the supertypes; omitted entries are their own
<table BORDER="0" CELLPADDING="0" CELLSPACING="0">
<td><strong>Type</strong></td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="24" HEIGHT="1"></td>
<td><strong>Supertype</strong></td>
<td>signed short</td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td>unsigned char</td>
<td>int, if sizeof (char) < sizeof (int)<br>
unsigned, otherwise</td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td>unsigned short</td>
<td>int, if sizeof (short) < sizeof (int)<br>
unsigned, otherwise</td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td>an unsigned type as large as a pointer</td>
<p>Pointers are converted to an unsigned type of the same size, even when that type is not
one of the integer types.</p>
<p>For example, the front end converts a signed short to a float by first converting it to
an int and then to a float. It converts an unsigned short to an int with a single <code>CVUI</code>i
conversion, when shorts are smaller than ints.</p>
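<p>The two examples above can be written out in C to make the intermediate step visible.
This is only a source-level analogy with hypothetical function names; it does not show
the dag nodes that the front end actually builds.</p>

```c
#include <assert.h>

/* signed short to float: widen to the supertype int, then convert. */
static float example_short_to_float(short s)
{
    int widened = s;          /* widening step; sign-extends */
    return (float)widened;    /* int to float */
}

/* unsigned short to int: a single zero-extending conversion,
   valid when shorts are smaller than ints. */
static int example_ushort_to_int(unsigned short u)
{
    assert(sizeof(unsigned short) < sizeof(int));
    return (int)u;
}
```

<p>The assertion in the second function records the precondition stated in the text: the
single-step conversion applies only when shorts are smaller than ints.</p>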
<p>There are now signed and unsigned variants of <code>ASGN</code>, <code>INDIR</code>, <code>BCOM</code>,
<code>BOR</code>, <code>BXOR</code>, <code>BAND</code>, <code>ARG</code>, <code>CALL</code>,
and <code>RET</code> to simplify code generation on platforms that use different
instructions or register sets for signed and unsigned operations. Likewise, there are now
pointer variants of <code>ASGN</code>, <code>INDIR</code>, <code>ARG</code>, <code>CALL</code>,
and <code>RET</code>.</p>
<h2><a NAME="flags">5.6 Interface Flags</a></h2>
<pre>unsigned unsigned_char:1;</pre>
<p>tells the front end whether plain characters are signed or unsigned. If it's zero, char
is a signed type; otherwise, char is an unsigned type.</p>
<p>All the interface flags can be set by command-line options, e.g., <code>-Wf-unsigned_char=1</code>
causes plain characters to be unsigned.</p>
<h2><a NAME="definitions">5.8 Definitions</a></h2>
<p>The front end announces local variables by calling</p>
<pre>void (*local)(Symbol);</pre>
<p>It announces temporaries likewise; these have the symbol's <code>temporary</code> flag
set, which indicates that the symbol will be used only in the next call to <code>gen</code>.
If a temporary's <code>u.t.cse</code> field is nonnull, it points to the node that
computes the value assigned to the temporary; see page 346.</p>
<p>The front end calls</p>
<pre>void (*address)(Symbol p, Symbol q, long n);</pre>
<p>to initialize <code>q</code> to a symbol that represents an address of the form <em>x</em>+<code>n</code>,
where <em>x</em> is the address represented by <code>p</code> and the long integer <code>n</code>
is positive or negative.</p>
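<p>As a rough sketch of what a back end might do in response, the code below builds an
assembler-style name of the form <em>x</em>+<code>n</code>. <code>ExampleSymbol</code>
and <code>example_address</code> are hypothetical stand-ins invented for this
illustration; a real back end works with lcc's own <code>Symbol</code> type and its own
naming conventions.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical minimal symbol: just a back-end name for the address. */
typedef struct example_symbol {
    char name[64];
} ExampleSymbol;

/* Initialize q to denote the address p+n, where n may be positive
   or negative. */
static void example_address(const ExampleSymbol *p, ExampleSymbol *q, long n)
{
    assert(p != NULL && q != NULL);
    /* %+ld always prints the sign, giving names such as "_x+8" or "_x-4". */
    snprintf(q->name, sizeof q->name, "%s%+ld", p->name, n);
}
```

<p>Because <code>n</code> is a <code>long</code>, the offset is formatted with a
<code>long</code> conversion rather than an <code>int</code> one.</p>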
<h2><a NAME="constants">5.9 Constants</a></h2>
<p>The interface function</p>
<pre>void (*defconst)(int suffix, int size, Value v);</pre>
<p>initializes constants. <code>defconst</code> emits directives to define a cell and initialize it to
a constant value. <code>v</code> is the constant value, <code>suffix</code> identifies the type of the value, and
<code>size</code> is the size of the value in bytes. The value of <code>suffix</code> indicates which field of <code>v</code>
holds the value, as shown in the following table.</p>
<table BORDER="0" CELLPADDING="1" CELLSPACING="1">
<td><strong>suffix</strong></td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="24" HEIGHT="1"></td>
<td><strong>v Field</strong></td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="24" HEIGHT="1"></td>
<td><strong>size</strong></td>
<td><code>F</code></td>
<td><code>v.d</code></td>
<td>float, double, long double</td>
<td><code>I</code></td>
<td><code>v.i</code></td>
<td>signed char, signed short, signed int, signed long</td>
<td><code>U</code></td>
<td><code>v.u</code></td>
<td>unsigned char, unsigned short, unsigned int, unsigned long</td>
<td><code>P</code></td>
<td><code>v.p</code></td>
<p><code>defconst</code> must narrow <code>v.</code>x when <code>size</code> is less than <code>sizeof</code>
<code>v.</code>x; e.g., to emit an unsigned char, <code>defconst</code> should emit <code>(unsigned
<h2><a NAME="upcalls">5.12 Upcalls</a></h2>
<p>lcc 4.x uses standard I/O and its I/O functions have been changed accordingly. lcc
reads input from the standard input, emits code to the standard output, and writes
diagnostics to the standard error output. It uses <code>freopen</code> to redirect these
streams to explicit files, when necessary.</p>
<p><code>bp</code>, <code>outflush</code>, and <code>outs</code> have been eliminated.</p>
<pre>extern void fprint(FILE *f, const char *fmt, ...);
extern void print(const char *fmt, ...);</pre>
<p>print formatted data to file <code>f</code> (<code>fprint</code>) or the standard
output (<code>print</code>). These functions are like standard C's <code>printf</code> and
<code>fprintf</code>, but support only some of the standard conversion specifiers and do
not support flags, precision, and field-width specifications. They support the following
new conversion specifiers in addition to those described on page 99.</p>
<table BORDER="0" CELLPADDING="0" CELLSPACING="0">
<td><strong>Specifiers</strong></td>
<td><img SRC="/~drh/resources/dot_clear.gif" ALT="|" WIDTH="24" HEIGHT="1"></td>
<td><strong>Corresponding printf Specifiers</strong></td>
<td><code>%c</code></td>
<td><code>%c</code></td>
<td><code>%d %D</code></td>
<td><code>%d %ld</code></td>
<td><code>%u %U</code></td>
<td><code>%u %lu</code></td>
<td><code>%x %X</code></td>
<td><code>%x %lx</code></td>
<td><code>%f %e %g</code></td>
<td><code>%e %f %g</code></td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td><code>%p</code></td>
<td>Converts the corresponding void * argument to unsigned long and prints it with the <code>printf</code>
<code>%#x</code> specifier or just <code>%x</code> when the argument is null.</td>
<tr ALIGN="LEFT" VALIGN="TOP">
<td><code>%I</code></td>
<td>Prints the number of spaces given by the corresponding argument.</td>
<pre>#define generic(op) ((op)&0x3F0)
#define specific(op) ((op)&0x3FF)</pre>
<p><code>generic(op)</code> returns the generic variant of <code>op</code>; that is,
without its type suffix and size indicator. <code>specific(op)</code> returns the
type-specific variant of <code>op</code>; that is, without its size indicator.</p>
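<p>The masks can be checked against the <code>ops</code> output quoted in Sec. 5.5, where <code>MULF4</code>
is 4561. The sketch below adds two illustrative helpers, <code>example_suffix</code> and <code>example_size</code>,
which are assumptions made here to extract the remaining fields; they are derived from
the layout implied by <code>sizeop</code> and the two macros above.</p>

```c
#include <assert.h>

#define generic(op)  ((op)&0x3F0)
#define specific(op) ((op)&0x3FF)

/* Illustrative helpers for this sketch only. */
#define example_suffix(op) ((op)&0xF)   /* type suffix code: low 4 bits */
#define example_size(op)   ((op)>>10)   /* size indicator: bits 10 and up */

/* Rebuild an operator from its three fields and confirm that the
   fields partition the operator's bits. */
static int example_rebuild(int op)
{
    int rebuilt = generic(op) + example_suffix(op) + (example_size(op) << 10);
    assert(op >= 0 && rebuilt == op);
    return rebuilt;
}
```

<p>For 4561, <code>generic</code> gives 464, <code>specific</code> gives 465, and the two
helpers give a suffix code of 1 and a size of 4.</p>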
<p><code>newconst</code> has been replaced by</p>
<pre>extern Symbol intconst(int n);</pre>
<p>which installs the integer constant <code>n</code> in the symbol table, if necessary,
and returns a pointer to the symbol-table entry.</p>
<a HREF="http://www.research.microsoft.com/~cwfraser/">Chris Fraser</a> / <a
HREF="mailto:cwfraser@microsoft.com">cwfraser@microsoft.com</a><br>
<a HREF="http://www.research.microsoft.com/~drh/">David Hanson</a> / <a
HREF="mailto:drh@microsoft.com">drh@microsoft.com</a><br>
$Revision: 145 $ $Date: 2001-10-17 16:53:10 -0500 (Wed, 17 Oct 2001) $