292
292
but for no longer, and so some primitive functions can be optimized to
293
293
avoid a copy in this case.
295
The @code{gp} bits are by definition `general purpose'. As of version
296
2.4.0 of R, bit 4 (i.e., the fifth bit) is turned on to mark S4 objects.
297
Bits 0-3 and bits 14-15 have been used previously as described below
298
(from detective work on the sources).
295
The @code{gp} bits are by definition `general purpose'. We label these
296
from 0 to 15. As of version 2.4.0 of R, bit 4 is turned on to mark S4
297
objects. Bits 0-3 and bits 14-15 have been used previously as described
298
below (from detective work on the sources).
307
307
@code{SEXPTYPE}s other than @code{NILSXP}, @code{SYMSXP} and
310
If we label the bits from 0, bits 14 and 15 of @code{gp} are used for
311
`fancy bindings'. Bit 14 is used to lock a binding or an environment,
312
and bit 15 is used to indicate an active binding. (For the definition
313
of an `active binding' see the header comments in file
314
@file{src/main/envir.c}.) Bit 15 is used for an environment to indicate
315
if it participates in the global cache.
310
Bits 14 and 15 of @code{gp} are used for `fancy bindings'. Bit 14 is
311
used to lock a binding or an environment, and bit 15 is used to indicate
312
an active binding. (For the definition of an `active binding' see the
313
header comments in file @file{src/main/envir.c}.) Bit 15 is used for an
314
environment to indicate if it participates in the global cache.
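As a rough illustration of the bit numbering just described, the following C sketch treats @code{gp} as a 16-bit field and sets, clears and tests individual bits. The type @code{gp_bits}, the helper functions and the mask names are assumptions made for this example and should be read as illustrative rather than as the definitions used in the @R{} sources; only the bit positions (4, 14 and 15) come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit gp field of an object header. */
typedef uint16_t gp_bits;

/* Illustrative mask names; bits are labelled from 0 to 15 as in the text. */
#define S4_OBJECT_MASK      ((gp_bits)(1u << 4))   /* marks an S4 object */
#define BINDING_LOCK_MASK   ((gp_bits)(1u << 14))  /* locked binding or environment */
#define ACTIVE_BINDING_MASK ((gp_bits)(1u << 15))  /* active binding */

/* Return gp with the bits in mask turned on. */
static inline gp_bits gp_set(gp_bits gp, gp_bits mask)
{
    return (gp_bits)(gp | mask);
}

/* Return gp with the bits in mask turned off. */
static inline gp_bits gp_clear(gp_bits gp, gp_bits mask)
{
    return (gp_bits)(gp & (gp_bits)~mask);
}

/* Report whether any bit in mask is set in gp. */
static inline bool gp_test(gp_bits gp, gp_bits mask)
{
    return (gp & mask) != 0;
}
```

Because bit 15 means different things for a binding and for an environment, a test of that bit is only meaningful once the kind of object is known.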
317
316
Almost all other uses seem to be only of bits 0 and 1, although one
318
317
reserves the first four bits.
357
356
As from @R{} 2.5.0, bits 2 and 3 for a @code{CHARSXP} are used to note
358
357
that it is known to be in Latin-1 and UTF-8 respectively. (These are not
359
358
usually set if it is also known to be in ASCII, since code does not need
360
to know the charset to handle ASCII strings.)
359
to know the charset to handle ASCII strings. From @R{} 2.8.0
360
it is guaranteed that they will not be set for @code{CHARSXP}s created by @R{}
361
itself.) As from @R{} 2.8.0 bit 5 is used to indicate that a @code{CHARSXP}
362
is hashed by its address, that is @samp{NA_STRING} or in the @code{CHARSXP} cache.
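The use of bits 2, 3 and 5 for a @code{CHARSXP} could be decoded along the following lines. This is a minimal sketch: the enumeration, the function names and the mask names are hypothetical and chosen for this example, and the @code{gp} field is again modelled as a plain 16-bit integer.

```c
#include <stdint.h>

/* Illustrative masks for the CHARSXP uses of the gp bits. */
#define LATIN1_MASK (1u << 2)  /* known to be in Latin-1 */
#define UTF8_MASK   (1u << 3)  /* known to be in UTF-8 */
#define CACHED_MASK (1u << 5)  /* hashed by address (R >= 2.8.0) */

/* Hypothetical summary of what the encoding bits declare. */
typedef enum { ENC_UNMARKED, ENC_LATIN1, ENC_UTF8 } known_encoding;

/* Decode the declared encoding from a gp value. */
static known_encoding charsxp_encoding(uint16_t gp)
{
    if (gp & UTF8_MASK)
        return ENC_UTF8;
    if (gp & LATIN1_MASK)
        return ENC_LATIN1;
    return ENC_UNMARKED;  /* no declaration, as for ASCII strings */
}

/* Non-zero if the string is marked as hashed by its address. */
static int charsxp_is_cached(uint16_t gp)
{
    return (gp & CACHED_MASK) != 0;
}
```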
362
365
@c Finally, @code{SETLEVELS} and @code{LEVELS} are used by that name for
363
366
@c the internal code for @code{terms.formula} to compute the @code{order}
1192
1195
encoding of the file it was reading). This led to packages with data
1193
1196
in French encoded in Latin-1 in @code{.rda} files which could not be
1194
1197
read in other locales (and they would be able to be displayed in a
1195
French UTF-8 locale, if not in most Japanese locales).
1198
French UTF-8 locale, if not in non-UTF-8 Japanese locales).
1197
1200
@R{} 2.5.0 introduced a limited means to indicate the encoding of a
1198
1201
@code{CHARSXP} via two of the `general purpose' bits which are used to
1204
1207
Many (but not all) of the character manipulation functions will either
1205
1208
preserve the declaration or re-encode the character string.
1207
Eventually strings that refer to the OS such as file names will need to
1208
be passed through a wide-character interface on some OSes
1209
(e.g. Windows), which is to a large extent done as from @R{} 2.7.0.
1210
Strings that refer to the OS such as file names need to be passed
1211
through a wide-character interface on some OSes (e.g. Windows), which is
1212
to a large extent done as from @R{} 2.7.0.
1211
1214
When are character strings declared to be of known encoding? One way is
1212
1215
to do so directly via @code{Encoding}. The parser declares the encoding
1216
1219
@code{Encoding}.)
1218
1221
It is not necessary to declare the encoding of ASCII strings as they
1219
will work in any locale, but the overhead in doing so is small since
1220
they will never be passed to @command{iconv} for translation.
1222
will work in any locale. As from @R{} 2.8.0, ASCII strings should never
1223
have a marked encoding, as any encoding will be ignored when entering
1224
such strings into the @code{CHARSXP} cache.
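The rule that ASCII strings carry no marked encoding can be sketched as a check made before a string is entered into a cache. The code below is an illustration under stated assumptions, not the code used by @R{}: the type @code{enc_mark} and both function names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding marks a caller might request. */
typedef enum { MARK_NONE, MARK_LATIN1, MARK_UTF8 } enc_mark;

/* True if every byte of s[0..len-1] is in the 7-bit ASCII range. */
static bool is_all_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)s[i] > 127)
            return false;
    return true;
}

/* The mark actually recorded: any requested mark is dropped for ASCII. */
static enc_mark effective_mark(const char *s, size_t len, enc_mark requested)
{
    return is_all_ascii(s, len) ? MARK_NONE : requested;
}
```

With this rule, the same ASCII bytes always map to a single cache entry whatever encoding the caller declared.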
1222
1226
The rationale behind considering only UTF-8 and Latin-1 is that most
1223
1227
systems are capable of producing UTF-8 strings and this is the nearest
1246
1250
UCS-2@footnote{or UTF-16 if support for surrogates is enabled in the OS,
1247
1251
which it is not normally at least for Western versions of Windows,
1248
1252
despite some claims to the contrary on the Microsoft site.} strings.
1249
@R{} (being written in standard C) will not work internally with UCS-2
1253
@R{} (being written in standard C) would not work internally with UCS-2
1250
1254
without extensive changes. As from @R{} 2.7.0 the @file{Rgui}
1251
1255
console@footnote{but not the GraphApp toolkit.} uses UCS-2 internally,
1252
1256
but communicates with the @R{} engine in the native encoding. To allow
1253
UTF-8 strings to be printed in UTF-8, an escape convention is used (see
1254
header @file{rgui_UTF8.h}) which is used by @code{cat}, @code{print} and
1257
UTF-8 strings to be printed in UTF-8 in @file{Rgui.exe}, an escape
1258
convention is used (see header @file{rgui_UTF8.h}) which is used by
1259
@code{cat}, @code{print} and autoprinting.
1257
1261
`Unicode' (UCS-2LE) files are common in the Windows world, and
1258
1262
@code{readLines} and @code{scan} will read them into UTF-8 strings on
1270
1274
@samp{NA_STRING} is not.
1272
1276
In @R{} 2.6.x and 2.7.x character strings created by @code{mkCharLen}
1273
are not part of the cache: these were intended to be those containing
1274
embedded nuls. As from @R{} 2.8.0 strings with embedded nuls will be
1277
were not part of the cache: these were intended to be those containing
1278
embedded nuls. As from @R{} 2.8.0 the cache can handle any content,
1279
although embedded nuls are now disallowed.
1277
1281
There are a few other ways in which @code{CHARSXP}s could or can escape
1278
1282
the cache. @code{CHARSXP}s reloaded from the @code{save} formats of
1279
1283
@R{} prior to 0.99.0 are not cached (since the code used is frozen and
1280
few examples still exist). Currently @code{CHARSXP}s are used to hold
1281
the finalizer function of a C finalizer (uncached). Finally, user code
1282
can create @code{CHARSXP}s via @code{allocString} (removed in @R 2.8.0)
1283
and @code{allocVector(CHARSXP ...)} (deprecated in @R 2.8.0).
1284
few examples still exist). Prior to @R{} 2.8.0, @code{CHARSXP}s were
1285
used to hold the finalizer function of a C finalizer (uncached) -- now
1286
@code{RAWSXP}s are used. Finally, user code could create uncached
1287
@code{CHARSXP}s via @code{allocString} (removed in @R 2.8.0) and
1288
@code{allocVector(CHARSXP ...)} (deprecated in @R 2.8.0, removed in @R{}
1292
The cache records the encoding of the string as well as the bytes: all
1293
requests to create a @code{CHARSXP} should be @emph{via} a call to
1294
@code{mkCharLenCE}. As from @R{} 2.8.0 any encoding given in a
1295
@code{mkCharLenCE} call will be ignored if the string's bytes are all
1285
1299
@node Warnings and errors, S4 objects, The CHARSXP cache, R Internal Structures
1286
1300
@section Warnings and errors
1446
1460
@subsection Mechanics of S4 dispatch
1448
1462
This subsection does not discuss how S4 methods are chosen: see
1449
@uref{http://developer.r-project.org/howMethodsWork.pdf}.
1463
@uref{http://@/developer.@/r-project.org/howMethodsWork.pdf}.
1451
1465
For all but primitive functions, setting a method on an existing
1452
1466
function that is not itself S4 generic creates a new object in the
2209
2223
implemented in package @pkg{grid}.
2211
2225
Some notes on the changes for 1.4.0 can be found at
2212
@uref{http://www.stat.auckland.ac.nz/~paul/R/basegraph.html} and
2213
@uref{http://www.stat.auckland.ac.nz/~paul/R/graphicsChanges.html}.
2226
@uref{http://www.stat.auckland.ac.nz/@/~paul/@/R/basegraph.html} and
2227
@uref{http://www.stat.auckland.ac.nz/@/~paul/R/@/graphicsChanges.html}.
2215
2229
At the lowest level is a graphics device, which manages a plotting
2216
2230
surface (a screen window or a representation to be written to a file).
2340
2354
@node Device structures, Device capabilities, Graphics devices, Graphics devices
2341
2355
@subsection Device structures
2343
There are currently three types used internally which are pointers to
2344
structures related to graphics devices.
2357
There are two types used internally which are pointers to structures
2358
related to graphics devices.
2346
The @code{NewDevDesc} type@footnote{`new' in @R 1.4.0 and scheduled to
2347
be renamed in @R 2.8.0.} is a structure defined in the header file
2360
The @code{DevDesc} type@footnote{@code{NewDevDesc} from @R 1.4.0,
2361
renamed in @R 2.8.0.} is a structure defined in the header file
2348
2362
@file{R_ext/GraphicsDevice.h} (which is included by
2349
2363
@file{R_ext/GraphicsEngine.h}). This describes the physical
2350
2364
characteristics of a device, the capabilities of the device driver and
2354
2368
is a pointer to this type.
2356
2370
The relationship of device units to physical dimensions is set by the
2357
element @code{ipr} of the @code{NewDevDesc} structure: a @samp{double}
2371
element @code{ipr} of the @code{DevDesc} structure: a @samp{double}
2358
2372
array of length 2.
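A small sketch may make the role of @code{ipr} concrete. It assumes that the two elements hold inches per device unit in the horizontal and vertical directions respectively; the structure @code{dev_units} and the helper functions are hypothetical and stand in for the much larger device structure.

```c
/* Hypothetical subset of a device description: only ipr mirrors the text. */
typedef struct {
    double ipr[2];  /* assumed: inches per device unit, [0] = x, [1] = y */
} dev_units;

/* Fill ipr from a resolution given in device units per inch. */
static void set_ipr_from_dpi(dev_units *d, double xdpi, double ydpi)
{
    d->ipr[0] = 1.0 / xdpi;
    d->ipr[1] = 1.0 / ydpi;
}

/* Convert a horizontal length in device units to inches. */
static double units_to_inches_x(const dev_units *d, double units)
{
    return units * d->ipr[0];
}

/* Convert a vertical length in device units to inches. */
static double units_to_inches_y(const dev_units *d, double units)
{
    return units * d->ipr[1];
}
```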
2380
2394
So this is essentially a device structure plus information about the
2381
2395
device maintained by the graphics engine and normally@footnote{It is
2382
2396
possible for the device to find the @code{GEDevDesc} which points to its
2383
@code{NewDevSec}, and this is done often enough that there is a
2397
@code{DevDesc}, and this is done often enough that there is a
2384
2398
convenience function @code{desc2GEDesc} to do so.} visible to the engine
2385
2399
and not to the device. Type @code{pGEDevDesc} is a pointer to this
2388
The third type is @code{pGEDev} which is an opaque pointer to a
2389
@code{GEDevDesc} structure. (In earlier code you will also find
2390
@code{DevDesc *}, which in post-1.4.0 versions of @R{} was,
2391
confusingly, also an opaque pointer to a @code{GEDevDesc} structure.)
2392
This will no longer exist in @R{} 2.8.0.
2394
2402
The graphics engine maintains an array of devices, as pointers to
2395
2403
@code{GEDevDesc} structures. The array is of size 64 but the first
2396
2404
element is always occupied by the @code{"null device"} and the final
2413
2421
BEGIN_SUSPEND_INTERRUPTS @{
2415
2423
/* Allocate and initialize the device driver data */
2416
if (!(dev = (pDevDesc) calloc(1, sizeof(NewDevDesc))))
2424
if (!(dev = (pDevDesc) calloc(1, sizeof(DevDesc))))
2417
2425
return 0; /* or error() */
2418
2426
/* set up device driver or free 'dev' and error() */
2419
2427
gdd = GEcreateDevDesc(dev);
2421
2429
@} END_SUSPEND_INTERRUPTS;
2424
The @code{NewDevDesc} structure contains a @code{void *} pointer
2432
The @code{DevDesc} structure contains a @code{void *} pointer
2425
2433
@samp{deviceSpecific} which is used to store data specific to the
2426
2434
device. Setting up the device driver includes initializing all the
2427
non-zero elements of the @code{NewDevDesc} structure.
2435
non-zero elements of the @code{DevDesc} structure.
2429
2437
Note that the device structure is zeroed when allocated: this provides
2430
2438
some protection against future expansion of the structure since the
2534
2542
The @emph{interpretation} of @samp{c} depends on the locale. Using
2535
2543
@code{c = 0} used to give an indication of the size of the font: it
2536
2544
often returned the measurements for character @code{"M"}---however it is
2537
not longer used as from @R{} 2.7.0. In a single-byte locale values
2545
no longer used as from @R{} 2.7.0. In a single-byte locale values
2538
2546
@code{32...255} indicate the corresponding character in the locale (if
2539
2547
present). For the symbol font (as used by @samp{graphics::par(font=5)},
2540
2548
@samp{grid::gpar(fontface=5)} and by `plotmath'), values @code{32...126,
2744
2752
@node X11(), windows(), Specific devices, Specific devices
2745
2753
@subsubsection X11()
2747
The @code{X11()} device dates back to the mid-1990s and was written
2748
then in @code{Xlib}, the most basic X11 toolkit. It has since
2749
optionally made use of a few features from other toolkits: @code{libXt} is
2750
used to read X11 resources, and @code{libXmu} is used in the handling of
2751
clipboard selections.
2755
The @code{X11(type="Xlib")} device dates back to the mid-1990s and was
2756
written then in @code{Xlib}, the most basic X11 toolkit. It has since
2757
optionally made use of a few features from other toolkits: @code{libXt}
2758
is used to read X11 resources, and @code{libXmu} is used in the handling
2759
of clipboard selections.
2753
2761
Using basic @code{Xlib} code makes drawing fast, but is limiting. There
2754
2762
is no support of translucent colours (that came in the @code{Xrender}
2791
2799
@subsubsection windows()
2793
2801
The @code{windows()} device is a family of devices: it supports plotting
2794
to Windows (enhanced) metafiles, @code{BMP}, @code{JPEG} and @code{PNG}
2795
files as well as to Windows printers.
2802
to Windows (enhanced) metafiles, @code{BMP}, @code{JPEG}, @code{PNG} and
2803
@code{TIFF} files as well as to Windows printers.
2797
2805
In most of these cases the primary plotting is to a bitmap: this is used
2798
2806
for the (default) buffering of the screen device, which also enables the
2799
current plot to be saved to BMP, JPEG or PNG (it is the internal bitmap
2800
which is copied to the file in the appropriate format).
2807
current plot to be saved to BMP, JPEG, PNG or TIFF (it is the internal
2808
bitmap which is copied to the file in the appropriate format).
2802
2810
The device units are pixels (logical ones on a metafile device).
2930
2938
@file{graphics.c}, which in turn call the graphics engine (whose
2931
2939
functions almost all have names starting with @code{GE}).
2933
Again for historical reasons, @file{Rgraphics.h} was a public header
2934
which will be taken private in @R{} 2.8.0.
2936
2941
A large part of the infrastructure of the base graphics subsystem is
2937
2942
the graphics parameters (as set/read by @code{par()}). These are stored
2938
2943
in a @code{GPar} structure declared in the private header