******************************
Extending Python with C or C++
******************************

It is quite easy to add new built-in modules to Python, if you know how to
program in C. Such :dfn:`extension modules` can do two things that can't be
done directly in Python: they can implement new built-in object types, and they
can call C library functions and system calls.

To support extensions, the Python API (Application Programmer's Interface)
defines a set of functions, macros and variables that provide access to most
aspects of the Python run-time system. The Python API is incorporated in a C
source file by including the header ``"Python.h"``.

The compilation of an extension module depends on its intended use as well as on
your system setup; details are given in later chapters.

The C extension interface is specific to CPython, and extension modules do
not work on other Python implementations. In many cases, it is possible to
avoid writing C extensions and preserve portability to other implementations.
For example, if your use case is calling C library functions or system calls,
you should consider using the :mod:`ctypes` module or the `cffi
<http://cffi.readthedocs.org>`_ library rather than writing custom C code.
These modules let you write Python code to interface with C code and are more
portable between implementations of Python than writing and compiling a C
extension module.

.. _extending-simpleexample:

Let's create an extension module called ``spam`` (the favorite food of Monty
Python fans...) and let's say we want to create a Python interface to the C
library function :c:func:`system`. [#]_ This function takes a null-terminated
character string as argument and returns an integer. We want this function to
be callable from Python as follows::

   >>> status = spam.system("ls -l")

Begin by creating a file :file:`spammodule.c`. (Historically, if a module is
called ``spam``, the C file containing its implementation is called
:file:`spammodule.c`; if the module name is very long, like ``spammify``, the
file can be called just :file:`spammify.c`.)

The first line of our file can be::

   #include <Python.h>

which pulls in the Python API (you can add a comment describing the purpose of
the module and a copyright notice if you like).

Since Python may define some pre-processor definitions which affect the standard
headers on some systems, you *must* include :file:`Python.h` before any standard
headers are included.

All user-visible symbols defined by :file:`Python.h` have a prefix of ``Py`` or
``PY``, except those defined in standard header files. For convenience, and
since they are used extensively by the Python interpreter, ``"Python.h"``
includes a few standard header files: ``<stdio.h>``, ``<string.h>``,
``<errno.h>``, and ``<stdlib.h>``. If the latter header file does not exist on
your system, it declares the functions :c:func:`malloc`, :c:func:`free` and
:c:func:`realloc` directly.

The next thing we add to our module file is the C function that will be called
when the Python expression ``spam.system(string)`` is evaluated (we'll see
shortly how it ends up being called)::

   static PyObject *
   spam_system(PyObject *self, PyObject *args)
   {
       const char *command;
       int sts;

       if (!PyArg_ParseTuple(args, "s", &command))
           return NULL;
       sts = system(command);
       return Py_BuildValue("i", sts);
   }

There is a straightforward translation from the argument list in Python (for
example, the single expression ``"ls -l"``) to the arguments passed to the C
function. The C function always has two arguments, conventionally named *self*
and *args*.

For module functions, the *self* argument is *NULL* or a pointer selected while
initializing the module (see :c:func:`Py_InitModule4`). For a method, it would
point to the object instance.

The *args* argument will be a pointer to a Python tuple object containing the
arguments. Each item of the tuple corresponds to an argument in the call's
argument list. The arguments are Python objects --- in order to do anything
with them in our C function we have to convert them to C values. The function
:c:func:`PyArg_ParseTuple` in the Python API checks the argument types and
converts them to C values. It uses a template string to determine the required
types of the arguments as well as the types of the C variables into which to
store the converted values. More about this later.

:c:func:`PyArg_ParseTuple` returns true (nonzero) if all arguments have the right
type and its components have been stored in the variables whose addresses are
passed. It returns false (zero) if an invalid argument list was passed. In the
latter case it also raises an appropriate exception so the calling function can
return *NULL* immediately (as we saw in the example).

.. _extending-errors:

Intermezzo: Errors and Exceptions
=================================

An important convention throughout the Python interpreter is the following: when
a function fails, it should set an exception condition and return an error value
(usually a *NULL* pointer). Exceptions are stored in a static global variable
inside the interpreter; if this variable is *NULL* no exception has occurred. A
second global variable stores the "associated value" of the exception (the
second argument to :keyword:`raise`). A third variable contains the stack
traceback in case the error originated in Python code. These three variables
are the C equivalents of the Python variables ``sys.exc_type``,
``sys.exc_value`` and ``sys.exc_traceback`` (see the section on module
:mod:`sys` in the Python Library Reference). It is important to know about them
to understand how errors are passed around.

The Python API defines a number of functions to set various types of exceptions.

The most common one is :c:func:`PyErr_SetString`. Its arguments are an exception
object and a C string. The exception object is usually a predefined object like
:c:data:`PyExc_ZeroDivisionError`. The C string indicates the cause of the error
and is converted to a Python string object and stored as the "associated value"
of the exception.

Another useful function is :c:func:`PyErr_SetFromErrno`, which only takes an
exception argument and constructs the associated value by inspection of the
global variable :c:data:`errno`. The most general function is
:c:func:`PyErr_SetObject`, which takes two object arguments, the exception and
its associated value. You don't need to :c:func:`Py_INCREF` the objects passed
to any of these functions.

You can test non-destructively whether an exception has been set with
:c:func:`PyErr_Occurred`. This returns the current exception object, or *NULL*
if no exception has occurred. You normally don't need to call
:c:func:`PyErr_Occurred` to see whether an error occurred in a function call,
since you should be able to tell from the return value.

When a function *f* that calls another function *g* detects that the latter
fails, *f* should itself return an error value (usually *NULL* or ``-1``). It
should *not* call one of the :c:func:`PyErr_\*` functions --- one has already
been called by *g*. *f*'s caller is then supposed to also return an error
indication to *its* caller, again *without* calling :c:func:`PyErr_\*`, and so on
--- the most detailed cause of the error was already reported by the function
that first detected it. Once the error reaches the Python interpreter's main
loop, this aborts the currently executing Python code and tries to find an
exception handler specified by the Python programmer.

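The propagation rule can be illustrated without the Python API at all. The
following is a toy model in plain C, not CPython code: ``toy_error``,
``toy_set_error``, ``toy_clear_error``, ``g`` and ``f`` are names invented for
this sketch, and a single static string stands in for the interpreter's
exception state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the interpreter's "current exception" variable.
   NULL means that no error is pending.  (Hypothetical; not the Python API.) */
static const char *toy_error = NULL;

void toy_set_error(const char *message)
{
    assert(message != NULL);
    toy_error = message;
}

void toy_clear_error(void)
{
    toy_error = NULL;
}

/* g detects the failure first, so g is the one that records the cause. */
const char *g(int input)
{
    if (input < 0) {
        toy_set_error("g: negative input");
        return NULL;                 /* error value */
    }
    return "ok";
}

/* f calls g.  On failure it only passes the error value along;
   it does not record a second, less specific message. */
const char *f(int input)
{
    const char *result = g(input);
    if (result == NULL)
        return NULL;                 /* error already set by g */
    return result;
}
```

A caller of ``f`` in this model only needs to compare the return value with
``NULL``; the message recorded by ``g`` is still available in ``toy_error``
when the failure reaches the outermost caller.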
(There are situations where a module can actually give a more detailed error
message by calling another :c:func:`PyErr_\*` function, and in such cases it is
fine to do so. As a general rule, however, this is not necessary, and can cause
information about the cause of the error to be lost: most operations can fail
for a variety of reasons.)

To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling :c:func:`PyErr_Clear`. The only
time C code should call :c:func:`PyErr_Clear` is if it doesn't want to pass the
error on to the interpreter but wants to handle it completely by itself
(possibly by trying something else, or pretending nothing went wrong).

Every failing :c:func:`malloc` call must be turned into an exception --- the
direct caller of :c:func:`malloc` (or :c:func:`realloc`) must call
:c:func:`PyErr_NoMemory` and return a failure indicator itself. All the
object-creating functions (for example, :c:func:`PyInt_FromLong`) already do
this, so this note is only relevant to those who call :c:func:`malloc` directly.

Also note that, with the important exception of :c:func:`PyArg_ParseTuple` and
friends, functions that return an integer status usually return a positive value
or zero for success and ``-1`` for failure, like Unix system calls.

Finally, be careful to clean up garbage (by making :c:func:`Py_XDECREF` or
:c:func:`Py_DECREF` calls for objects you have already created) when you return
an error indicator!

The choice of which exception to raise is entirely yours. There are predeclared
C objects corresponding to all built-in Python exceptions, such as
:c:data:`PyExc_ZeroDivisionError`, which you can use directly. Of course, you
should choose exceptions wisely --- don't use :c:data:`PyExc_TypeError` to mean
that a file couldn't be opened (that should probably be :c:data:`PyExc_IOError`).
If something's wrong with the argument list, the :c:func:`PyArg_ParseTuple`
function usually raises :c:data:`PyExc_TypeError`. If you have an argument whose
value must be in a particular range or must satisfy other conditions,
:c:data:`PyExc_ValueError` is appropriate.

You can also define a new exception that is unique to your module. For this, you
usually declare a static object variable at the beginning of your file::

   static PyObject *SpamError;

and initialize it in your module's initialization function (:c:func:`initspam`)
with an exception object (leaving out the error checking for now)::

   PyMODINIT_FUNC
   initspam(void)
   {
       PyObject *m;

       m = Py_InitModule("spam", SpamMethods);
       if (m == NULL)
           return;

       SpamError = PyErr_NewException("spam.error", NULL, NULL);
       Py_INCREF(SpamError);
       PyModule_AddObject(m, "error", SpamError);
   }

Note that the Python name for the exception object is :exc:`spam.error`. The
:c:func:`PyErr_NewException` function may create a class with the base class
being :exc:`Exception` (unless another class is passed in instead of *NULL*),
described in :ref:`bltin-exceptions`.

Note also that the :c:data:`SpamError` variable retains a reference to the newly
created exception class; this is intentional! Since the exception could be
removed from the module by external code, an owned reference to the class is
needed to ensure that it will not be discarded, causing :c:data:`SpamError` to
become a dangling pointer. Should it become a dangling pointer, C code which
raises the exception could cause a core dump or other unintended side effects.

We discuss the use of ``PyMODINIT_FUNC`` as a function return type later in this
chapter.

The :exc:`spam.error` exception can be raised in your extension module using a
call to :c:func:`PyErr_SetString` as shown below::

   static PyObject *
   spam_system(PyObject *self, PyObject *args)
   {
       const char *command;
       int sts;

       if (!PyArg_ParseTuple(args, "s", &command))
           return NULL;
       sts = system(command);
       if (sts < 0) {
           PyErr_SetString(SpamError, "System command failed");
           return NULL;
       }
       return PyLong_FromLong(sts);
   }

Going back to our example function, you should now be able to understand this
statement::

   if (!PyArg_ParseTuple(args, "s", &command))
       return NULL;

It returns *NULL* (the error indicator for functions returning object pointers)
if an error is detected in the argument list, relying on the exception set by
:c:func:`PyArg_ParseTuple`. Otherwise the string value of the argument has been
copied to the local variable :c:data:`command`. This is a pointer assignment and
you are not supposed to modify the string to which it points (so in Standard C,
the variable :c:data:`command` should properly be declared as ``const char
*command``).

The next statement is a call to the Unix function :c:func:`system`, passing it
the string we just got from :c:func:`PyArg_ParseTuple`::

   sts = system(command);

Our :func:`spam.system` function must return the value of :c:data:`sts` as a
Python object. This is done using the function :c:func:`Py_BuildValue`, which is
something like the inverse of :c:func:`PyArg_ParseTuple`: it takes a format
string and an arbitrary number of C values, and returns a new Python object.
More info on :c:func:`Py_BuildValue` is given later. ::

   return Py_BuildValue("i", sts);

In this case, it will return an integer object. (Yes, even integers are objects
on the heap in Python!)

If you have a C function that returns no useful value (a function returning
:c:type:`void`), the corresponding Python function must return ``None``. You
need this idiom to do so (which is implemented by the :c:macro:`Py_RETURN_NONE`
macro)::

   Py_INCREF(Py_None);
   return Py_None;

:c:data:`Py_None` is the C name for the special Python object ``None``. It is a
genuine Python object rather than a *NULL* pointer, which means "error" in most
contexts, as we have seen.

The Module's Method Table and Initialization Function
=====================================================

I promised to show how :c:func:`spam_system` is called from Python programs.
First, we need to list its name and address in a "method table"::

   static PyMethodDef SpamMethods[] = {
       {"system", spam_system, METH_VARARGS,
        "Execute a shell command."},
       {NULL, NULL, 0, NULL}        /* Sentinel */
   };

Note the third entry (``METH_VARARGS``). This is a flag telling the interpreter
the calling convention to be used for the C function. It should normally always
be ``METH_VARARGS`` or ``METH_VARARGS | METH_KEYWORDS``; a value of ``0`` means
that an obsolete variant of :c:func:`PyArg_ParseTuple` is used.

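To see why such a table ends with a sentinel entry, here is a minimal sketch in
plain C of how a table of this shape can be searched. It is only an
illustration: ``ToyMethodDef``, ``toy_methods``, ``toy_find``, ``toy_double``
and ``toy_negate`` are hypothetical names, and this is not how the interpreter
itself processes a :c:type:`PyMethodDef` array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_function)(int);

/* Hypothetical, simplified analogue of a method-table entry. */
typedef struct {
    const char *name;       /* name under which the function is looked up */
    toy_function function;  /* address of the C implementation */
} ToyMethodDef;

int toy_double(int x) { return 2 * x; }
int toy_negate(int x) { return -x; }

ToyMethodDef toy_methods[] = {
    {"double", toy_double},
    {"negate", toy_negate},
    {NULL, NULL}            /* Sentinel */
};

/* Walk the table until the sentinel; no separate length is needed. */
toy_function toy_find(const ToyMethodDef *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->function;
    }
    return NULL;            /* not found */
}
```

Because the loop stops at the entry whose name is ``NULL``, leaving out the
sentinel would let the search run past the end of the array.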
When using only ``METH_VARARGS``, the function should expect the Python-level
parameters to be passed in as a tuple acceptable for parsing via
:c:func:`PyArg_ParseTuple`; more information on this function is provided below.

The :const:`METH_KEYWORDS` bit may be set in the third field if keyword
arguments should be passed to the function. In this case, the C function should
accept a third ``PyObject *`` parameter which will be a dictionary of keywords.
Use :c:func:`PyArg_ParseTupleAndKeywords` to parse the arguments to such a
function.

The method table must be passed to the interpreter in the module's
initialization function. The initialization function must be named
:c:func:`initname`, where *name* is the name of the module, and should be the
only non-\ ``static`` item defined in the module file::

   PyMODINIT_FUNC
   initspam(void)
   {
       (void) Py_InitModule("spam", SpamMethods);
   }

Note that ``PyMODINIT_FUNC`` declares the function as ``void`` return type,
declares any special linkage declarations required by the platform, and for C++
declares the function as ``extern "C"``.

When the Python program imports module :mod:`spam` for the first time,
:c:func:`initspam` is called. (See below for comments about embedding Python.)
It calls :c:func:`Py_InitModule`, which creates a "module object" (which is
inserted in the dictionary ``sys.modules`` under the key ``"spam"``), and
inserts built-in function objects into the newly created module based upon the
table (an array of :c:type:`PyMethodDef` structures) that was passed as its
second argument. :c:func:`Py_InitModule` returns a pointer to the module object
that it creates (which is unused here). It may abort with a fatal error for
certain errors, or return *NULL* if the module could not be initialized.

When embedding Python, the :c:func:`initspam` function is not called
automatically unless there's an entry in the :c:data:`_PyImport_Inittab` table.
The easiest way to handle this is to statically initialize your
statically-linked modules by directly calling :c:func:`initspam` after the call
to :c:func:`Py_Initialize`::

   int
   main(int argc, char *argv[])
   {
       /* Pass argv[0] to the Python interpreter */
       Py_SetProgramName(argv[0]);

       /* Initialize the Python interpreter.  Required. */
       Py_Initialize();

       /* Add a static module */
       initspam();

       /* ... the rest of the program ... */
       return 0;
   }

An example may be found in the file :file:`Demo/embed/demo.c` in the Python
source distribution.

Removing entries from ``sys.modules`` or importing compiled modules into
multiple interpreters within a process (or following a :c:func:`fork` without an
intervening :c:func:`exec`) can create problems for some extension modules.
Extension module authors should exercise caution when initializing internal data
structures. Note also that the :func:`reload` function can be used with
extension modules, and will call the module initialization function
(:c:func:`initspam` in the example), but will not load the module again if it was
loaded from a dynamically loadable object file (:file:`.so` on Unix,
:file:`.dll` on Windows).

A more substantial example module is included in the Python source distribution
as :file:`Modules/xxmodule.c`. This file may be used as a template or simply
read as an example.

Compilation and Linkage
=======================

There are two more things to do before you can use your new extension: compiling
and linking it with the Python system. If you use dynamic loading, the details
may depend on the style of dynamic loading your system uses; see the chapters
about building extension modules (chapter :ref:`building`) and additional
information that pertains only to building on Windows (chapter
:ref:`building-on-windows`) for more information about this.

If you can't use dynamic loading, or if you want to make your module a permanent
part of the Python interpreter, you will have to change the configuration setup
and rebuild the interpreter. Luckily, this is very simple on Unix: just place
your file (:file:`spammodule.c` for example) in the :file:`Modules/` directory
of an unpacked source distribution, add a line to the file
:file:`Modules/Setup.local` describing your file::

   spam spammodule.o

and rebuild the interpreter by running :program:`make` in the toplevel
directory. You can also run :program:`make` in the :file:`Modules/`
subdirectory, but then you must first rebuild :file:`Makefile` there by running
':program:`make` Makefile'. (This is necessary each time you change the
configuration file.)

If your module requires additional libraries to link with, these can be listed
on the line in the configuration file as well, for instance::

   spam spammodule.o -lX11

Calling Python Functions from C
===============================

So far we have concentrated on making C functions callable from Python. The
reverse is also useful: calling Python functions from C. This is especially the
case for libraries that support so-called "callback" functions. If a C
interface makes use of callbacks, the equivalent Python often needs to provide a
callback mechanism to the Python programmer; the implementation will require
calling the Python callback functions from a C callback. Other uses are also
imaginable.

Fortunately, the Python interpreter is easily called recursively, and there is a
standard interface to call a Python function. (I won't dwell on how to call the
Python parser with a particular string as input --- if you're interested, have a
look at the implementation of the :option:`-c` command line option in
:file:`Modules/main.c` from the Python source code.)

Calling a Python function is easy. First, the Python program must somehow pass
you the Python function object. You should provide a function (or some other
interface) to do this. When this function is called, save a pointer to the
Python function object (be careful to :c:func:`Py_INCREF` it!) in a global
variable --- or wherever you see fit. For example, the following function might
be part of a module definition::

   static PyObject *my_callback = NULL;

   static PyObject *
   my_set_callback(PyObject *dummy, PyObject *args)
   {
       PyObject *result = NULL;
       PyObject *temp;

       if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
           if (!PyCallable_Check(temp)) {
               PyErr_SetString(PyExc_TypeError, "parameter must be callable");
               return NULL;
           }
           Py_XINCREF(temp);         /* Add a reference to new callback */
           Py_XDECREF(my_callback);  /* Dispose of previous callback */
           my_callback = temp;       /* Remember new callback */
           /* Boilerplate to return "None" */
           Py_INCREF(Py_None);
           result = Py_None;
       }
       return result;
   }

This function must be registered with the interpreter using the
:const:`METH_VARARGS` flag; this is described in section :ref:`methodtable`. The
:c:func:`PyArg_ParseTuple` function and its arguments are documented in the
section "Extracting Parameters in Extension Functions" below.

The macros :c:func:`Py_XINCREF` and :c:func:`Py_XDECREF` increment/decrement the
reference count of an object and are safe in the presence of *NULL* pointers
(but note that *temp* will not be *NULL* in this context). More information on
them is given in section :ref:`refcounts`.

.. index:: single: PyObject_CallObject()

Later, when it is time to call the function, you call the C function
:c:func:`PyObject_CallObject`. This function has two arguments, both pointers to
arbitrary Python objects: the Python function, and the argument list. The
argument list must always be a tuple object, whose length is the number of
arguments. To call the Python function with no arguments, pass in NULL, or
an empty tuple; to call it with one argument, pass a singleton tuple.
:c:func:`Py_BuildValue` returns a tuple when its format string consists of zero
or more format codes between parentheses. For example::

   int arg = 123;
   PyObject *arglist, *result;
   /* Time to call the callback */
   arglist = Py_BuildValue("(i)", arg);
   result = PyObject_CallObject(my_callback, arglist);
   Py_DECREF(arglist);

:c:func:`PyObject_CallObject` returns a Python object pointer: this is the return
value of the Python function. :c:func:`PyObject_CallObject` is
"reference-count-neutral" with respect to its arguments. In the example a new
tuple was created to serve as the argument list, which is :c:func:`Py_DECREF`\
-ed immediately after the :c:func:`PyObject_CallObject` call.

The return value of :c:func:`PyObject_CallObject` is "new": either it is a brand
new object, or it is an existing object whose reference count has been
incremented. So, unless you want to save it in a global variable, you should
somehow :c:func:`Py_DECREF` the result, even (especially!) if you are not
interested in its value.

Before you do this, however, it is important to check that the return value
isn't *NULL*. If it is, the Python function terminated by raising an exception.
If the C code that called :c:func:`PyObject_CallObject` is called from Python, it
should now return an error indication to its Python caller, so the interpreter
can print a stack trace, or the calling Python code can handle the exception.
If this is not possible or desirable, the exception should be cleared by calling
:c:func:`PyErr_Clear`. For example::

   if (result == NULL)
       return NULL; /* Pass error back */

Depending on the desired interface to the Python callback function, you may also
have to provide an argument list to :c:func:`PyObject_CallObject`. In some cases
the argument list is also provided by the Python program, through the same
interface that specified the callback function. It can then be saved and used
in the same manner as the function object. In other cases, you may have to
construct a new tuple to pass as the argument list. The simplest way to do this
is to call :c:func:`Py_BuildValue`. For example, if you want to pass an integral
event code, you might use the following code::

   arglist = Py_BuildValue("(l)", eventcode);
   result = PyObject_CallObject(my_callback, arglist);
   Py_DECREF(arglist);
   if (result == NULL)
       return NULL; /* Pass error back */
   /* Here maybe use the result */
   Py_DECREF(result);

Note the placement of ``Py_DECREF(arglist)`` immediately after the call, before
the error check! Also note that strictly speaking this code is not complete:
:c:func:`Py_BuildValue` may run out of memory, and this should be checked.

You may also call a function with keyword arguments by using
:c:func:`PyObject_Call`, which supports arguments and keyword arguments. As in
the above example, we use :c:func:`Py_BuildValue` to construct the dictionary. ::

   dict = Py_BuildValue("{s:i}", "name", val);
   result = PyObject_Call(my_callback, NULL, dict);
   Py_DECREF(dict);
   if (result == NULL)
       return NULL; /* Pass error back */
   /* Here maybe use the result */
   Py_DECREF(result);

Extracting Parameters in Extension Functions
============================================

.. index:: single: PyArg_ParseTuple()

The :c:func:`PyArg_ParseTuple` function is declared as follows::

   int PyArg_ParseTuple(PyObject *arg, char *format, ...);

The *arg* argument must be a tuple object containing an argument list passed
from Python to a C function. The *format* argument must be a format string,
whose syntax is explained in :ref:`arg-parsing` in the Python/C API Reference
Manual. The remaining arguments must be addresses of variables whose type is
determined by the format string.

Note that while :c:func:`PyArg_ParseTuple` checks that the Python arguments have
the required types, it cannot check the validity of the addresses of C variables
passed to the call: if you make mistakes there, your code will probably crash or
at least overwrite random bits in memory. So be careful!

Note that any Python object references which are provided to the caller are
*borrowed* references; do not decrement their reference count!

Some example calls::

   int ok;
   int i, j;
   long k, l;
   const char *s;
   int size;

   ok = PyArg_ParseTuple(args, ""); /* No arguments */
       /* Python call: f() */

   ok = PyArg_ParseTuple(args, "s", &s); /* A string */
       /* Possible Python call: f('whoops!') */

   ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
       /* Possible Python call: f(1, 2, 'three') */

   ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
       /* A pair of ints and a string, whose size is also returned */
       /* Possible Python call: f((1, 2), 'three') */

   {
       const char *file;
       const char *mode = "r";
       int bufsize = 0;
       ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
       /* A string, and optionally another string and an integer */
       /* Possible Python calls:
          f('spam', 'wb', 100000) */
   }

   {
       int left, top, right, bottom, h, v;
       ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
                &left, &top, &right, &bottom, &h, &v);
       /* A rectangle and a point */
       /* Possible Python call:
          f(((0, 0), (400, 300)), (10, 10)) */
   }

   {
       Py_complex c;
       ok = PyArg_ParseTuple(args, "D:myfunction", &c);
       /* a complex, also providing a function name for errors */
       /* Possible Python call: myfunction(1+2j) */
   }

.. _parsetupleandkeywords:

Keyword Parameters for Extension Functions
==========================================

.. index:: single: PyArg_ParseTupleAndKeywords()

The :c:func:`PyArg_ParseTupleAndKeywords` function is declared as follows::

   int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                   char *format, char *kwlist[], ...);

The *arg* and *format* parameters are identical to those of the
:c:func:`PyArg_ParseTuple` function. The *kwdict* parameter is the dictionary of
keywords received as the third parameter from the Python runtime. The *kwlist*
parameter is a *NULL*-terminated list of strings which identify the parameters;
the names are matched with the type information from *format* from left to
right. On success, :c:func:`PyArg_ParseTupleAndKeywords` returns true, otherwise
it returns false and raises an appropriate exception.

Nested tuples cannot be parsed when using keyword arguments! Keyword parameters
passed in which are not present in the *kwlist* will cause :exc:`TypeError` to
be raised.

.. index:: single: Philbrick, Geoff

Here is an example module which uses keywords, based on an example by Geoff
Philbrick (philbrick@hks.com)::

   #include "Python.h"

   static PyObject *
   keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
   {
       int voltage;
       char *state = "a stiff";
       char *action = "voom";
       char *type = "Norwegian Blue";

       static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

       if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                        &voltage, &state, &action, &type))
           return NULL;

       printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
              action, voltage);
       printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

       Py_INCREF(Py_None);
       return Py_None;
   }

   static PyMethodDef keywdarg_methods[] = {
       /* The cast of the function is necessary since PyCFunction values
        * only take two PyObject* parameters, and keywdarg_parrot() takes
        * three.
        */
       {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS | METH_KEYWORDS,
        "Print a lovely skit to standard output."},
       {NULL, NULL, 0, NULL}   /* sentinel */
   };

   void
   initkeywdarg(void)
   {
       /* Create the module and add the functions */
       Py_InitModule("keywdarg", keywdarg_methods);
   }

Building Arbitrary Values
=========================

The :c:func:`Py_BuildValue` function is the counterpart to
:c:func:`PyArg_ParseTuple`. It is declared as follows::

   PyObject *Py_BuildValue(char *format, ...);

It recognizes a set of format units similar to the ones recognized by
:c:func:`PyArg_ParseTuple`, but the arguments (which are input to the function,
not output) must not be pointers, just values. It returns a new Python object,
suitable for returning from a C function called from Python.

One difference with :c:func:`PyArg_ParseTuple`: while the latter requires its
first argument to be a tuple (since Python argument lists are always represented
as tuples internally), :c:func:`Py_BuildValue` does not always build a tuple. It
builds a tuple only if its format string contains two or more format units. If
the format string is empty, it returns ``None``; if it contains exactly one
format unit, it returns whatever object is described by that format unit. To
force it to return a tuple of size 0 or one, parenthesize the format string.

Examples (to the left the call, to the right the resulting Python value)::

   Py_BuildValue("")                        None
   Py_BuildValue("i", 123)                  123
   Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
   Py_BuildValue("s", "hello")              'hello'
   Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
   Py_BuildValue("s#", "hello", 4)          'hell'
   Py_BuildValue("()")                      ()
   Py_BuildValue("(i)", 123)                (123,)
   Py_BuildValue("(ii)", 123, 456)          (123, 456)
   Py_BuildValue("(i,i)", 123, 456)         (123, 456)
   Py_BuildValue("[i,i]", 123, 456)         [123, 456]
   Py_BuildValue("{s:i,s:i}",
                 "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
   Py_BuildValue("((ii)(ii)) (ii)",
                 1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))

In languages like C or C++, the programmer is responsible for dynamic allocation
and deallocation of memory on the heap. In C, this is done using the functions
:c:func:`malloc` and :c:func:`free`. In C++, the operators ``new`` and
``delete`` are used with essentially the same meaning and we'll restrict
the following discussion to the C case.

Every block of memory allocated with :c:func:`malloc` should eventually be
returned to the pool of available memory by exactly one call to :c:func:`free`.
It is important to call :c:func:`free` at the right time. If a block's address
is forgotten but :c:func:`free` is not called for it, the memory it occupies
cannot be reused until the program terminates. This is called a :dfn:`memory
leak`. On the other hand, if a program calls :c:func:`free` for a block and then
continues to use the block, it creates a conflict with re-use of the block
through another :c:func:`malloc` call. This is called :dfn:`using freed memory`.
It has the same bad consequences as referencing uninitialized data --- core
dumps, wrong results, mysterious crashes.

807
Common causes of memory leaks are unusual paths through the code. For instance,
808
a function may allocate a block of memory, do some calculation, and then free
809
the block again. Now a change in the requirements for the function may add a
810
test to the calculation that detects an error condition and can return
811
prematurely from the function. It's easy to forget to free the allocated memory
812
block when taking this premature exit, especially when it is added later to the
813
code. Such leaks, once introduced, often go undetected for a long time: the
814
error exit is taken only in a small fraction of all calls, and most modern
815
machines have plenty of virtual memory, so the leak only becomes apparent in a
816
long-running process that uses the leaking function frequently. Therefore, it's
817
important to prevent leaks from happening by having a coding convention or
818
strategy that minimizes this kind of errors.
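
One coding convention that guards against the premature-exit leak described
above is to route every exit through a single cleanup label. The following is
only a minimal sketch in plain C; ``checked_length`` is a hypothetical function
invented for this illustration and is not part of the Python API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `text` into a scratch buffer and returns its
 * length, or -1 on error.  Every exit path goes through `done`, so the
 * buffer is freed exactly once no matter where the function bails out. */
int
checked_length(const char *text)
{
    int result = -1;
    char *scratch = NULL;

    if (text == NULL)
        goto done;                  /* nothing allocated yet */

    scratch = malloc(strlen(text) + 1);
    if (scratch == NULL)
        goto done;
    strcpy(scratch, text);

    if (scratch[0] == '\0')
        goto done;                  /* error check added later: still freed */

    result = (int)strlen(scratch);

done:
    free(scratch);                  /* free(NULL) is a no-op */
    return result;
}
```

Because the cleanup code lives in one place, a check added later only needs a
``goto done`` rather than its own copy of the deallocation logic.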

Since Python makes heavy use of :c:func:`malloc` and :c:func:`free`, it needs a
strategy to avoid memory leaks as well as the use of freed memory. The chosen
method is called :dfn:`reference counting`. The principle is simple: every
object contains a counter, which is incremented when a reference to the object
is stored somewhere, and which is decremented when a reference to it is deleted.
When the counter reaches zero, the last reference to the object has been deleted
and the object is freed.
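
The principle can be illustrated with a toy model in plain C. This is a sketch
under stated assumptions, not CPython's implementation: ``ToyObject``,
``toy_new``, ``toy_incref``, ``toy_decref`` and ``toy_live_count`` are
hypothetical names made up for this example.

```c
#include <stdlib.h>

/* Toy object with an embedded counter; NOT the layout of a real PyObject. */
typedef struct {
    int refcount;
    int value;
} ToyObject;

static int toy_live_objects = 0;    /* bookkeeping for the demonstration */

ToyObject *
toy_new(int value)
{
    ToyObject *obj = malloc(sizeof(ToyObject));
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;              /* the caller holds the first reference */
    obj->value = value;
    toy_live_objects++;
    return obj;
}

void
toy_incref(ToyObject *obj)
{
    obj->refcount++;                /* a reference was stored somewhere */
}

void
toy_decref(ToyObject *obj)
{
    if (--obj->refcount == 0) {     /* the last reference was deleted */
        toy_live_objects--;
        free(obj);
    }
}

int
toy_live_count(void)
{
    return toy_live_objects;
}
```

Each ``toy_incref`` must eventually be matched by one ``toy_decref``; the
object is released by whichever call brings the counter to zero.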

An alternative strategy is called :dfn:`automatic garbage collection`.
(Sometimes, reference counting is also referred to as a garbage collection
strategy, hence my use of "automatic" to distinguish the two.) The big
advantage of automatic garbage collection is that the user doesn't need to call
:c:func:`free` explicitly. (Another claimed advantage is an improvement in speed
or memory usage --- this is not a hard fact, however.) The disadvantage is that
for C, there is no truly portable automatic garbage collector, while reference
counting can be implemented portably (as long as the functions :c:func:`malloc`
and :c:func:`free` are available --- which the C Standard guarantees). Maybe some
day a sufficiently portable automatic garbage collector will be available for C.
Until then, we'll have to live with reference counts.

While Python uses the traditional reference counting implementation, it also
offers a cycle detector that works to detect reference cycles. This allows
applications to not worry about creating direct or indirect circular references;
these are the weakness of garbage collection implemented using only reference
counting. Reference cycles consist of objects which contain (possibly indirect)
references to themselves, so that each object in the cycle has a reference count
which is non-zero. Typical reference counting implementations are not able to
reclaim the memory belonging to any objects in a reference cycle, or referenced
from the objects in the cycle, even though there are no further references to
the cycle itself.
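
The weakness is easy to reproduce in a toy model. The sketch below uses
hypothetical names (``Node``, ``node_new``, ``node_set_other``,
``node_decref``, ``node_live_count``) and plain C rather than the Python API.
Two nodes refer to each other, so neither count can reach zero until the cycle
is broken by hand.

```c
#include <stdlib.h>

/* Toy node that may own one reference to another node. */
typedef struct Node {
    int refcount;
    struct Node *other;         /* owned reference, or NULL */
} Node;

static int live_nodes = 0;      /* bookkeeping for the demonstration */

Node *
node_new(void)
{
    Node *n = malloc(sizeof(Node));
    if (n == NULL)
        return NULL;
    n->refcount = 1;
    n->other = NULL;
    live_nodes++;
    return n;
}

void
node_decref(Node *n)
{
    if (--n->refcount == 0) {
        if (n->other != NULL)
            node_decref(n->other);  /* release the reference we own */
        live_nodes--;
        free(n);
    }
}

/* Make n refer to target (which may be NULL); n takes its own reference. */
void
node_set_other(Node *n, Node *target)
{
    Node *old = n->other;
    if (target != NULL)
        target->refcount++;
    n->other = target;
    if (old != NULL)
        node_decref(old);
}

int
node_live_count(void)
{
    return live_nodes;
}
```

If ``a`` and ``b`` point at each other and the program drops its own
references to both, each node still has a count of one and is never freed.
Calling ``node_set_other(a, NULL)`` before dropping ``a`` breaks the cycle and
lets both counts fall to zero.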

The cycle detector is able to detect garbage cycles and can reclaim them so long
as there are no finalizers implemented in Python (:meth:`__del__` methods).
When there are such finalizers, the detector exposes the cycles through the
:mod:`gc` module (specifically, the :attr:`~gc.garbage` variable in that module).
The :mod:`gc` module also exposes a way to run the detector (the
:func:`~gc.collect` function), as well as configuration
interfaces and the ability to disable the detector at runtime. The cycle
detector is considered an optional component; though it is included by default,
it can be disabled at build time using the :option:`--without-cycle-gc` option
to the :program:`configure` script on Unix platforms (including Mac OS X) or by
removing the definition of ``WITH_CYCLE_GC`` in the :file:`pyconfig.h` header on
other platforms. If the cycle detector is disabled in this way, the :mod:`gc`
module will not be available.


.. _refcountsinpython:

Reference Counting in Python
----------------------------

There are two macros, ``Py_INCREF(x)`` and ``Py_DECREF(x)``, which handle the
incrementing and decrementing of the reference count. :c:func:`Py_DECREF` also
frees the object when the count reaches zero. For flexibility, it doesn't call
:c:func:`free` directly --- rather, it makes a call through a function pointer in
the object's :dfn:`type object`. For this purpose (and others), every object
also contains a pointer to its type object.
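
The indirection through the type object can be sketched in plain C. The names
below (``ToyType``, ``ToyObject``, ``buffer_new``, ``toy_decref``,
``toy_buffers_released``) are hypothetical and the layout is deliberately much
simpler than CPython's real structures; the point is only that the decrement
routine does not know how to free the object and asks the type instead.

```c
#include <stdlib.h>

struct ToyObject;

/* Toy "type object": holds the type-specific deallocation routine. */
typedef struct {
    const char *name;
    void (*dealloc)(struct ToyObject *);
} ToyType;

typedef struct ToyObject {
    int refcount;
    const ToyType *type;        /* every object points at its type */
    char *buffer;               /* extra resource owned by this object */
} ToyObject;

static int buffers_released = 0;    /* bookkeeping for the demonstration */

static void
buffer_dealloc(ToyObject *obj)
{
    free(obj->buffer);          /* type-specific cleanup first */
    buffers_released++;
    free(obj);
}

static const ToyType BufferType = { "buffer", buffer_dealloc };

ToyObject *
buffer_new(size_t size)
{
    ToyObject *obj = malloc(sizeof(ToyObject));
    if (obj == NULL)
        return NULL;
    obj->buffer = malloc(size);
    if (obj->buffer == NULL) {
        free(obj);
        return NULL;
    }
    obj->refcount = 1;
    obj->type = &BufferType;
    return obj;
}

void
toy_decref(ToyObject *obj)
{
    if (--obj->refcount == 0)
        obj->type->dealloc(obj);    /* call through the type's pointer */
}

int
toy_buffers_released(void)
{
    return buffers_released;
}
```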

The big question now remains: when to use ``Py_INCREF(x)`` and ``Py_DECREF(x)``?
Let's first introduce some terms. Nobody "owns" an object; however, you can
:dfn:`own a reference` to an object. An object's reference count is now defined
as the number of owned references to it. The owner of a reference is
responsible for calling :c:func:`Py_DECREF` when the reference is no longer
needed. Ownership of a reference can be transferred. There are three ways to
dispose of an owned reference: pass it on, store it, or call :c:func:`Py_DECREF`.
Forgetting to dispose of an owned reference creates a memory leak.

It is also possible to :dfn:`borrow` [#]_ a reference to an object. The
borrower of a reference should not call :c:func:`Py_DECREF`. The borrower must
not hold on to the object longer than the owner from which it was borrowed.
Using a borrowed reference after the owner has disposed of it risks using freed
memory and should be avoided completely. [#]_

The advantage of borrowing over owning a reference is that you don't need to
take care of disposing of the reference on all possible paths through the code
--- in other words, with a borrowed reference you don't run the risk of leaking
when a premature exit is taken. The disadvantage of borrowing over owning is
that there are some subtle situations where in seemingly correct code a borrowed
reference can be used after the owner from which it was borrowed has in fact
disposed of it.

A borrowed reference can be changed into an owned reference by calling
:c:func:`Py_INCREF`. This does not affect the status of the owner from which the
reference was borrowed --- it creates a new owned reference, and gives full
owner responsibilities (the new owner must dispose of the reference properly, as
well as the previous owner).

Whenever an object reference is passed into or out of a function, it is part of
the function's interface specification whether ownership is transferred with the
reference or not.

Most functions that return a reference to an object pass on ownership with the
reference. In particular, all functions whose purpose is to create a new
object, such as :c:func:`PyInt_FromLong` and :c:func:`Py_BuildValue`, pass
ownership to the receiver. Even if the object is not actually new, you still
receive ownership of a new reference to that object. For instance,
:c:func:`PyInt_FromLong` maintains a cache of popular values and can return a
reference to a cached item.

Many functions that extract objects from other objects also transfer ownership
with the reference, for instance :c:func:`PyObject_GetAttrString`. The picture
is less clear here, however, since a few common routines are exceptions:
:c:func:`PyTuple_GetItem`, :c:func:`PyList_GetItem`, :c:func:`PyDict_GetItem`, and
:c:func:`PyDict_GetItemString` all return references that you borrow from the
tuple, list or dictionary.

The function :c:func:`PyImport_AddModule` also returns a borrowed reference, even
though it may actually create the object it returns: this is possible because an
owned reference to the object is stored in ``sys.modules``.

When you pass an object reference into another function, in general, the
function borrows the reference from you --- if it needs to store it, it will use
:c:func:`Py_INCREF` to become an independent owner. There are exactly two
important exceptions to this rule: :c:func:`PyTuple_SetItem` and
:c:func:`PyList_SetItem`. These functions take over ownership of the item passed
to them --- even if they fail! (Note that :c:func:`PyDict_SetItem` and friends
don't take over ownership --- they are "normal.")
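
The difference between the two calling conventions can be modelled in a few
lines of plain C. This is a toy sketch with hypothetical names (``Ref``,
``slot``, ``slot_store_borrowing``, ``slot_store_stealing``), not the code of
:c:func:`PyDict_SetItem` or :c:func:`PyList_SetItem`. One function follows the
"normal" convention and takes its own reference; the other takes over the
caller's reference.

```c
#include <stdlib.h>

typedef struct {
    int refcount;
} Ref;

static Ref *slot = NULL;    /* a one-element container owning one reference */

Ref *
ref_new(void)
{
    Ref *r = malloc(sizeof(Ref));
    if (r != NULL)
        r->refcount = 1;    /* owned by the caller */
    return r;
}

void
ref_decref(Ref *r)
{
    if (--r->refcount == 0)
        free(r);
}

/* "Normal" convention: borrows r and takes an additional reference of its
 * own.  The caller still owns, and must dispose of, its reference. */
void
slot_store_borrowing(Ref *r)
{
    Ref *old = slot;
    r->refcount++;
    slot = r;
    if (old != NULL)
        ref_decref(old);
}

/* "Stealing" convention: takes over the caller's reference.  The caller
 * must NOT dispose of r afterwards. */
void
slot_store_stealing(Ref *r)
{
    Ref *old = slot;
    slot = r;
    if (old != NULL)
        ref_decref(old);
}
```

After ``slot_store_borrowing(a)`` the caller still has to call
``ref_decref(a)``; after ``slot_store_stealing(b)`` doing so would release a
reference the caller no longer owns.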

When a C function is called from Python, it borrows references to its arguments
from the caller. The caller owns a reference to the object, so the borrowed
reference's lifetime is guaranteed until the function returns. Only when such a
borrowed reference must be stored or passed on does it need to be turned into an
owned reference by calling :c:func:`Py_INCREF`.

The object reference returned from a C function that is called from Python must
be an owned reference --- ownership is transferred from the function to its
caller.

There are a few situations where seemingly harmless use of a borrowed reference
can lead to problems. These all have to do with implicit invocations of the
interpreter, which can cause the owner of a reference to dispose of it.

The first and most important case to know about is using :c:func:`Py_DECREF` on
an unrelated object while borrowing a reference to a list item. For instance::

    void
    bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        PyList_SetItem(list, 1, PyInt_FromLong(0L));
        PyObject_Print(item, stdout, 0); /* BUG! */
    }

This function first borrows a reference to ``list[0]``, then replaces
``list[1]`` with the value ``0``, and finally prints the borrowed reference.
Looks harmless, right? But it's not!

Let's follow the control flow into :c:func:`PyList_SetItem`. The list owns
references to all its items, so when item 1 is replaced, it has to dispose of
the original item 1. Now let's suppose the original item 1 was an instance of a
user-defined class, and let's further suppose that the class defined a
:meth:`__del__` method. If this class instance has a reference count of 1,
disposing of it will call its :meth:`__del__` method.

Since it is written in Python, the :meth:`__del__` method can execute arbitrary
Python code. Could it perhaps do something to invalidate the reference to
``item`` in :c:func:`bug`? You bet! Assuming that the list passed into
:c:func:`bug` is accessible to the :meth:`__del__` method, it could execute a
statement to the effect of ``del list[0]``, and assuming this was the last
reference to that object, it would free the memory associated with it, thereby
invalidating ``item``.

The solution, once you know the source of the problem, is easy: temporarily
increment the reference count. The correct version of the function reads::

    void
    no_bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);

        Py_INCREF(item);
        PyList_SetItem(list, 1, PyInt_FromLong(0L));
        PyObject_Print(item, stdout, 0);
        Py_DECREF(item);
    }

This is a true story. An older version of Python contained variants of this bug
and someone spent a considerable amount of time in a C debugger to figure out
why his :meth:`__del__` methods would fail...
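
The same hazard and its remedy can be reproduced without the interpreter. The
plain C sketch below is a toy model with hypothetical names (``Obj``,
``items``, ``list_set``, ``drop_first_item``, ``read_first_after_replace``). A
finalizer callback stands in for a :meth:`__del__` method and removes the
first item while the second one is being replaced.

```c
#include <stdlib.h>

typedef struct Obj {
    int refcount;
    int value;
    void (*finalizer)(void);    /* stands in for a __del__ method */
} Obj;

static Obj *items[2];           /* toy list; owns a reference to each item */

Obj *
obj_new(int value, void (*finalizer)(void))
{
    Obj *o = malloc(sizeof(Obj));
    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->value = value;
    o->finalizer = finalizer;
    return o;
}

void
obj_decref(Obj *o)
{
    if (--o->refcount == 0) {
        if (o->finalizer != NULL)
            o->finalizer();     /* arbitrary code may run here */
        free(o);
    }
}

/* Replace items[i], taking over the caller's reference to o. */
void
list_set(int i, Obj *o)
{
    Obj *old = items[i];
    items[i] = o;
    if (old != NULL)
        obj_decref(old);
}

/* Finalizer doing the equivalent of "del list[0]". */
static void
drop_first_item(void)
{
    list_set(0, NULL);
}

/* Safe pattern; assumes items[0] is set.  Without the temporary owned
 * reference, item would be freed by the finalizer before it is read. */
int
read_first_after_replace(void)
{
    Obj *item = items[0];       /* borrowed from the list */
    int value;

    item->refcount++;           /* now owned: survives the finalizer */
    list_set(1, obj_new(0, NULL));
    value = item->value;        /* still valid */
    obj_decref(item);
    return value;
}
```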

The second case of problems with a borrowed reference is a variant involving
threads. Normally, multiple threads in the Python interpreter can't get in each
other's way, because there is a global lock protecting Python's entire object
space. However, it is possible to temporarily release this lock using the macro
:c:macro:`Py_BEGIN_ALLOW_THREADS`, and to re-acquire it using
:c:macro:`Py_END_ALLOW_THREADS`. This is common around blocking I/O calls, to
let other threads use the processor while waiting for the I/O to complete.
Obviously, the following function has the same problem as the previous one::

    void
    bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);
        Py_BEGIN_ALLOW_THREADS
        ...some blocking I/O call...
        Py_END_ALLOW_THREADS
        PyObject_Print(item, stdout, 0); /* BUG! */
    }

In general, functions that take object references as arguments do not expect you
to pass them *NULL* pointers, and will dump core (or cause later core dumps) if
you do so. Functions that return object references generally return *NULL* only
to indicate that an exception occurred. The reason for not testing for *NULL*
arguments is that functions often pass the objects they receive on to other
functions --- if each function were to test for *NULL*, there would be a lot of
redundant tests and the code would run more slowly.

It is better to test for *NULL* only at the "source": when a pointer that may be
*NULL* is received, for example, from :c:func:`malloc` or from a function that
may raise an exception.

The macros :c:func:`Py_INCREF` and :c:func:`Py_DECREF` do not check for *NULL*
pointers --- however, their variants :c:func:`Py_XINCREF` and :c:func:`Py_XDECREF`
do.
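
A *NULL*-tolerant variant is most useful in cleanup code, where some pointers
may never have been assigned. The sketch below models this in plain C with
hypothetical names (``Ref``, ``ref_decref``, ``ref_xdecref``, ``make_pair``);
it illustrates the idea and is not how the real macros are implemented.

```c
#include <stdlib.h>

typedef struct {
    int refcount;
} Ref;

static int freed = 0;       /* bookkeeping for the demonstration */

Ref *
ref_new(void)
{
    Ref *r = malloc(sizeof(Ref));
    if (r != NULL)
        r->refcount = 1;
    return r;
}

/* Strict variant: r must not be NULL. */
void
ref_decref(Ref *r)
{
    if (--r->refcount == 0) {
        freed++;
        free(r);
    }
}

/* Tolerant variant: a NULL pointer is silently ignored. */
void
ref_xdecref(Ref *r)
{
    if (r != NULL)
        ref_decref(r);
}

/* Creates two temporaries; `simulate_failure` forces the second creation
 * to fail.  Returns 0 on success, -1 on error. */
int
make_pair(int simulate_failure)
{
    Ref *first = NULL, *second = NULL;
    int status = -1;

    first = ref_new();              /* test at the source */
    if (first == NULL)
        goto done;
    second = simulate_failure ? NULL : ref_new();
    if (second == NULL)
        goto done;
    status = 0;

done:
    ref_xdecref(first);             /* either pointer may still be NULL */
    ref_xdecref(second);
    return status;
}
```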

The macros for checking for a particular object type (``Pytype_Check()``) don't
check for *NULL* pointers --- again, there is much code that calls several of
these in a row to test an object against various different expected types, and
this would generate redundant tests. There are no variants with *NULL*
checking.

The C function calling mechanism guarantees that the argument list passed to C
functions (``args`` in the examples) is never *NULL* --- in fact it guarantees
that it is always a tuple. [#]_

It is a severe error to ever let a *NULL* pointer "escape" to the Python user.

.. A pedagogically buggy example, along the lines of the previous listing, would
   be helpful here -- showing in more concrete terms what sort of actions could
   cause the problem. I can't very well imagine it from the description.


Writing Extensions in C++
=========================

It is possible to write extension modules in C++. Some restrictions apply. If
the main program (the Python interpreter) is compiled and linked by the C
compiler, global or static objects with constructors cannot be used. This is
not a problem if the main program is linked by the C++ compiler. Functions that
will be called by the Python interpreter (in particular, module initialization
functions) have to be declared using ``extern "C"``. It is unnecessary to
enclose the Python header files in ``extern "C" {...}`` --- they use this form
already if the symbol ``__cplusplus`` is defined (all recent C++ compilers
define this symbol).


Providing a C API for an Extension Module
=========================================

.. sectionauthor:: Konrad Hinsen <hinsen@cnrs-orleans.fr>


Many extension modules just provide new functions and types to be used from
Python, but sometimes the code in an extension module can be useful for other
extension modules. For example, an extension module could implement a type
"collection" which works like an unordered list. Just as the standard Python
list type has a C API which permits extension modules to create and manipulate
lists, this new collection type should have a set of C functions for direct
manipulation from other extension modules.

At first sight this seems easy: just write the functions (without declaring them
``static``, of course), provide an appropriate header file, and document
the C API. And in fact this would work if all extension modules were always
linked statically with the Python interpreter. When modules are used as shared
libraries, however, the symbols defined in one module may not be visible to
another module. The details of visibility depend on the operating system; some
systems use one global namespace for the Python interpreter and all extension
modules (Windows, for example), whereas others require an explicit list of
imported symbols at module link time (AIX is one example), or offer a choice of
different strategies (most Unices). And even if symbols are globally visible,
the module whose functions one wishes to call might not have been loaded yet!

Portable code therefore must not make any assumptions about symbol
visibility. This means that all symbols in extension modules should be declared
``static``, except for the module's initialization function, in order to
avoid name clashes with other extension modules (as discussed in section
:ref:`methodtable`). And it means that symbols that *should* be accessible from
other extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information (pointers) from
one extension module to another one: Capsules. A Capsule is a Python data type
which stores a pointer (:c:type:`void \*`). Capsules can only be created and
accessed via their C API, but they can be passed around like any other Python
object. In particular, they can be assigned to a name in an extension module's
namespace. Other extension modules can then import this module, retrieve the
value of this name, and then retrieve the pointer from the Capsule.

There are many ways in which Capsules can be used to export the C API of an
extension module. Each function could get its own Capsule, or all C API pointers
could be stored in an array whose address is published in a Capsule. And the
various tasks of storing and retrieving the pointers can be distributed in
different ways between the module providing the code and the client modules.

Whichever method you choose, it's important to name your Capsules properly.
The function :c:func:`PyCapsule_New` takes a name parameter
(:c:type:`const char \*`); you're permitted to pass in a *NULL* name, but
we strongly encourage you to specify a name. Properly named Capsules provide
a degree of runtime type-safety; there is no feasible way to tell one unnamed
Capsule from another.

In particular, Capsules used to expose C APIs should be given a name following
this convention::

    modulename.attributename

The convenience function :c:func:`PyCapsule_Import` makes it easy to
load a C API provided via a Capsule, but only if the Capsule's name
matches this convention. This behavior gives C API users a high degree
of certainty that the Capsule they load contains the correct C API.

The following example demonstrates an approach that puts most of the burden on
the writer of the exporting module, which is appropriate for commonly used
library modules. It stores all C API pointers (just one in the example!) in an
array of :c:type:`void` pointers which becomes the value of a Capsule. The header
file corresponding to the module provides a macro that takes care of importing
the module and retrieving its C API pointers; client modules only have to call
this macro before accessing the C API.
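
Before looking at the real module, the underlying technique (a table of
:c:type:`void` pointers published under a name, plus a macro that casts an
entry back to a function pointer) can be tried out in plain C. Everything in
this sketch is hypothetical: ``ToyCapsule`` merely stands in for a named
Capsule, and ``toy_system`` only returns the length of its argument instead of
running a command.

```c
#include <stddef.h>
#include <string.h>

/* ---- "exporting module" side ---- */

static int
toy_system(const char *command)
{
    return (int)strlen(command);    /* placeholder for real work */
}

#define Toy_System_NUM 0
#define Toy_API_pointers 1

static void *Toy_API[Toy_API_pointers];

/* Stand-in for a named Capsule: a name plus an opaque pointer. */
typedef struct {
    const char *name;
    void *pointer;
} ToyCapsule;

static ToyCapsule toy_capsule;

void
toy_export(void)
{
    /* Same function-pointer-to-void* cast as the pointer-array approach
     * described in the text. */
    Toy_API[Toy_System_NUM] = (void *)toy_system;
    toy_capsule.name = "toy._C_API";
    toy_capsule.pointer = (void *)Toy_API;
}

/* ---- "client module" side ---- */

static void **Client_API = NULL;

/* Return 0 on success, -1 if no capsule with that exact name exists. */
int
toy_import(const char *name)
{
    if (toy_capsule.name == NULL || strcmp(toy_capsule.name, name) != 0)
        return -1;
    Client_API = (void **)toy_capsule.pointer;
    return 0;
}

/* Cast the table entry back to the function's real type before calling. */
#define Client_System \
    (*(int (*)(const char *)) Client_API[Toy_System_NUM])
```

The name check in ``toy_import`` mirrors the reason for naming Capsules: a
client asking for the wrong name gets an error instead of a pointer to an
unrelated table.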

The exporting module is a modification of the :mod:`spam` module from section
:ref:`extending-simpleexample`. The function :func:`spam.system` does not call
the C library function :c:func:`system` directly, but a function
:c:func:`PySpam_System`, which would of course do something more complicated in
reality (such as adding "spam" to every command). This function
:c:func:`PySpam_System` is also exported to other extension modules.

The function :c:func:`PySpam_System` is a plain C function, declared
``static`` like everything else::

    static int
    PySpam_System(const char *command)
    {
        return system(command);
    }

The function :c:func:`spam_system` is modified in a trivial way::

    static PyObject *
    spam_system(PyObject *self, PyObject *args)
    {
        const char *command;
        int sts;

        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
        sts = PySpam_System(command);
        return Py_BuildValue("i", sts);
    }

In the beginning of the module, right after the line ::

    #include "Python.h"

two more lines must be added::

    #define SPAM_MODULE
    #include "spammodule.h"

The ``#define`` is used to tell the header file that it is being included in the
exporting module, not a client module. Finally, the module's initialization
function must take care of initializing the C API pointer array::

    PyMODINIT_FUNC
    initspam(void)
    {
        PyObject *m;
        static void *PySpam_API[PySpam_API_pointers];
        PyObject *c_api_object;

        m = Py_InitModule("spam", SpamMethods);
        if (m == NULL)
            return;

        /* Initialize the C API pointer array */
        PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

        /* Create a Capsule containing the API pointer array's address */
        c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

        if (c_api_object != NULL)
            PyModule_AddObject(m, "_C_API", c_api_object);
    }

Note that ``PySpam_API`` is declared ``static``; otherwise the pointer
array would disappear when :func:`initspam` terminates!

The bulk of the work is in the header file :file:`spammodule.h`, which looks
like this::

    #ifndef Py_SPAMMODULE_H
    #define Py_SPAMMODULE_H
    #ifdef __cplusplus
    extern "C" {
    #endif

    /* Header file for spammodule */

    /* C API functions */
    #define PySpam_System_NUM 0
    #define PySpam_System_RETURN int
    #define PySpam_System_PROTO (const char *command)

    /* Total number of C API pointers */
    #define PySpam_API_pointers 1


    #ifdef SPAM_MODULE
    /* This section is used when compiling spammodule.c */

    static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

    #else
    /* This section is used in modules that use spammodule's API */

    static void **PySpam_API;

    #define PySpam_System \
     (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

    /* Return -1 on error, 0 on success.
     * PyCapsule_Import will set an exception if there's an error.
     */
    static int
    import_spam(void)
    {
        PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
        return (PySpam_API != NULL) ? 0 : -1;
    }

    #endif

    #ifdef __cplusplus
    }
    #endif

    #endif /* !defined(Py_SPAMMODULE_H) */

All that a client module must do in order to have access to the function
:c:func:`PySpam_System` is to call the function (or rather macro)
:c:func:`import_spam` in its initialization function::

    PyMODINIT_FUNC
    initclient(void)
    {
        PyObject *m;

        m = Py_InitModule("client", ClientMethods);
        if (m == NULL)
            return;
        if (import_spam() < 0)
            return;
        /* additional initialization can happen here */
    }

The main disadvantage of this approach is that the file :file:`spammodule.h` is
rather complicated. However, the basic structure is the same for each function
that is exported, so it has to be learned only once.

Finally it should be mentioned that Capsules offer additional functionality,
which is especially useful for memory allocation and deallocation of the pointer
stored in a Capsule. The details are described in the Python/C API Reference
Manual in the section :ref:`capsules` and in the implementation of Capsules (files
:file:`Include/pycapsule.h` and :file:`Objects/pycapsule.c` in the Python source
code distribution).

.. rubric:: Footnotes

.. [#] An interface for this function already exists in the standard module :mod:`os`
   --- it was chosen as a simple and straightforward example.

.. [#] The metaphor of "borrowing" a reference is not completely correct: the owner
   still has a copy of the reference.

.. [#] Checking that the reference count is at least 1 **does not work** --- the
   reference count itself could be in freed memory and may thus be reused for
   another object!

.. [#] These guarantees don't hold when you use the "old" style calling convention ---
   this is still found in much existing code.