====================
Bootstrapping Ctypes
====================

.. contents::
.. sectnum::


Abstract and Motivation
=======================

Bootstrapping ctypes on PyPy is a four-step procedure.

    1) Implement a restricted version of ctypes, called rctypes,
       at interpreter level.

    2) Wrap the libffi provided by CPython's ctypes in a manner
       that is compatible with CPython's ctypes, using the existing
       ctypes on CPython. This means wrapping libffi with libffi.

    3) Port the result of step 2 to interpreter level by applying the
       necessary changes to make it rctypes compatible.

    4) Refine ctypes and rctypes by adding callbacks, free unions and
       whatever else turns out to be needed.

The most appealing features of this approach are

    -   the ability to implement step 1 and step 2 in parallel, and

    -   the ability to wrap urgently needed modules, like the socket
        module, using the existing ctypes on CPython. Again, this work
        can be done in parallel.

Of course all existing modules that wrap external libraries using ctypes
will also be available for PyPy.

Design
======

Restrictions
------------

Rctypes is designed with the following restrictions.

    -   All types are defined at module load time and
        thus need not be rpython.

    -   Free unions are not supported, because it is unclear
        whether they can be properly annotated.

    -   Callbacks are deferred to step 4.

    -   Functions that return structures and pointers with a mixed
        allocation model are not supported in the initial rctypes version.

    -   Ctypes custom allocators are not supported in the first 4 steps.
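
The first restriction can be illustrated with plain ctypes on CPython.
This is only a minimal sketch; the names ``Point`` and ``manhattan`` are
invented for illustration and do not come from rctypes. The structure
type is built once at import time, and the function body only uses the
already-defined type::

    from ctypes import Structure, c_int

    # Built once, at module load time; this part need not be rpython.
    class Point(Structure):
        _fields_ = [("x", c_int), ("y", c_int)]

    def manhattan(p):
        # Uses an existing type only; no ctypes type is created here.
        return abs(p.x) + abs(p.y)

    result = manhattan(Point(3, -4))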
    

Annotation
----------

Ctypes on CPython tracks all the memory it has allocated itself,
whether it is referenced by pointers and structures or only by pointers.
Thus memory allocated by ctypes is properly garbage collected and no
dangling pointers should arise.
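
A small sketch of this behaviour with ctypes on CPython (the helper name
``make_pointer`` is made up for the example): the pointer object keeps
its target alive, so the memory stays valid after the local name is
gone::

    import gc
    from ctypes import c_int, pointer

    def make_pointer():
        target = c_int(42)       # memory allocated and owned by ctypes
        return pointer(target)   # the pointer keeps a reference to target

    p = make_pointer()
    gc.collect()                 # the local name 'target' is gone by now
    value = p.contents.value     # still valid: the pointee was kept alive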

Pointers to structures returned by an external function or passed
to a callback are a different story. For such pointers we have to assume
that they were not allocated by ctypes, even if they actually were.

Thus the annotator tracks the memory state of each ctypes object. Pointers
to structures are annotated differently when they are returned by an external
function. As stated above, a function result type with a mixed memory model
is not considered rctypes compliant and is therefore annotated as
`SomeObject` [#]_.

.. [#] This restriction will be lifted in future ctypes versions. 

Memory-Layout
-------------

In Ctypes, all instances are mutable boxes containing either some raw
memory with a layout compatible to that of the equivalent C type, or a
reference to such memory.  The reference indirection is transparent to
the user; for example, dereferencing a ctypes object "pointer to
structure" results in a "structure" object that doesn't include a copy
of the data, but only a reference to that data.  (This is similar to the
C++ notion of reference: it is just a pointer at the machine level, but
at the language level it behaves like the object that it points to, not
like a pointer.)
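
The reference behaviour can be observed with ctypes on CPython. In this
sketch (``Point`` is an illustrative type, not part of rctypes),
dereferencing a pointer gives a structure object that shares the
original memory instead of copying it::

    from ctypes import Structure, c_int, pointer

    class Point(Structure):
        _fields_ = [("x", c_int), ("y", c_int)]

    pt = Point(1, 2)
    ptr = pointer(pt)

    view = ptr.contents   # a Point object that references pt's memory
    view.x = 10           # writes through to the original structure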

We map this to the LLType model as follows.  For boxes that embed the
raw memory content::

    Ptr( GcStruct( "name",
           ("c_data", Struct(...) ) ) )

where the raw memory content and layout is specified by the
"Struct(...)" part.

For boxes that don't embed the raw memory content::

    Ptr( GcStruct( "name",
           ("c_data_ref", Ptr(Struct(...)) ) ) )

In both cases, the outer GcStruct is needed to make the boxes tracked by
the GC automatically.  The "c_data" or "c_data_ref" field either embeds
or references the raw memory; the "Struct(...)" definition specifies the
exact C layout expected for that memory.

Of course, the "c_data" and "c_data_ref" fields are not visible to the
rpython-level user.  This is where an rctypes-specific restriction comes
from: it must be possible for the annotator to figure out statically for
each variable if it needs to be implemented with a "c_data" or a
"c_data_ref" field.  (The annotation SomeCTypesObject contains a
memorystate field which can be OWNSMEMORY ("c_data" case) or MEMORYALIAS
("c_data_ref" case).)
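
The static choice can be pictured with a toy model. Everything below is
hypothetical: ``ToyCTypesAnnotation`` merely stands in for
SomeCTypesObject, the constants are plain strings, and none of it is the
actual annotator code. It only shows the field name following from the
memory state::

    OWNSMEMORY = "OWNSMEMORY"
    MEMORYALIAS = "MEMORYALIAS"

    class ToyCTypesAnnotation(object):
        """Hypothetical stand-in for SomeCTypesObject; not the real class."""

        def __init__(self, ctype_name, memorystate):
            self.ctype_name = ctype_name
            self.memorystate = memorystate

        def field_name(self):
            # The box layout is picked statically from the memory state.
            if self.memorystate == OWNSMEMORY:
                return "c_data"
            return "c_data_ref"

    owned = ToyCTypesAnnotation("c_int", OWNSMEMORY)
    alias = ToyCTypesAnnotation("c_int", MEMORYALIAS)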

Primitive Types
~~~~~~~~~~~~~~~
Ctypes' primitive types are mapped directly to the corresponding PyPy
LLType: Signed, Float, etc.  For the owned-memory case, we get::

    Ptr( GcStruct( "CtypesBox_<TypeName>",
            ( "c_data",
                    Struct( "C_Data_<TypeName>",
                            ( "value", Signed/Float/etc. ) ) ) ) )

Note that we don't make "c_data" itself a Signed or Float directly because
in LLType we can't take pointers to Signed or Float, only to Struct or
Array.

The non-owned-memory case is::

    Ptr( GcStruct( "CtypesBox_<TypeName>",
            ( "c_data_ref",
                    Ptr( Struct( "C_Data_<TypeName>",
                            ( "value", Signed/Float/etc. ) ) ) ) ) )
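
On CPython, the two cases can be approximated with ``from_address``,
which builds a ctypes instance on top of existing memory. This is only
an analogy for the owned and non-owned boxes, not the rctypes
implementation, and the caller must keep ``owner`` alive while the alias
is in use::

    from ctypes import c_int, addressof

    owner = c_int(5)                               # owns its raw memory
    alias = c_int.from_address(addressof(owner))   # refers to owner's memory

    alias.value = 7      # write through the alias
    seen = owner.value   # the owning box sees the new value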

Pointers
~~~~~~~~

::

    Ptr( GcStruct( "CtypesBox_<TypeName>",
            ( "c_data",
                    Struct( "C_Data_<TypeName>",
                            ( "value", Ptr(...) ) ) ) ) )

or::

    Ptr( GcStruct( "CtypesBox_<TypeName>",
            ( "c_data_ref",
                    Ptr( Struct( "C_Data_<TypeName>",
                            ( "value", Ptr(...) ) ) ) ) ) )

However, there is a special case here: the pointer might point to data
owned by another CtypesBox -- i.e. it can point to the "c_data" field of
some other CtypesBox.  In this case we must make sure that the other
CtypesBox stays alive.  This is done by adding an extra field
referencing the gc box (this field is not otherwise used)::

    Ptr( GcStruct( "CtypesBox_<TypeName>",
            ( "c_data",
                    Struct( "C_Data_<TypeName>",
                            ( "value", Ptr(...) ) ) ),
            ( "keepalive",
                    Ptr( GcStruct( "CtypesBox_<TargetTypeName>" ) ) ) ) )
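
The role of the extra field can be modelled in plain Python. The classes
``ToyBox`` and ``ToyPointerBox`` are hypothetical. In Python the shared
list would survive on its own, so the model only shows the target box
being kept alive by a field that is never read::

    import gc
    import weakref

    class ToyBox(object):
        """Hypothetical owning box: embeds its data."""
        def __init__(self, value):
            self.c_data = [value]         # stands in for the raw memory

    class ToyPointerBox(object):
        """Hypothetical pointer box with a keepalive field."""
        def __init__(self, target):
            self.c_data = target.c_data   # "points into" the target's data
            self.keepalive = target       # never read, only holds a reference

    target = ToyBox(42)
    probe = weakref.ref(target)           # lets us observe the box's lifetime
    ptr = ToyPointerBox(target)
    del target
    gc.collect()
    still_alive = probe() is not None     # the keepalive field holds the box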

Structures
~~~~~~~~~~
Structures will have the following memory layout (owning their raw memory)
if they were allocated by ctypes::

    Ptr( GcStruct( "CtypesBox_<StructName>",
            ( "c_data",
                    Struct( "C_Data_<StructName>",
                            *<FieldDefinitions> ) ) ) )

For structures obtained by dereferencing a pointer (by reading its
"contents" attribute), the structure box does not own the memory::

    Ptr( GcStruct( "CtypesBox_<StructName>",
            ( "c_data_ref",
                    Ptr( Struct( "C_Data_<StructName>",
                            *<FieldDefinitions> ) ) ) ) )

One or several Keepalive fields might be necessary in each case.
(To be clarified...)

Arrays
~~~~~~
Arrays behave like structures, but use an Array instead of a Struct in
the "c_data" or "c_data_ref" declaration.
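
With ctypes on CPython, the same owning and aliasing distinction can be
seen for arrays. In this sketch (``IntArray3`` is an illustrative name),
the array obtained through a pointer shares the memory of the original
array::

    from ctypes import c_int, pointer

    IntArray3 = c_int * 3
    arr = IntArray3(1, 2, 3)   # owns its raw memory
    ptr = pointer(arr)
    view = ptr.contents        # array object referencing arr's memory

    view[0] = 10               # writes through to arr
    items = list(arr)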