~mwhudson/pypy/imported-pypy-rdict-refactoring

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
===================================================================================================
Ideas about syntactic and algorithmic aspects of Constraint and Logic Programming in Python (CLPiP)
===================================================================================================

** WORK IN PROGRESS **

Introduction
============

Scope
-----

This document tries to shed some light on integration of logic and
constraint programming into Python using the PyPy
framework.

PyPy has been progressively equiped with a parser and compiler
flexible enough that it is hoped that developpers can leverage it to
extend the language *at runtime*. This is quite in the spirit of Lisp
macros, if not the exact manner.

This takes place in Working Packages 09 and 10 of the EU PyPy funding
project. It is expected that an aspect oriented programming toolkit be
built using the compiler and parser infrastructure (WP10), and that the
logic and constraint programming features be added to PyPy (WP9). An
ontology library will be provided and will serve as our first use
case for logic programming.

Clarifications
--------------

This work was described as integration of logic programming *and*
constraint programming into PyPy. Both are obviously related but we
don't know if they will share, in PyPy, the same syntax and the same
machinery.

Also it has been said that CLP will be inspired from what is done in
the Oz programming language (http://www.mozart-oz.org/), which
provides different programming 'paradigms' with a very well
thought-out integration between said paradigms. For instance,
imperative features can be mixed, to some extent, with logic
programming (the later can be 'embedded' in the former); or constraint
programming goes with stateless concurrency, another important feature
of Oz. We still don't know to what extent the Oz level of integration
will be pursued in the context of PyPy, esp. WP 9 and 10.

Lastly, here we mainly discuss syntactical issues: those are probably
the least difficult aspects of getting CLP into python; getting an
efficient implementation of the canonical algorithms into PyPy will be
the bulk of the work.

Constraint programming
======================

In constraint programming, a 'problem' is a set of variables, their
(finite discrete) domains, and the constraints that restrict their
possible values. When all these have been given to a constraint
solver, it is possible to find all possible solutions, that is the
sets of valuations that satisfies simultaneously all constraints. The
solver is solely responsible for finding solutions (or lack thereof).

Overview in python
------------------

At the time being, there exists a *constraints* package made by
Logilab and written in pure python, which implements the solver found
in Mozart (the reference Oz implementation). We use it to illustrate
where we want to go, syntactically-wise.

Let's start with a quite standard example (the problem being solved
here is fully described on
http://www.logilab.org/projects/constraint/documentation)::


 # import Repository class and fd module, 
 from logilab.constraint import *
 variables = ('c01','c02','c03','c04','c05','c06','c07','c08','c09','c10')

Variables are represented as any string object::

 values = [(room,slot) 
           for room in ('room A','room B','room C') 
           for slot in ('day 1 AM','day 1 PM','day 2 AM','day 2 PM')]

Values can be freely pre-computed using standard python constructs;
they can be any object; here, tuples of strings::

 domains = {}
 for v in variables:
     domains[v]=fd.FiniteDomain(values)

The relationship between variables and their possible values is set in
a dictionnary whose keys are variable designators (strings). Values
are wrapped into FiniteDomain instances (FiniteDomain has set
behaviour, plus some implementation subtleties)::

 groups = (('c01','c02','c03','c10'),
           ('c02','c06','c08','c09'),
           ('c03','c05','c06','c07'),
           ('c01','c03','c07','c08'))
 for g in groups:
     for conf1 in g:
         for conf2 in g:
             if conf2 > conf1:
                 constraints.append(fd.make_expression((conf1,conf2),
                                                       '%s[1] != %s[1]'%\
                                                         (conf1,conf2)))

Constraints are built by make_expression which takes a tuple of one or
two variables and a string representing an unary or binary
relationship. The example complete with all constraints is provided at
the url mentioned supra.

Then, when everything has been settled, comes the last step::

 r = Repository(variables,domains,constraints)
 solutions = Solver().solve(r)
 print solutions

Remarks
-------

Due to the compactness of Python syntax, this sample problem
specification remains quite small and readable. It is not obvious what
could be done to make it smaller and still readable.

Variables are not first-class (but close ...) and have nothing to do
with Python standard variables. The good side of this is that we can't
misuse a CSP variable with an ordinary variable.

Specifiying a constraint is clunky : variables and operator have to be
provided separately, and the operator has to be a string. This last
restriction because Python doesn't allow passing builtin (infix)
operators as functional parameters.

Special operator for CLP variable bindings
------------------------------------------

First, promote variables from third-class to second-class
citizenry. Be able to write something like::

 domain = [(room,slot) 
           for room in ('room A','room B','room C') 
           for slot in ('day 1 AM','day 1 PM','day 2 AM','day 2 PM')]
 c01 := domain
 c02 := domain

This introduces a special operator ``:=`` which binds a logical variable
to a domain. More generally::

 var := <any iterable>

With respect to normal assignment, we can imagine the following::

 c01 = 'foo' # raises a NotAssignable or ReadOnly exception
 bar = c01   # takes a reference to the current value of c01 into bar
             # also, meaningless (so ... None) before the solver has run

Problem ... we can't anymore do::

 for conf in ('c01','c05','c10'): ...

It should be good to define a kind of first-class designator for these
kind of variables. A specially-crafted class representing variables
(in the manner of Lisp's symbols) would suffice::

 for conf in (c01, c05, c10): ...

Is it worth the price ? Quite unsure. 

Domain-specific blocks
----------------------

An alternative which avoids the special operator and uses a keyword
instead could be::

 domain:
    c01 = <iterable>
    c02 = <iterable>

It makes us reuse ``=``, with twisted (non-standard Python) semantics
but under a clear lexical umbrella (a ``domain:`` block).

It is possible to get further in this direction::

 problem toto:
     D1 = <domain definition>
     a,b,c in D1
     def constraint1(a,b,c):
         a == b
     
 for sol in toto:
     print sol

There, we put a full constraints mini-language under a named 'problem'
block. The *problem* becomes a first class object (in the manner of
Python classes) and we can (lazily) extract solutions from it.


Stuffing constraints
--------------------

The ugly aspect of py-constraints is the definition of custom
unary/binary constraints through make_expression, as in::

 fd.make_expression ('var1', 'var2', "frob(var1,var2)")

One solution might be to parse the string at runtime to recover the
variable names::

 fd.make_expression ('frob(var1,var2)')

A simple hand-written parser could be sufficient for this.  On the
other hand, the lexically bounded mini-language proposed above helps
solve this more uniformly.

Logic Programming, Prolog-style
===============================

Good old logic programming has an already long syntactic tradition; we
should probably not try to evade it too much. Thus we propose again a
mini-language embedding horn-clauses and logic stuff into 'normal'
(unsuspecting) Python programs.

Existing python/prolog projects can be found at :
http://christophe.delord.free.fr/en/pylog/ and
http://aima.cs.berkeley.edu/python/logic.html. There might be more of
them.


Databases and Requests
----------------------

Databases contain facts, rules and requests::

 >>> database Numbers: # we def a logic db and enter prolog space
 ...    natural(z)
 ...    natural(s(X)) if natural(X)
 ...    r1: natural(s(s(s(z)))) # kind of named request
 ...
 >>> Numbers 
 <LogicDB object at ...>
 >>> Numbers = 42 
 NotAssignable exception, etc.


Integration in Python
---------------------

::

 # databases as first class objects
 get_logic_databases() # -> [Numbers]

 # return values in Python space
 d1 = get_logic_databases()[0]
 d1.ask('r1') -> True

 # extending an existing database
 database Numbers: # adds up to existing Numbers db (eyebrows raise)
     pair(z)
     pair(s(s(X))) if pair(X)

 # database acquisition (inheritance ?)
 database Ratios(Numbers):
     pass

 # adding rules and requests from Python space
 d2 = get_logic_databases[1] # the Ratios db
 d2.add_rule('ratio(X,Y) if Y != z, natural(X), natural(Y)')
 
 d2.add_request('r2', 'ratio(s(z),toto)')
 d2.ask('r2') -> False


Thus, the logic and Python stuff are cleanly separated, and of course
we forbid lexical capture as in::

 z = 42
 database Numbers:
     natural(z)

A (maybe) nicer way to ask could be::

 database Family :

     father(jean, pierre)
     father(paul, jean)
     grandfather(X,Y) if father(X,Z) and father(Z,Y)
    
 Family.grandfather(jean, A)


Conclusion
----------

Syntax is not all-important but can touch on the nerves of some. It
sometimes involves relogious arguments. We hope to irritate none with
this proposal; we merely expect counter-proposals or more detailed
proposals or agreement or just indiference on this.


CLP and Oz
==========

The Oz programming language is and will be a driving inspiration for
all deep integration work of CLP. Logilab constraints pure python
package already implements search according to Oz (without the
concurrency parts).

(don't forget to have a serious look at Screamer/CL)

What shall be done 
------------------

For constraint programming:

* Adapt the core algorithms from logilab.constraints to RPython
* Stuff it into PyPy
* Enhance (tm) it with concurrency aspects (needs preemptive
  threading)

Logic programming:

* Investigate how much of above can be reused
* Pick a strategy (reuse constraints or implement unification)
* Implement deep into PyPy

For both:

* Adapt the parser/lexer to the syntactic requirements


Ideas of testbeds
=================

try to use logic related features in all of these python projects to
make sure of the design and implementation of logic in pypy.

- rdflib.sparql an implementation of W3C's SPARQL language coupled
  with redfoot.
  http://rdflib.net/2005/09/10/rdflib-2.2.2/html/public/rdflib.sparql-module.html

- cwm a rule-based engine for rdf inferencing
  http://www.w3.org/2000/10/swap/doc/cwm.html

- pychinko an implementation of RETE algorithm for rule-based systems
  http://www.mindswap.org/~katz/pychinko/
  http://en.wikipedia.org/wiki/Rete_algorithm

- pylint: using constraints to generate design-related comments and
  warnings is done in SOUL or doable with pyontology
  http://www.logilab.org/projects/pylint

- pyontology. DFKI's OWL manipulation tool (see svn)