~unifield-team/unifield-web/UTP-252

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
==========================================
OpenERP Web Client packaging peculiarities
==========================================

--------------------------------------------------
Duplication of efforts in setup.py and Manifest.in
--------------------------------------------------

As you may note in reading the packaging files of the Web Client,
there are severe redundancies between sections of the ``setup.py``
file and the ``MANIFEST.in`` file.

These are due to the very variable behavior of ``distutils`` commands
in the face of these data:

* ``distutils.core.setup`` has a ``data_files`` keyword argument whose
  purpose is the listing of the non-code files to distribute, and the
  location where they should be put in the produced package.

  * ``setup.py install`` uses this list to know which files to carry
    over
  * py2exe uses this as well to determine which files it should
    move into the ``dist_dir``, directly in that directory
  * but ``setup.py sdist`` completely ignores ``data_files`` and
    instead relies on the content of the ``MANIFEST.in`` file to know
    what to port where
  * *unless* it is used on a Python 2.7 installation *and* there is no
     ``MANIFEST.in`` file (at all). In that case, ``sdist`` will
     respect ``data_files``

  Because the web client supports Python 2.5 up, the latter is not an
  option. Creating a custom parser & translator (or trying to find out
  where it lives in the distribution) for ``MANIFEST.in`` was deemed
  non-viable at this point, and thus ``find_data_file`` was born (a
  very simple function able to find and filter files in directories
  and return them in the format ``data_files`` expects)

* ``distutils.core.setup`` also has a ``scripts`` keyword argument
  which is a list of scripts to install. Because py2exe can generate a
  number of executable target types, it ignores ``scripts`` and
  instead wants a ``console`` kwarg for the same purpose.

* ``distutils.core.setup`` has a ``package_data`` keyword
  argument. This is used to list non-code files within Python (code)
  packages, using a mapping from python packages (dotted notation) to
  a list of globbing paths.

  * ``setup.py install`` uses this specification to bundle non-code
    files in the package it installs into your site-package
  * ``sdist`` behaves very similarly to how it behaves when faced with
    ``data_files``: it completely ignores the specification and
    requires a ``MANIFEST.in`` file, *unless* you're using Python 2.7
    and *do not* have a ``MANIFEST.in`` template at all.
  * py2exe entirely ignores ``package_data`` and doesn't include any
    of the specified files in its output, neither as a toplevel file
    tree nor as part of the output archive.

    This resulted in the ``PackageDataCollector`` class in
    ``py2exe_utils`` which extends ``py2exe`` to bundle
    ``package_data`` files into the archive produced by py2exe.

----------
Data Files
----------

The Web Client not only has a number of purely data files (e.g. this
documentation), but for packaging purposes it treats its bundled
addons (those which are not distributed as part of an installed
server-side module) as data files as well.

Among other things, this means the web client addons sit in a
directory outside the archive generated by py2exe.

---------------------------------
pkgutil, pkg_resources and py2exe
---------------------------------

As described above, the Web Client holds a number of data files within
as well as outside of its core ``openobject`` package.

Python's stdlib provides a restricted tool to manage those files in
the form of `pkgutil.get_data(package, path)`_. Sadly
this function is limited in scope in that it only works within
packages, and its interface is badly suited towards existence tests.

setuptools/distutils builds upon ``get_data`` with a number of
`resource access methods
<http://packages.python.org/distribute/pkg_resources.html#basic-resource-access>`_
providing a richer interface to resources management. There is also an
added capability of being able to work with complete distutils
*distributions*, not just python packages, and thus of being able to
access top-level resources outside any Python package structure (such
as this very file).

Sadly, while this works correctly with ``install`` or ``sdist``, *it
utterly fails with py2exe* because py2exe apparently doesn't
store/bundle/know of distribution information/data. At this point I
haven't investigated this failure any further. The issue simply
remains (for now) that distributions-based requirements (and thus
resources management) is not an option.

This includes `using pkg_resources.require to assert that all
necessary distributions are available and correctly installed
<http://packages.python.org/distribute/pkg_resources.html#basic-workingset-methods>`_.

--------------------------------------------------------------------------------
Package data and third-party libraries, because things worked to well previously
--------------------------------------------------------------------------------

As hinted-at in `duplication of efforts in setup.py and Manifest.in`_,
the data files within the ``openobject`` package stay stored
there. Because they are collected by py2exe, this also means they get
bundled as part of py2exe's zip archive.

Considering `get_data`_ this ought not be an issue, but some of the
libraries used by the web client plain and simply assume everything
lives on a trivially accessible file system (or seem to, from my
reading/understanding of them so far). Chief among them are `Mako`_
[#]_ and `CherryPy`_ [#]_. Having static files bundled within the
archive also doesn't work very well for users wanting to:

* Customize the templates for CDN serving of static files

* Serve static files via a dedicated web server tuned for static
  content (`lighttpd <http://www.lighttpd.net/>`_,
  `nginx <http://wiki.nginx.org/>`_,
  `cherokee <http://www.cherokee-project.com/>`_ or a stripped
  `Apache <http://httpd.apache.org/>`_))

---------------------------------------------------------------------------------
Third-party libraries and their own package data, because why would it just work?
---------------------------------------------------------------------------------

The issue of making our own package data available to dependencies
having been sidestepped, we are faced with an other issue: some
dependencies bundle their own package data, which not only py2exe
doesn't pick by default but which they themselves can not retrieve
from a package as they're using direct filesystem manipulation tools
to do their data-queries.

Babel
=====

Babel is "a collection of tools for internationalizing Python
applications". The web client uses it for pretty much all of its i18n
including formatting of dates or numbers, and gettext-ing.

The core issue with babel is that it carries with itself a number of
data files used for locale-specific formatting (among other
things). It accesses these data directly by looking for a given
directory next to one of its Python modules (``localedata`` next to
``babel.localedata``). This means not only py2exe will not bundle
those locale data (as it doesn't bundle package data in general) but
even if it did babel would be unable to find them.

Babel localedata hack
---------------------

Thankfully, Babel actually stores the path to its ``localedata``
directory in a module-level variable called ``_dirname``.

Thus, we can package those data files during the building of the
py2exe archive [#]_ and, on startup of the web server, set
``babel.localedata._dirname`` to a directory of our own [#]_.

.. [#] Mako's TemplateLookup system takes a list of actual filesystem
       directories, and its ``filesystem_check`` feature (enabled by
       default) further assumes regular FS access as it calls
       ``os.stat`` on the template file.

.. [#] by default, the web client relies on CherryPy's `staticfile
   tools <http://docs.cherrypy.org/dev/progguide/files/static.html>`_
   which — as far as I can tell — assumes normal FS access.

.. [#] this is done in ``py2exe_utils`` via the ``babel_localedata``
       function which lists all the locale data of the local babel
       installation and stores the paths as its own ``data_files``,
       which the main setup.py script appends to its own
       ``data_files`` listing.

.. [#] ``openobject.commands.configure_babel``

.. _pkgutil.get_data(package, path):
.. _get_data:
    http://docs.python.org/library/pkgutil.html#pkgutil.get_data

.. _Mako: http://makotemplates.org
.. _CherryPy: http://cherrypy.org