========================================== OpenERP Web Client packaging peculiarities ========================================== -------------------------------------------------- Duplication of efforts in setup.py and Manifest.in -------------------------------------------------- As you may note in reading the packaging files of the Web Client, there are severe redundancies between sections of the ``setup.py`` file and the ``MANIFEST.in`` file. These are due to the very variable behavior of ``distutils`` commands in the face of these data: * ``distutils.core.setup`` has a ``data_files`` keyword argument whose purpose is the listing of the non-code files to distribute, and the location where they should be put in the produced package. * ``setup.py install`` uses this list to know which files to carry over * py2exe uses this as well to determine which files it should move into the ``dist_dir``, directly in that directory * but ``setup.py sdist`` completely ignores ``data_files`` and instead relies on the content of the ``MANIFEST.in`` file to know what to port where * *unless* it is used on a Python 2.7 installation *and* there is no ``MANIFEST.in`` file (at all). In that case, ``sdist`` will respect ``data_files`` Because the web client supports Python 2.5 up, the latter is not an option. Creating a custom parser & translator (or trying to find out where it lives in the distribution) for ``MANIFEST.in`` was deemed non-viable at this point, and thus ``find_data_file`` was born (a very simple function able to find and filter files in directories and return them in the format ``data_files`` expects) * ``distutils.core.setup`` also has a ``scripts`` keyword argument which is a list of scripts to install. Because py2exe can generate a number of executable target types, it ignores ``scripts`` and instead wants a ``console`` kwarg for the same purpose. * ``distutils.core.setup`` has a ``package_data`` keyword argument. This is used to list non-code files within Python (code) packages, using a mapping from python packages (dotted notation) to a list of globbing paths. * ``setup.py install`` uses this specification to bundle non-code files in the package it installs into your site-package * ``sdist`` behaves very similarly to how it behaves when faced with ``data_files``: it completely ignores the specification and requires a ``MANIFEST.in`` file, *unless* you're using Python 2.7 and *do not* have a ``MANIFEST.in`` template at all. * py2exe entirely ignores ``package_data`` and doesn't include any of the specified files in its output, neither as a toplevel file tree nor as part of the output archive. This resulted in the ``PackageDataCollector`` class in ``py2exe_utils`` which extends ``py2exe`` to bundle ``package_data`` files into the archive produced by py2exe. ---------- Data Files ---------- The Web Client not only has a number of purely data files (e.g. this documentation), but for packaging purposes it treats its bundled addons (those which are not distributed as part of an installed server-side module) as data files as well. Among other things, this means the web client addons sit in a directory outside the archive generated by py2exe. --------------------------------- pkgutil, pkg_resources and py2exe --------------------------------- As described above, the Web Client holds a number of data files within as well as outside of its core ``openobject`` package. Python's stdlib provides a restricted tool to manage those files in the form of `pkgutil.get_data(package, path)`_. Sadly this function is limited in scope in that it only works within packages, and its interface is badly suited towards existence tests. setuptools/distutils builds upon ``get_data`` with a number of `resource access methods `_ providing a richer interface to resources management. There is also an added capability of being able to work with complete distutils *distributions*, not just python packages, and thus of being able to access top-level resources outside any Python package structure (such as this very file). Sadly, while this works correctly with ``install`` or ``sdist``, *it utterly fails with py2exe* because py2exe apparently doesn't store/bundle/know of distribution information/data. At this point I haven't investigated this failure any further. The issue simply remains (for now) that distributions-based requirements (and thus resources management) is not an option. This includes `using pkg_resources.require to assert that all necessary distributions are available and correctly installed `_. -------------------------------------------------------------------------------- Package data and third-party libraries, because things worked to well previously -------------------------------------------------------------------------------- As hinted-at in `duplication of efforts in setup.py and Manifest.in`_, the data files within the ``openobject`` package stay stored there. Because they are collected by py2exe, this also means they get bundled as part of py2exe's zip archive. Considering `get_data`_ this ought not be an issue, but some of the libraries used by the web client plain and simply assume everything lives on a trivially accessible file system (or seem to, from my reading/understanding of them so far). Chief among them are `Mako`_ [#]_ and `CherryPy`_ [#]_. Having static files bundled within the archive also doesn't work very well for users wanting to: * Customize the templates for CDN serving of static files * Serve static files via a dedicated web server tuned for static content (`lighttpd `_, `nginx `_, `cherokee `_ or a stripped `Apache `_)) --------------------------------------------------------------------------------- Third-party libraries and their own package data, because why would it just work? --------------------------------------------------------------------------------- The issue of making our own package data available to dependencies having been sidestepped, we are faced with an other issue: some dependencies bundle their own package data, which not only py2exe doesn't pick by default but which they themselves can not retrieve from a package as they're using direct filesystem manipulation tools to do their data-queries. Babel ===== Babel is "a collection of tools for internationalizing Python applications". The web client uses it for pretty much all of its i18n including formatting of dates or numbers, and gettext-ing. The core issue with babel is that it carries with itself a number of data files used for locale-specific formatting (among other things). It accesses these data directly by looking for a given directory next to one of its Python modules (``localedata`` next to ``babel.localedata``). This means not only py2exe will not bundle those locale data (as it doesn't bundle package data in general) but even if it did babel would be unable to find them. Babel localedata hack --------------------- Thankfully, Babel actually stores the path to its ``localedata`` directory in a module-level variable called ``_dirname``. Thus, we can package those data files during the building of the py2exe archive [#]_ and, on startup of the web server, set ``babel.localedata._dirname`` to a directory of our own [#]_. .. [#] Mako's TemplateLookup system takes a list of actual filesystem directories, and its ``filesystem_check`` feature (enabled by default) further assumes regular FS access as it calls ``os.stat`` on the template file. .. [#] by default, the web client relies on CherryPy's `staticfile tools `_ which — as far as I can tell — assumes normal FS access. .. [#] this is done in ``py2exe_utils`` via the ``babel_localedata`` function which lists all the locale data of the local babel installation and stores the paths as its own ``data_files``, which the main setup.py script appends to its own ``data_files`` listing. .. [#] ``openobject.commands.configure_babel`` .. _pkgutil.get_data(package, path): .. _get_data: http://docs.python.org/library/pkgutil.html#pkgutil.get_data .. _Mako: http://makotemplates.org .. _CherryPy: http://cherrypy.org