***********************************************
Building and using a new application controller
***********************************************

.. sectionauthor:: Greg Caporaso

.. note: The code in this file is specifically not doctested because it is describing how to define a new application controller, which won't be in cogent. This file won't be available to import in most/any cases, so shouldn't be tested.

This document provides an example for defining and using a new application controller [#attribution]_. We'll look at wrapping the ``formatdb`` application from the BLAST 2.2.20 package `available from NCBI <http://www.ncbi.nlm.nih.gov/BLAST/download.shtml>`_. (Note this is what NCBI now refers to as *legacy BLAST*, not BLAST+.)

This document was developed in the process of writing the full ``formatdb`` application controller in PyCogent. You can find that file in your PyCogent directory at: ``cogent/app/formatdb.py``. After you work through this example, you should refer to that file to see what the full application controller and convenience wrappers look like.

A more complete reference on `PyCogent Application Controllers can be found here <./application_controller_framework.html>`_.

Building a formatdb application controller
==========================================

Step 0. Decide which application you want to support, and which version.
------------------------------------------------------------------------

Decide which version of the program you want to support and install the application. Check out the features, and decide what functionality you want to support.

For the sake of brevity in this example, we'll support only the basic functionality: creating nucleic acid or protein BLAST databases in the default format, and with default names (i.e., named after the input file).

So, we'll support the ``-i``, ``-o``, and ``-p`` parameters.

Step 1. Define the class and the class variables.
-------------------------------------------------

First, create a new file called ``minimal_formatdb.py``. Open this in a text editor, and add the following header lines::

    from cogent.app.util import CommandLineApplication, ResultPath
    from cogent.app.parameters import ValuedParameter

This imports some classes that we'll be using from PyCogent. For this to function correctly, ``cogent`` must be in your ``$PYTHONPATH``. Next define your class, ``MinimalFormatDb``::

    class MinimalFormatDb(CommandLineApplication):
        """ Example APPC for the formatdb program in the blast package """

Note that you're defining this class to inherit from ``CommandLineApplication``. This is where most of the heavy lifting is done.

We'll next want to define the following class variables:

* ``_command``: the command called to start the application
* ``_parameters``: the parameters to be passed (we'll select a few that we're going to support)
* ``_input_handler``: the name of the method that converts the single argument passed to the app controller object into input for the application
* ``_suppress_stdout``: whether to suppress stdout (define only if other than ``False``)
* ``_suppress_stderr``: whether to suppress stderr (define only if other than ``False``)

This is done by adding the following lines::

        _command = "formatdb"
        _parameters = {
            '-i':ValuedParameter(Prefix='-',Name='i',Delimiter=' ',IsPath=True),
            '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' ',Value='T'),
            '-p':ValuedParameter(Prefix='-',Name='p',Delimiter=' ',Value='F')
        }
        _input_handler = "_input_as_parameter"

You'll see here that we're only defining the ``-i``, ``-o``, and ``-p`` parameters, hence the name of this class being ``MinimalFormatDb``. An important variable to note here is ``_input_handler``. We'll come back to that next.

An additional thing to note here is that I'm setting ``Value`` for ``-p`` to ``F`` instead of the default of ``T``. This is because I usually build nucleotide databases (specified by ``-p F``), not protein databases (which is the default, and specified by ``-p T``).
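
To make the roles of ``Prefix``, ``Name``, ``Delimiter`` and ``Value`` concrete, here is a minimal, self-contained sketch of how such a parameter could be turned into command line text. ``ToyValuedParameter`` is a hypothetical stand-in written for this illustration; it is not PyCogent's ``ValuedParameter``, and it assumes that a parameter without a value is simply left out of the command::

    class ToyValuedParameter(object):
        """Hypothetical, simplified model of a valued command line parameter."""

        def __init__(self, prefix, name, delimiter=' ', value=None):
            self.prefix = prefix
            self.name = name
            self.delimiter = delimiter
            self.value = value

        def is_on(self):
            # Assumption: a parameter counts as "on" once it has a value.
            return self.value is not None

        def on(self, value):
            self.value = value

        def off(self):
            self.value = None

        def __str__(self):
            # An "off" parameter contributes nothing to the command.
            if not self.is_on():
                return ''
            return ''.join([self.prefix, self.name, self.delimiter, str(self.value)])


    p = ToyValuedParameter(prefix='-', name='p', value='F')
    print(str(p))        # -p F
    p.on('T')
    print(str(p))        # -p T
    p.off()
    print(repr(str(p)))  # ''

Under these assumptions, a ``-p`` parameter holding ``F`` renders as ``-p F``, and switching it to ``T`` renders as ``-p T``.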

Step 2. Override methods as necessary.
--------------------------------------

We'll next create a non-default input handler. The input handler takes the input that you'll eventually provide when calling an instance of ``MinimalFormatDb``, and prepares it to be passed to the actual command line application that you're calling. ``formatdb`` requires that you pass a fasta file via the ``-i`` parameter, so we'll define a new method, ``_input_as_parameter``, here::

        def _input_as_parameter(self,data):
            """ Turn on -i with the input filepath and return '' """
            self.Parameters['-i'].on(data)
            return ''

Input handlers return a string that gets appended to the command, but turning parameters on also causes them to be added to the command. For that reason, this input handler returns an empty string; otherwise ``-i`` would end up being passed twice to ``formatdb``.
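
The double-counting problem described above can be reproduced with a small stand-alone sketch. The ``build_command`` helper below is hypothetical: it only assumes that the final command is the command name, followed by the text of every switched-on parameter, followed by whatever the input handler returns. PyCogent's real assembly logic is more involved::

    def build_command(command, parameter_texts, handler_result):
        """Join the command name, non-empty parameter texts and handler result."""
        parts = [command]
        parts.extend(text for text in parameter_texts if text)
        if handler_result:
            parts.append(handler_result)
        return ' '.join(parts)


    # Parameter texts as they might look after the handler turned -i on.
    # The file name is illustrative.
    fasta = '"refseqs.fasta"'
    parameter_texts = ['-o T', '-i ' + fasta, '-p F']

    # Handler returns an empty string: -i appears once.
    good = build_command('formatdb', parameter_texts, '')
    print(good)  # formatdb -o T -i "refseqs.fasta" -p F

    # Handler also returns the input: -i appears twice.
    bad = build_command('formatdb', parameter_texts, '-i ' + fasta)
    print(bad)   # formatdb -o T -i "refseqs.fasta" -p F -i "refseqs.fasta"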

Finally, we'll define ``_get_result_paths``, which is the method that tells the application controller what files it should expect to be written, and under what circumstances. Our ``MinimalFormatDb`` application controller writes a log file under all circumstances, and nucleotide databases when ``-p`` is set to ``F`` or protein databases when ``-p`` is set to ``T``. We return the resulting files as a dict of ``ResultPath`` objects::

        def _get_result_paths(self,data):
            """ Build the dict of result filepaths """
            result = {}
            # the log file is written under all circumstances
            result['log'] = ResultPath(
             Path=self.WorkingDir+'formatdb.log',IsWritten=True)
            # nucleotide and protein databases use different extensions
            if self.Parameters['-p'].Value == 'F':
                extensions = ['nhr','nin','nsi','nsq','nsd']
            else:
                extensions = ['phr','pin','psi','psq','psd']
            for extension in extensions:
                result[extension] = ResultPath(
                 Path=data + '.' + extension,
                 IsWritten=True)
            return result

At this stage, you've created an application controller which supports interacting with a few features of the ``formatdb`` command line application. In the next step, we'll look at how to use your new application controller.
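
The file-naming rule encoded in ``_get_result_paths`` can also be exercised without running ``formatdb``. The sketch below is a plain-Python restatement of that rule, using the extension lists from the example above. The function name ``expected_formatdb_files`` and its arguments are made up for this illustration::

    import os

    NUCLEOTIDE_EXTENSIONS = ['nhr', 'nin', 'nsi', 'nsq', 'nsd']
    PROTEIN_EXTENSIONS = ['phr', 'pin', 'psi', 'psq', 'psd']


    def expected_formatdb_files(fasta_path, protein=False, working_dir='.'):
        """Map result keys to the file paths the example controller expects."""
        # The log file is expected in the working directory in every case.
        paths = {'log': os.path.join(working_dir, 'formatdb.log')}
        extensions = PROTEIN_EXTENSIONS if protein else NUCLEOTIDE_EXTENSIONS
        # Database files sit next to the input file, named input + extension.
        for extension in extensions:
            paths[extension] = fasta_path + '.' + extension
        return paths


    paths = expected_formatdb_files('refseqs.fasta', protein=True)
    print(sorted(key for key in paths if key != 'log'))
    # ['phr', 'pin', 'psd', 'psi', 'psq']

Comparing such a list against the files actually present on disk is a quick way to check that ``_get_result_paths`` and the application agree.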

Using the new formatdb application controller
=============================================

Next we'll import the new ``minimal_formatdb`` module, and test out the application controller. For the following examples, you need to access some files that are in the ``doc/data`` directory of your PyCogent distribution. For simplicity, we'll assume that on your system this directory is ``/home/pycogent_user/PyCogent/doc/data``. You should always replace this directory with the path as it exists on your machine.

Open a python interpreter in the directory where you created your ``minimal_formatdb.py`` and enter the following commands::

    >>> import minimal_formatdb
    >>> fdb = minimal_formatdb.MinimalFormatDb()
    >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta')

Inspecting ``res`` shows that you've created a new nucleotide BLAST database: you can tell because you have the nucleotide database files in the result object (i.e., their extensions begin with ``n``).

Next clean up the files that were created::

    >>> res.cleanUp()

Next we'll change some parameter settings, and confirm the changes::

    >>> fdb = minimal_formatdb.MinimalFormatDb()
    >>> fdb.Parameters['-p'].on('T')
    >>> fdb.Parameters['-p'].isOn()
    >>> fdb.Parameters['-p'].Value
    >>> str(fdb.Parameters['-p'])

We've just set the ``-p`` parameter to ``T``, indicating that a protein database should be built instead of a nucleotide database. Run the application controller and investigate the results; the database file extensions now begin with ``p``::

    >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta')

Next clean up the files that were created::

    >>> res.cleanUp()

Tips and tricks when writing application controllers
====================================================

One of the most useful features of application controller objects when building and debugging is the ``HALT_EXEC`` parameter that can be passed to the constructor. This will cause the program to halt just before executing, and print the command that would have been run. For example::

    >>> fdb = minimal_formatdb.MinimalFormatDb(HALT_EXEC=True)
    >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta')

    Halted exec with command:
    cd "/home/pycogent_user/"; formatdb -o T -i "/home/pycogent_user/PyCogent/doc/data/refseqs.fasta" -p F > "/tmp/tmpBpMUXE0ksEhzIZA1SSbS.txt" 2> "/tmp/tmpSKc0PRhTl47SZfkxY0g1.txt"

You can then leave the interpreter and paste this command onto the command line, and see directly what happens if this command is called. It's usually useful to remove the stdout and stderr redirects (i.e., everything after and including the first ``>``). For example::

    cd "/home/pycogent_user/"; formatdb -o T -i "/home/pycogent_user/PyCogent/doc/data/refseqs.fasta" -p F
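
If you inspect halted commands often, the redirect-stripping step can be scripted. This is a minimal sketch: ``strip_redirects`` is a hypothetical helper, and it assumes that the first redirect appears as ``>`` surrounded by single spaces and that no quoted path contains that sequence::

    def strip_redirects(command):
        """Return the command with everything from the first ' > ' onward removed."""
        head, _separator, _redirects = command.partition(' > ')
        return head.rstrip()


    # Illustrative command; the file names are made up.
    halted = ('cd "/tmp/"; formatdb -o T -i "refseqs.fasta" -p F '
              '> "/tmp/out.txt" 2> "/tmp/err.txt"')
    print(strip_redirects(halted))
    # cd "/tmp/"; formatdb -o T -i "refseqs.fasta" -p F

A command that contains no redirect is returned unchanged, apart from trailing whitespace.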

.. rubric:: Footnotes

.. [#attribution] This document was modified from Greg Caporaso's PyCogent lecture. You can also grab the `full lecture materials <http://www.caporaso.us/presentations/caporaso_pycogent_lecture.zip>`_.