# $Id: FAQ,v 1.6 2002/03/08 20:36:25 jason Exp $

Bioperl FAQ
-----------

I. Bioperl in general

  1.  What is Bioperl?
  2.  Where do I go to get the latest release?
  3.  What is the difference between 0.9.x and 0.7.x? What do you mean
      developer release?
  4.  Is it BioPerl, bioperl, bio.perl.org, Bioperl?  What's the deal?
  5.  How do I figure out how to use a module?
  6.  I'm interested in the bleeding edge version of the code, where
      can I get it?
  7.  Who uses this toolkit?
  8.  How should I cite Bioperl?
  9.  What are the License terms for Bioperl?

II. Sequences

  1. How do I parse a sequence file?  
  2. I can't get sequences with Bio::DB::GenBank any more, why not?
  3. How can I get NT_ or NM_ accessions from NCBI (Reference Sequences)?

III. Report parsing

  1. I want to parse BLAST, how do I do this?  
  2. What's wrong with Bio::Tools::Blast?
  3. I want to parse FastA or NCBI -m7 (XML) format, how do I do this?
  4. Let's say I want to do pairwise alignments of 2 sequences, how can
     I do this?

IV. Utilities

  1. How do I find all the ORFs in a nucleotide sequence? Antigenic sites
     in a protein? Calculate nucleotide melting temperature? Find repeats?
  2. How do I do motif searches with Bioperl? Can I do "find all sequences
     that are 75% identical" to a given motif?
  3. Can I query MEDLINE or other bibliographic repositories using Bioperl?

------------------------------------------------------------------------
This FAQ is maintained by those listed below:
Jason Stajich <jason@bioperl.org>
Brian Osborne <b_i_osborne@hotmail.com>

------------------------------------------------------------------------
I. Bioperl in general
------------------------------------------------------------------------

  1. What is Bioperl?

     Bioperl is a toolkit of Perl modules useful in building
     bioinformatics solutions in Perl.  It is built in an
     object-oriented manner, so many modules depend on each other
     to achieve a task.  The collection of modules in the bioperl-live
     repository constitutes the core functionality of Bioperl.
     Additionally, auxiliary modules for creating graphical interfaces
     (bioperl-gui), persistent storage in an RDBMS (bioperl-db), and
     CORBA bridges to the BioCORBA (www.biocorba.org) specification
     (bioperl-corba-server and bioperl-corba-client) are all available
     as CVS modules in our repository.

  2. Where do I go to get the latest release?
 
     You can always get our releases from ftp://bioperl.org/pub/DIST.
     Official releases are announced on the website, http://bioperl.org.
  
  3. What is the difference between 0.9.x and 0.7.x? What do you mean
     developer release?

     The 0.7.X series (0.7.0, 0.7.2) were all released in 2001 and
     were stable releases on the 0.7 branch.  This means they had a
     set of functionality that was maintained throughout (no
     experimental modules) and were guaranteed to pass all tests.
     Subsequent bug-fix releases with the 0.7 designation would not
     have any API changes.

     The 0.9.X series was our first attempt at releasing so-called
     developer releases.  These are snapshots of the actively
     developed code that, at a minimum, pass all our tests.

  4.  Is it BioPerl, bioperl, bio.perl.org, Bioperl?  What's the deal?

      Well, the perl.org guys granted us use of bio.perl.org.  We
      prefer to be called Bioperl or BioPerl (unlike our Biopython
      friends).  We're part of the Open Bioinformatics Foundation
      (OBF) and so as part of the Bio{*} toolkits we prefer the
      Bioperl spelling.  But we're not really all that picky so no worries. 
   
  5.  How do I figure out how to use a module?

      Read the embedded Perl documentation (Plain Old Documentation -
      POD) that is part of every module.  Do
      % perldoc MODULE 
      (careful - spelling and case count!).

      The bioperl tutorial - bptutorial.pl - provided in the root
      directory of the bioperl release will also provide a good
      introduction.  There are links to tutorials off the bioperl
      website that may provide some additional help.
 
      There are also many scripts in the examples/ and scripts/
      directories that could be useful - see bioperl.pod for a brief
      description of all of them.

      Additionally, we have written many tests for our modules.  You
      can see test data and example usage of the modules in these
      tests - look in the test directory (called 't').

  6.  I'm interested in the bleeding edge version of the code, where
      can I get it?
      
      Go to http://cvs.bioperl.org and you'll see instructions on how
      to get the CVS code.

      Basically:
      % cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl login
      (enter 'cvs' for the password)
      % cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl \
            co bioperl_all

  7.  Who uses this toolkit?

      Lots of people.  Sanger Centre, EBI, many large and small
      academic laboratories, large and small pharmaceutical companies.
      All the developers on the bioperl list use the toolkit in some
      capacity on a regular basis.
      
      The Genquire annotation system (www.bioinformatics.org/Genquire/)
      and Ensembl (www.ensembl.org) use bioperl as the basis for their
      implementation. 
        
  8.  How should I cite Bioperl?

      For now cite it as "The Bioperl Project, http://www.bioperl.org".      
      
  9.  What are the License terms for Bioperl?

      Bioperl is licensed under the same terms as Perl itself which is
      the Perl Artistic License.  You can see more information on that
      license at http://www.perl.com/pub/a/language/misc/Artistic.html
      and http://www.opensource.org/licenses/artistic-license.html
            

------------------------------------------------------------------------
II. Sequences
------------------------------------------------------------------------

  1. How do I parse a sequence file?

     Use the Bio::SeqIO system.  This will create Bio::Seq objects for
     you.  See the tutorial bptutorial.pl for more information or the
     documentation for Bio::SeqIO (e.g. 'perldoc Bio::SeqIO').
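
     As a minimal sketch of the Bio::SeqIO approach, the script below
     writes a small FASTA file to a temporary location and reads it
     back, one Bio::Seq object per record.  The helper name
     read_ids_and_lengths and the two sequences are made up for
     illustration; only Bio::SeqIO->new, next_seq, display_id and
     length come from Bioperl.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Write a tiny two-record FASTA file so the example is self-contained.
# The sequences are invented for illustration only.
my ($fh, $fasta_file) = tempfile(SUFFIX => '.fa', UNLINK => 1);
print {$fh} ">seq1 first example\nATGGCGTAA\n",
            ">seq2 second example\nATGAAATTTGGGTAG\n";
close $fh;

# Hypothetical helper: return [id, length] pairs for every record in a file.
sub read_ids_and_lengths {
    my ($file, $format) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => $format);
    my @records;
    # next_seq returns one Bio::Seq object per record, undef at end of file.
    while (my $seq = $in->next_seq) {
        push @records, [ $seq->display_id, $seq->length ];
    }
    return @records;
}

my @records = read_ids_and_lengths($fasta_file, 'fasta');
printf "%s\t%d\n", @$_ for @records;
```

     Changing the -format argument (for example to 'genbank' or
     'embl') is normally all that is needed to read another format.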


  2. I can't get sequences with Bio::DB::GenBank any more, why not?

     NCBI changed the web CGI script that provided this access.  If
     retrieval has stopped working, you are probably using bioperl
     0.7.2 or earlier.  The developer release 0.9.3 contains the fix,
     as does the 1.0 release.

  3. How can I get NT_ or NM_ accessions from NCBI (Reference
     Sequences)?

     Use Bio::DB::RefSeq, not Bio::DB::GenBank, when you are retrieving
     the NM_ accessions.  This is still an area of active development
     because the data providers have not provided the best interface
     for us to query.  EBI has provided a mirror with their dbfetch
     system, which is accessible through the Bio::DB::RefSeq object.
     However, there are cases where NT_ accessions will not be
     retrievable.

------------------------------------------------------------------------
III. Report parsing
------------------------------------------------------------------------
   
  1. I want to parse BLAST, how do I do this?  

     Well you might notice that there are a lot of choices.  Sorry
     about that.  We've been evolving towards a single solution.

     Currently the best way to parse a report is to use the SearchIO
     system.  This supports blast and fasta report parsing.  The
     bptutorial provides an example of how to use this system, as
     does the documentation for Bio::SearchIO.
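
     A rough sketch of the SearchIO loop, assuming a report file is
     given on the command line.  The helper name summarize_report and
     the file name in the usage comment are hypothetical; the nested
     result / hit / HSP iteration follows the Bio::SearchIO
     documentation, but check the method names against the POD of the
     release you have installed.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per HSP
# (query name, hit name, E-value, percent identity) and return the HSP count.
# $format is 'blast' for text BLAST reports; other values select other parsers.
sub summarize_report {
    my ($file, $format) = @_;
    my $in = Bio::SearchIO->new(-file => $file, -format => $format);
    my $count = 0;
    while (my $result = $in->next_result) {      # one result per query
        while (my $hit = $result->next_hit) {    # one hit per database sequence
            while (my $hsp = $hit->next_hsp) {   # one HSP per aligned segment
                printf "%s\t%s\t%s\t%.1f\n",
                    $result->query_name, $hit->name,
                    $hsp->evalue, $hsp->percent_identity;
                $count++;
            }
        }
    }
    return $count;
}

# Hypothetical usage: perl parse_report.pl report.bls blast
if (@ARGV) {
    my ($file, $format) = @ARGV;
    summarize_report($file, $format || 'blast');
}
```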

  2. What's wrong with Bio::Tools::Blast?

     Nothing is really wrong with it, it has just been outgrown by a
     more generic approach to reports.  This generic approach allows
     us to just write pluggable modules for fasta and Blast parsing
     while using the same framework.  This is completely analogous to
     the Bio::SeqIO system of parsing sequence files.  However, the
     objects produced are of the Bio::Search rather than Bio::Seq
     variety.

  3. I want to parse FastA or NCBI -m7 (XML) format, how do I do this?

     It is as simple as parsing text BLAST results - you simply need
     to specify the format as "fasta" or "blastxml" and the parser
     will load the appropriate module for you.  You can use exactly
     the same logic and code for all of these formats, as we have
     generalized the modules for sequence database searching.

  4. Let's say I want to do pairwise alignments of 2 sequences, how can
     I do this?

     See Bio::Factory::EMBOSS for how to use the 'water' and
     'needle' alignment programs that are part of the EMBOSS suite.
     
     Additionally you can use the pSW module that is part of the
     bioperl-ext package (distributed separately at
     ftp://bioperl.org/pub/DIST).  Note, however, that this only does
     protein alignments and is no longer a supported module.  The
     EMBOSS implementation is the best path ahead unless someone
     else wants to provide an Inline::C implementation.
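
     A sketch of the EMBOSS route, assuming EMBOSS is installed and
     on your PATH.  The helper name align_pair is hypothetical, and
     the option names (-asequence, -bsequence, -gapopen, -gapextend,
     -outfile) follow the synopsis in later Bio::Factory::EMBOSS
     documentation; earlier releases may spell them differently, so
     check 'perldoc Bio::Factory::EMBOSS' for your version.

```perl
use strict;
use warnings;
use Bio::Factory::EMBOSS;
use Bio::AlignIO;

# Hypothetical helper: align two Bio::Seq objects with EMBOSS 'water'
# (Smith-Waterman local alignment) and return the resulting alignment object.
sub align_pair {
    my ($seq_a, $seq_b, $outfile) = @_;

    my $factory = Bio::Factory::EMBOSS->new();
    my $water   = $factory->program('water')
        or die "EMBOSS program 'water' not found\n";

    # Option names are assumed to mirror water's command-line qualifiers.
    $water->run({
        -asequence => $seq_a,
        -bsequence => $seq_b,
        -gapopen   => '10.0',
        -gapextend => '0.5',
        -outfile   => $outfile,
    });

    # Read the EMBOSS output back in as an alignment.
    my $aln_in = Bio::AlignIO->new(-format => 'emboss', -file => $outfile);
    return $aln_in->next_aln;
}

# Hypothetical usage (needs Bio::Seq objects and a writable output path):
#   my $aln = align_pair($seq1, $seq2, 'out.water');
```

     Swapping 'water' for 'needle' in the program() call should give
     a global rather than a local alignment.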

------------------------------------------------------------------------
IV. Utilities
------------------------------------------------------------------------

  1. How do I find all the ORFs in a nucleotide sequence? Antigenic sites
     in a protein? Calculate nucleotide melting temperature? Find repeats?

     In fact, none of these functions are built into Bioperl, but they
     are all available, along with many others, in the EMBOSS package
     (www.uk.embnet.org/Software/EMBOSS). The Bioperl developers created
     a simple interface to EMBOSS such that any and all EMBOSS programs
     can be run from within Bioperl. See Bio::Factory::EMBOSS for more
     information.

     If you can't find the functionality you want in Bioperl then make sure
     to look for it in EMBOSS, which integrates quite gracefully with
     Bioperl. Of course, you will have to install EMBOSS to get this
     access.
     
  2. How do I do motif searches with Bioperl? Can I do "find all sequences
     that are 75% identical" to a given motif?

     There are a number of approaches inside and outside of Bioperl. Within
     Bioperl take a look at Bio::Tools::SeqPattern, but it's also conceivable
     that the combination of Bioperl and Perl's regular expressions could do
     the trick. You might also consider the CPAN module String::Approx (this
     module addresses the percent match query). Or take a look at the TFBS
     (Transcription Factor Binding Site) package at
     http://forkhead.cgb.ki.se/TFBS. This Bioperl-compliant package
     specializes in pattern searching of nucleotide sequences using matrices.
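
     To illustrate the "75% identical" question in plain Perl (no
     Bioperl or String::Approx involved), the sketch below slides the
     motif along a sequence and keeps every ungapped window that meets
     the identity threshold.  The function name find_approx_matches and
     the test sequence are made up for illustration, and this simple
     approach does not handle insertions or deletions.

```perl
use strict;
use warnings;

# Hypothetical helper: slide $motif along $seq and return every window whose
# identity to the motif is at least $min_identity (a fraction from 0 to 1).
# Each hit is [1-based start, matched substring, identity].
sub find_approx_matches {
    my ($seq, $motif, $min_identity) = @_;
    ($seq, $motif) = (uc $seq, uc $motif);
    my $len = length $motif;
    return () unless $len;

    my @hits;
    for my $start (0 .. length($seq) - $len) {
        my $window = substr $seq, $start, $len;
        # XOR of two equal-length strings gives a NUL byte wherever they agree.
        my $diff     = $window ^ $motif;
        my $matches  = ($diff =~ tr/\0//);
        my $identity = $matches / $len;
        push @hits, [ $start + 1, $window, $identity ]
            if $identity >= $min_identity;
    }
    return @hits;
}

# Invented example sequence; the motif is the classic TATA-box hexamer.
my @hits = find_approx_matches('GGTATAAAGGCTATATAGC', 'TATAAA', 0.75);
printf "%d\t%s\t%.2f\n", @$_ for @hits;
```

     With the example input this reports an exact match at position 3
     and a 5-of-6 match (TATATA) at position 12.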

  3. Can I query MEDLINE or other bibliographic repositories using Bioperl?

     Yes! The solution lies in Bio::Biblio*, a set of modules that provide
     access to MEDLINE and OpenBQS-compliant servers using SOAP. See
     Bio/Biblio.pm or examples/biblio.pl for details and example code.