1
<?xml version="1.0" encoding="ISO-8859-1"?>
3
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Bio::Graphics HOWTO</title><link rel="stylesheet" href="e-novative.css" type="text/css"/><meta name="generator" content="DocBook XSL Stylesheets V1.55.0"/><meta name="description" content="
This HOWTO describes how to render sequence data graphically in a
horizontal map. It applies to a variety of situations ranging from
rendering the feature table of a GenBank entry, to graphing the
positions and scores of a BLAST search, to rendering a clone map. It
describes the programmatic interface to the Bio::Graphics module, and
discusses how to create dynamic web pages using Bio::DB::GFF and the
gbrowse package.
"/></head><body><div class="article"><div class="titlepage"><div><h1 class="title"><a id="d3e1"/>Bio::Graphics HOWTO</h1></div><div><div class="author"><h3 class="author">Lincoln Stein</h3><div class="affiliation"><span class="orgname">
5
<a href="http://www.cshl.org" target="_top">Cold Spring Harbor Laboratory</a>
6
<br/></span><div class="address"><p>
7
<tt><<a href="mailto:lstein@cshl.org">lstein@cshl.org</a>></tt>
8
</p></div></div></div></div><div><div class="legalnotice"><p>
9
This document is copyright Lincoln Stein, 2002. It can
10
be copied and distributed under the terms of the Perl
12
</p></div></div><div><p class="pubdate">2002-09-01</p></div><div><div class="revhistory"><table border="1" width="100%" summary="Revision history"><tr><th align="left" valign="top" colspan="3"><b>Revision History</b></th></tr><tr><td align="left">Revision 0.2</td><td align="left">2003-05-15</td><td align="left">lds</td></tr><tr><td align="left" colspan="3">Current as of BioPerl 1.2.2</td></tr></table></div></div><div><div class="revhistory"><table border="1" width="100%" summary="Revision history"><tr><th align="left" valign="top" colspan="3"><b>Revision History</b></th></tr><tr><td align="left">Revision 0.3</td><td align="left">2003-09-17</td><td align="left">BIO</td></tr><tr><td align="left" colspan="3">Current as of BioPerl 1.2.3</td></tr></table></div></div><div><div class="abstract"><p class="title"><b>Abstract</b></p><p>
13
This HOWTO describes how to render sequence data graphically in a
14
horizontal map. It applies to a variety of situations ranging from
15
rendering the feature table of a GenBank entry, to graphing the
16
positions and scores of a BLAST search, to rendering a clone map. It
17
describes the programmatic interface to the Bio::Graphics module, and
18
discusses how to create dynamic web pages using Bio::DB::GFF and the
20
</p></div></div><hr/></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt>1.�<a href="#intro">Introduction</a></dt><dt>2.�<a href="#prelim">Preliminaries</a></dt><dt>3.�<a href="#gettingStarted">Getting Started</a></dt><dt>4.�<a href="#addingscale">Adding a Scale to the Image</a></dt><dt>5.�<a href="#improving">Improving the Image</a></dt><dt>6.�<a href="#parsing-blast">Parsing Real BLAST Output</a></dt><dt>7.�<a href="#rendering-features">Rendering Features from a GenBank or EMBL File</a></dt><dt>8.�<a href="#rendering-features-better">A Better Version of the Feature Renderer</a></dt><dt>9.�<a href="#summary">Summary</a></dt></dl></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="intro"/>1.�Introduction</h2></div></div><p>
21
This HOWTO describes the Bio::Graphics module, and some of the
22
applications that were built on top of it. Bio::Graphics was designed
23
to solve the following common problems:
24
</p><div class="itemizedlist"><ul type="disc"><li><p>
25
You have a list of BLAST hits on a sequence and you want to
26
generate a picture that shows where the hits go and what their score
29
You have a big GenBank file with a complex feature table, and
30
you want to render the positions of the genes, repeats, promoters
33
You have a list of ESTs that you've mapped to a genome, and you want
34
to show how they align.
36
You have created a clone fingerprint map, and you want to display it.
37
</p></li></ul></div><p>
38
The Bio::Graphics module was designed to solve these problems. In
39
addition, using the Bio::DB::GFF module (part of BioPerl) and the
40
gbrowse program (available from http://www.gmod.org) you can create
41
interactive web pages to explore your data.
43
This document takes you through a few common applications of
44
Bio::Graphics in a cookbook fashion.
45
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="prelim"/>2.�Preliminaries</h2></div></div><p>
46
Bio::Graphics is dependent on GD, a Perl module for generating
47
bitmapped graphics written by the author. GD in turn is dependent on
48
libgd, a C library written by Thomas Boutell, formerly also of Cold
49
Spring Harbor Laboratory (www.boutell.com/gd). To use Bio::Graphics,
50
you must have both these software libraries installed.
52
If you are on a Linux system, you might already have GD installed. To
53
verify, run the following command:
55
<pre class="programlisting">
56
% perl -MGD -e 'print $GD::VERSION';
59
If the program prints out a version number, you are in luck.
60
Otherwise, if you get a "Can't locate GD.pm" error, you'll have to
61
install the module. For users of ActiveState Perl this is very easy.
62
Just start up the PPM program and issue the command "install GD". For
63
users of other versions of Perl, you should go to www.cpan.org,
64
download a recent version of the GD module, unpack it, and follow the
65
installation directions. You may also need to upgrade to a recent
66
version of the libgd C library.
68
If the program prints out a version number, you are in luck.
69
Otherwise, if you get a "Can't locate GD.pm" error, you'll have to
70
install the module. For users of ActiveState Perl this is very easy.
71
Just start up the PPM program and issue the command "install GD". For
72
users of other versions of Perl, you should go to www.cpan.org,
73
download a recent version of the GD module, unpack it, and follow the
74
installation directions.
77
You may need to upgrade to a recent version of the libgd C library.
78
At the time this was written, there were two versions of libgd. libgd
79
version 1.8.4 is the stable version, and corresponds to GD version
80
1.43. libgd version 2.0.1 is the beta version; although it has many
81
cool features, it also has a few known bugs (which Bio::Graphics works
82
around). If you use libgd 2.0.1 or higher, be sure it matches GD
83
version 2.0.1 or higher.
85
You will also need to install the Text::Shellwords module, which is
87
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="gettingStarted"/>3.�Getting Started</h2></div></div><p>
88
All the code examples and BLAST input files we'll use are available in
89
the doc/howto/examples/graphics directory in the BioPerl package.
91
Our first example will be rendering a table of BLAST hits on a
92
sequence that is exactly 1000 residues long. For now, we're ignoring
93
finicky little details like HSPs, and assume that each hit is a single
94
span from start to end. Also, we'll be using the BLAST score rather
95
than P or E value. Later on, we'll switch to using real BLAST output
96
parsed by the Bio::SearchIO module, but for now, our table looks like
99
<div class="figure"><a id="d3e58"/><pre class="programlisting">
100
# hit score start end
109
</pre><p class="title"><b>Figure�1.�Simple blast hit file (data1.txt)</b></p></div>
111
Our first attempt to parse and render this file looks like this:
113
<div class="example"><a id="code1"/><p class="title"><b>Example�1.�Rendering the simple blast hit file (render_blast1.pl)</b></p><pre class="programlisting">
116
1 # This is code example 1 in the Graphics-HOWTO
119
4 use Bio::SeqFeature::Generic;
121
5 my $panel = Bio::Graphics::Panel->new(-length => 1000,-width => 800);
122
6 my $track = $panel->add_track(-glyph => 'generic',-label => 1);
124
7 while (<>) { # read blast file
126
9 next if /^\#/; # ignore comments
127
10 my($name,$score,$start,$end) = split /\t+/;
128
11 my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
129
12 -start=>$start,-end=>$end);
130
13 $track->add_feature($feature);
133
15 print $panel->png;
136
The script begins by loading the Bio::Graphics module (line 3), which
137
in turn brings in a number of other modules that we'll use later. We
138
also load <tt>Bio::SeqFeature::Generic</tt> in order to
139
create a series of Bio::SeqFeatureI objects for rendering. We then
140
create a Bio::Graphics::Panel object by calling its new() method,
141
specifying that the panel is to correspond to a sequence that is 1000
142
nucleotides long, and has a physical width of 800 pixels (line 5).
143
The Panel can contain multiple horizontal tracks, each of which has
144
its own way of rendering features (called a "glyph"), color, labeling
145
convention, and so forth. In this simple example, we create a single
146
track by calling the panel object's add_track() method (line 6),
147
specify a glyph type of "generic", and ask that the objects in the
148
track be labeled by providing a true value to the -label argument.
149
This gives us a track object that we can add our hits to.
151
We're now ready to render the blast hit file. We loop through it
152
(line 7-14), stripping off the comments, and parsing out the name,
153
score and range (line 10). We now need a Bio::SeqFeatureI object to
154
place in the track. The easiest way to do this is to create a
155
Bio::SeqFeature::Generic object, which is similar to Bio::PrimarySeq,
156
except that it provides a way of attaching start and end positions to
157
the sequence, as well as such nebulous but useful attributes as the
158
"score" and "source". The Bio::SeqFeature::Generic->new() method,
159
invoked in line 11, takes arguments corresponding to the name of each
160
hit, its start and end coordinates, and its score.
162
After creating the feature object, we add it to the track by calling
163
the track's add_feature() method (line 13).
165
After processing all the hits, we call the panel's png() method to
166
render them and convert it into a Portable Network Graphics file, the
167
contents of which are printed to standard output. We can now view the
168
result by piping it to our favorite image display program.
170
IMPORTANT NOTE: If you are on a Windows platform, you need to put
171
STDOUT into binary mode so that the PNG file does not go through
172
Window's carriage return/linefeed transformations. Before the
173
final print statement, put the statement "binmode(STDOUT)".
175
This advice also applies to certain versions of RedHat, which ship
176
with a patched (and possibly broken) version of Perl.
178
<div class="figure"><a id="d3e74"/><pre class="programlisting">
179
% render_blast1.pl data1.txt | display -
180
</pre><div class="mediaobject"><img src="../figs/graphics/fig1.png"/></div><p class="title"><b>Figure�2.�Rendering BLAST hits</b></p></div>
182
Users of operating systems that don't support pipes can simply
183
redirect the output to a file and view it in their favorite image
185
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="addingscale"/>4.�Adding a Scale to the Image</h2></div></div><p>
186
This is all very nice, but it's missing two essential components:
187
</p><div class="itemizedlist"><ul type="disc"><li><p>It doesn't have a scale.</p></li><li><p>It doesn't distinguish between hits with different
189
</p></li></ul></div><p>Example 2 fixes these problems</p><p>
190
<div class="example"><a id="code2"/><p class="title"><b>Example�2.�Rendering the blast hit file with scores and scale</b></p><pre class="programlisting">
193
1 # This is code example 2 in the Graphics-HOWTO
195
3 use lib '/home/lstein/projects/bioperl-live';
197
5 use Bio::SeqFeature::Generic;
199
6 my $panel = Bio::Graphics::Panel->new(-length => 1000,
201
8 -pad_left => 10,
202
9 -pad_right => 10,
204
11 my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
205
12 $panel->add_track($full_length,
206
13 -glyph => 'arrow',
208
15 -fgcolor => 'black',
212
18 my $track = $panel->add_track(-glyph => 'graded_segments',
214
20 -bgcolor => 'blue',
215
21 -min_score => 0,
216
22 -max_score => 1000);
218
23 while (<>) { # read blast file
220
25 next if /^\#/; # ignore comments
221
26 my($name,$score,$start,$end) = split /\t+/;
222
27 my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
223
28 -start=>$start,-end=>$end);
224
29 $track->add_feature($feature);
227
31 print $panel->png;
230
There are several changes to look at. The first is minor. We'd like
231
to put a boundary around the left and right edges of the image so that
232
the features don't bump up against the margin, so we specify a 10
233
pixel leeway with the <i><tt>-pad_left</tt></i> and
234
<i><tt>-pad_right</tt></i> arguments in lines 8 and 9.
236
The next change is more subtle. We want to draw a scale all the way
237
across the image. To do this, we create a track to contain the scale,
238
and a feature that spans the track from the start to the end. Line 11
239
creates the feature, giving its start and end coordinates. Lines
240
12-17 create a new track containing this feature. Unlike the previous
241
example, in which we created the track first and then added features
242
one at a time with add_feature(), we show here how to add feature(s)
243
directly in the call to add_track(). If the first argument to
244
add_track is either a single feature or a feature array ref, then
245
add_track() will automatically incorporate the feature(s) into the
246
track in a single efficient step. The remainder of the arguments
247
configure the track as before. The -glyph argument says to use the
248
"arrow" glyph. The -tick argument indicates that the arrow should
249
contain tick marks, and that both major and minor ticks should be
250
shown (tick type of "2"). We set the foreground color to black, and
251
request that arrows should be placed at both ends (-double =>1).
253
<sup>[<a id="d3e99" href="#ftn.d3e99">1</a>]</sup>
255
In lines 18-22, we get a bit fancier with the blast hit track. Now,
256
instead of creating a generic glyph, we use the "graded_segments"
257
glyph. This glyph takes the specified background color for the
258
feature, and either darkens or lightens it according to its score. We
259
specify the base background color (-bgcolor => 'blue'), and the
260
minimum and maximum scores to scale to (-min_score and -max_score).
261
(You may need to experiment with the min and max scores in order to get
262
the glyph to scale the colors the way you want.) The remainder of the
265
When we run the modified script, we get this image.
267
<div class="figure"><a id="d3e104"/><div class="mediaobject"><img src="../figs/graphics/fig2.png"/></div><p class="title"><b>Figure�3.�The improved image</b></p></div>
269
IMPORTANT NOTE: Remember that if you are on a Windows platform, you need to put
270
STDOUT into binary mode so that the PNG file does not go through
271
Window's carriage return/linefeed transformations. Before the
272
final print statement, write binmode(STDOUT).
273
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="improving"/>5.�Improving the Image</h2></div></div><p>
274
Before we move into displaying gapped alignments, let's tweak the
275
image slightly so that higher scoring hits appear at the top of the
276
image, and the score itself is printed in red underneath each hit.
277
The changes are shown in Example 3.
279
<div class="example"><a id="code3"/><p class="title"><b>Example�3.�Rendering the blast hit file with scores and scale</b></p><pre class="programlisting">
282
1 # This is code example 3 in the Graphics-HOWTO
284
3 use lib '/home/lstein/projects/bioperl-live';
286
5 use Bio::SeqFeature::Generic;
288
6 my $panel = Bio::Graphics::Panel->new(-length => 1000,
290
8 -pad_left => 10,
291
9 -pad_right => 10,
293
11 my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
294
12 $panel->add_track($full_length,
295
13 -glyph => 'arrow',
297
15 -fgcolor => 'black',
301
18 my $track = $panel->add_track(-glyph => 'graded_segments',
303
20 -bgcolor => 'blue',
304
21 -min_score => 0,
305
22 -max_score => 1000,
306
23 -font2color => 'red',
307
24 -sort_order => 'high_score',
308
25 -description => sub {
309
26 my $feature = shift;
310
27 my $score = $feature->score;
311
28 return "score=$score";
314
30 while (<>) { # read blast file
316
32 next if /^\#/; # ignore comments
317
33 my($name,$score,$start,$end) = split /\t+/;
318
34 my $feature = Bio::SeqFeature::Generic->new(-score=>$score,
319
35 -display_name=>$name,
320
36 -start=>$start,-end=>$end);
321
37 $track->add_feature($feature);
324
39 print $panel->png;
327
There are two changes to look at. The first appears in line 24, where
328
we pass the <i><tt>-sort_order</tt></i> option to the call that
329
creates the blast hit track. <i><tt>-sort_order</tt></i>
330
changes the way that features sort from top to bottom, and will accept
331
a number of prepackaged sort orders or a coderef for custom sorting.
332
In this case, we pass a prepackaged sort order of
333
<i><tt>high_score</tt></i>, which sorts the hits from top to
334
bottom in reverse order of their score.
336
The second change is more complicated, and involves the -description
337
option that appears in the <tt>add_track()</tt> call on
338
lines 25-28. The value of <i><tt>-description</tt></i> will be
339
printed beneath each feature. We could pass
340
<i><tt>-description</tt></i> a constant string, but that would
341
simply print the same string under each feature. Instead we pass
342
<i><tt>-description</tt></i> a code reference to a subroutine
343
that will be invoked while the picture is being rendered. This
344
subroutine will be passed the current feature, and must return the
345
string to use as the value of the description. In our code, we simply
346
fetch out the BLAST hit's score using its <tt>score()</tt>
347
method, and incorporate that into the description string.
348
</p><div class="tip"><table border="0" summary="Tip"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"/></td><th>Tip</th></tr><tr><td colspan="2" align="left" valign="top"><p>
349
The ability to use a code reference as a configuration option isn't
350
unique to <i><tt>-description</tt></i>. In fact, you can use a code reference for any
351
of the options passed to add_track().
352
</p></td></tr></table></div><p>
353
Another minor change is the use of
354
<i><tt>-font2color</tt></i> in line 23. This simply
355
sets the color used for the description strings. Figure 3 shows the
356
effect of these changes.
358
<div class="figure"><a id="d3e133"/><div class="mediaobject"><img src="../figs/graphics/fig3.png"/></div><p class="title"><b>Figure�4.�The image with descriptions and sorted hits</b></p></div>
359
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="parsing-blast"/>6.�Parsing Real BLAST Output</h2></div></div><p>
360
From here it's just a small step to writing a general purpose utility
361
that will read a BLAST file, parse its output, and output a picture.
362
The key is to use the <tt>Bio::SearchIO</tt>
363
infrastructure because it produces Bio::SeqFeatureI similarity hits
364
that can be rendered directly by <tt>Bio::Graphics</tt>.
366
Code example 4 shows the new utility.
368
<div class="example"><a id="code4"/><p class="title"><b>Example�4.�Parsing and Rendering a Real BLAST File with Bio::SearchIO</b></p><pre class="programlisting">
371
1 # This is code example 4 in the Graphics-HOWTO
373
3 use lib "$ENV{HOME}/projects/bioperl-live";
377
6 my $file = shift or die "Usage: render_blast4.pl <blast file>\n";
379
7 my $searchio = Bio::SearchIO->new(-file => $file,
380
8 -format => 'blast') or die "parse failed";
383
9 my $result = $searchio->next_result() or die "no result";
385
10 my $panel = Bio::Graphics::Panel->new(-length => $result->query_length,
387
12 -pad_left => 10,
388
13 -pad_right => 10,
391
15 my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>$result->query_length,
392
16 -display_name=>$result->query_name);
393
17 $panel->add_track($full_length,
394
18 -glyph => 'arrow',
396
20 -fgcolor => 'black',
401
24 my $track = $panel->add_track(-glyph => 'graded_segments',
403
26 -connector => 'dashed',
404
27 -bgcolor => 'blue',
405
28 -font2color => 'red',
406
29 -sort_order => 'high_score',
407
30 -description => sub {
408
31 my $feature = shift;
409
32 return unless $feature->has_tag('description');
410
33 my ($description) = $feature->each_tag_value('description');
411
34 my $score = $feature->score;
412
35 "$description, score=$score";
415
37 while( my $hit = $result->next_hit ) {
416
38 next unless $hit->significance < 1E-20;
417
39 my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
418
40 -display_name => $hit->name,
420
42 description => $hit->description
423
45 while( my $hsp = $hit->next_hsp ) {
424
46 $feature->add_sub_SeqFeature($hsp,'EXPAND');
427
48 $track->add_feature($feature);
430
50 print $panel->png;
433
In lines 6-8 we read the name of the file that contains the BLAST
434
results from the command line, and pass it to
435
<tt>Bio::SearchIO->new()</tt>, returning a
436
<tt>Bio::SearchIO</tt> object. We read a single result
437
from the searchIO object (line 9). This assumes that the BLAST output
438
file contains a single run of BLAST only.
440
We then initialize the panel and tracks as before. The only change
441
here is in lines 24-36, where we create the track for the BLAST hits.
442
The <i><tt>-description</tt></i> option has now been enhanced
443
to create a line of text that incorporates the "description" tag from
444
the feature object as well as its similarity score. There's also a
445
slight change in line 26, where we introduce the
446
<i><tt>-connector</tt></i> option. This allows us to configure a
447
line that connects the segments of a discontinuous feature, such as
448
the HSPs in a BLAST hit. In this case, we asked the rendering engine
449
to produce a dashed connector line.
451
The remainder of the script retrieves each of the hits from the BLAST
452
file, creates a Feature object representing the hit, and then
453
retrieves each HSP and incorporates it into the feature. Line 37
454
begins a <tt>while()</tt> loop that retrieves each of the
455
similarity hits in turn. We filter the hit by its significance,
456
throwing out any that have an expectation value greater than 1E-20
457
(you will have to adjust this in your own utilities). We then use the
458
information in the hit to construct a
459
<tt>Bio::SeqFeature::Generic</tt> object (lines 39-44).
460
Notice how the name of the hit and the score are used to initialize
461
the feature, and how the description is turned into a tag named
464
The start and end bounds of the hit are determined by the union of its
465
HSPs. We loop through each of the hit's HSPs by calling its
466
<tt>next_hsp()</tt> method, and add each HSP to the
467
newly-created hit feature by calling the feature's
468
<tt>add_sub_SeqFeature()</tt> method (line 46). The
469
<i><tt>EXPAND</tt></i> parameter instructs the feature to
470
expand its start and end coordinates to enclose the added subfeature.
472
Once all the HSPs are added to the feature, we insert the feature into
473
the track as before using the track's
474
<tt>add_feature()</tt> function.
476
Figure 4 shows the output from a sample BLAST hit file.
478
<div class="figure"><a id="d3e165"/><div class="mediaobject"><img src="../figs/graphics/fig4.png"/></div><p class="title"><b>Figure�5.�Output from the BLAST parsing and rendering script</b></p></div>
480
The next section will demonstrate how to parse and display feature
481
tables from GenBank and EMBL.
483
IMPORTANT NOTE: Remember that if you are on a Windows platform, you need to put
484
STDOUT into binary mode so that the PNG file does not go through
485
Window's carriage return/linefeed transformations. Before the
486
final print statement, write binmode(STDOUT).
487
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="rendering-features"/>7.�Rendering Features from a GenBank or EMBL File</h2></div></div><p>
488
With <tt>Bio::Graphics</tt> you can render the feature
489
table of a GenBank or EMBL file quite easily. The trick is to use
490
<tt>Bio::SeqIO</tt> to generate a set of
491
<tt>Bio::SeqFeatureI</tt> objects, and to use those
492
features to populate tracks. For simplicity's sake, we will sort each
493
feature by its primary tag (such as "exon") and create a new track for
496
Code example 5 shows the code for rendering an EMBL or GenBank entry.
498
<div class="example"><a id="code5"/><p class="title"><b>Example�5.�The embl2picture.pl script turns any EMBL or GenBank entry into a graphical rendering</b></p><pre class="programlisting">
501
1 # file: embl2picture.pl
502
2 # This is code example 5 in the Graphics-HOWTO
503
3 # Author: Lincoln Stein
506
5 use lib "$ENV{HOME}/projects/bioperl-live";
509
8 use Bio::SeqFeature::Generic;
511
9 my $file = shift or die "provide a sequence file as the argument";
512
10 my $io = Bio::SeqIO->new(-file=>$file) or die "couldn't create Bio::SeqIO";
513
11 my $seq = $io->next_seq or die "couldn't find a sequence in the file";
514
12 my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length,
515
13 -display_name=>$seq->display_name);
517
14 my @features = $seq->all_SeqFeatures;
519
15 # partition features by their primary tags
520
16 my %sorted_features;
521
17 for my $f (@features) {
522
18 my $tag = $f->primary_tag;
523
19 push @{$sorted_features{$tag}},$f;
526
21 my $panel = Bio::Graphics::Panel->new(
527
22 -length => $seq->length,
528
23 -key_style => 'between',
530
25 -pad_left => 10,
531
26 -pad_right => 10,
533
28 $panel->add_track($wholeseq,
534
29 -glyph => 'arrow',
539
33 $panel->add_track($wholeseq,
540
34 -glyph => 'generic',
541
35 -bgcolor => 'blue',
546
39 my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
548
41 for my $tag (sort keys %sorted_features) {
549
42 my $features = $sorted_features{$tag};
550
43 $panel->add_track($features,
551
44 -glyph => 'generic',
552
45 -bgcolor => $colors[$idx++ % @colors],
553
46 -fgcolor => 'black',
554
47 -font2color => 'red',
555
48 -key => "${tag}s",
559
52 -description => 1,
563
55 print $panel->png;
567
The way this script works is simple. After the library load preamble,
568
the script reads the name of the GenBank or EMBL file from the command
569
line (line 8). It passes the filename to
570
<tt>Bio::SeqIO</tt>'s <tt>new()</tt> method,
571
and reads the first sequence object from it (lines 9-11). If anything
572
goes wrong, the script dies with an error message.
574
The returned object is a Bio::SeqI object, which has a length but no
575
defined start or end coordinates. We would like to create a drawable
576
Bio::SeqFeatureI object to use for the scale, so we generate a new
577
Bio::SeqFeature::Generic object that goes from a start of 1 to the
578
length of the sequence. (lines 12-13).
580
The script reads the features from the sequence object by calling
581
<tt>all_SeqFeatures()</tt>, and then sorts each feature by
582
its primary tag into a hash of array references named
583
<tt>%sorted_features</tt> (lines 14-20).
585
Next, we create the <tt>Bio::Graphics::Panel</tt> object
586
(lines 21-27). As in previous examples, we specify the width of the
587
image, as well as some extra white space to pad out the left and right
590
We now add two tracks, one for the scale (lines 28-32) and the other
591
for the sequence as a whole (33-37). As in the earlier examples, we
592
pass <tt>add_track()</tt> the sequence object as the first
593
argument before the options so that the object is incorporated into
594
the track immediately.
596
We are now ready to create a track for each feature type. In order to
597
distinguish the tracks by color, we initialize an array of 9 color
598
names and simply cycle through them (lines 39-54). For each feature
599
tag, we retrieve the corresponding list of features from
600
<tt>%sorted_features</tt> (line 42) and create a track for
601
it using the "generic" glyph and the next color in the list (lines
602
43-53). We set the <tt>-label</tt> and
603
<tt>-description</tt> options to the value "1". This signals
604
<tt>Bio::Graphics</tt> that it should do the best it can
605
to choose useful label and description values on its own.
607
After adding all the feature types, we call the panel's
608
<tt>png()</tt> method to generate a graphic file, which we
609
print to STDOUT. If we are on a Windows platform, we would have to include
610
<tt>binmode(STDOUT)</tt> prior to this statement in order to
611
avoid Windows textmode carriage return/linefeed transformations.
613
Figure 5 shows an example of the output of this script.
615
<div class="figure"><a id="d3e204"/><div class="mediaobject"><img src="../figs/graphics/fig5.png"/></div><p class="title"><b>Figure�6.�The embl2picture.pl script</b></p></div>
616
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="rendering-features-better"/>8.�A Better Version of the Feature Renderer</h2></div></div><p>
617
The previous example's rendering has numerous deficiencies. For one
618
thing, there are no lines connecting the various CDS rectangles in the
619
CDS track to show how they are organized into a spliced transcript.
620
For another, the repetition of the source tag "EMBL/GenBank/SwissProt"
621
is not particularly illuminating.
623
However, it's quite easy to customize the display, making the script
624
into a generally useful utility. The revised code is shown in example
627
<div class="example"><a id="code6"/><p class="title"><b>Example�6.�The embl2picture.pl script turns any EMBL or GenBank entry into a graphical rendering</b></p><pre class="programlisting">
630
1 # file: embl2picture.pl
631
2 # This is code example 6 in the Graphics-HOWTO
632
3 # Author: Lincoln Stein
635
5 use lib "$ENV{HOME}/projects/bioperl-live";
639
8 use constant USAGE =><<END;
640
9 Usage: $0 <file>
641
10 Render a GenBank/EMBL entry into drawable form.
642
11 Return as a GIF or PNG image on standard output.
644
12 File must be in embl, genbank, or another SeqIO-
645
13 recognized format. Only the first entry will be
649
16 embl2picture.pl factor7.embl | display -
653
18 my $file = shift or die USAGE;
654
19 my $io = Bio::SeqIO->new(-file=>$file) or die USAGE;
655
20 my $seq = $io->next_seq or die USAGE;
656
21 my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length,
657
22 -display_name=>$seq->display_name);
659
23 my @features = $seq->all_SeqFeatures;
661
24 # sort features by their primary tags
662
25 my %sorted_features;
663
26 for my $f (@features) {
664
27 my $tag = $f->primary_tag;
665
28 push @{$sorted_features{$tag}},$f;
668
30 my $panel = Bio::Graphics::Panel->new(
669
31 -length => $seq->length,
670
32 -key_style => 'between',
672
34 -pad_left => 10,
673
35 -pad_right => 10,
675
37 $panel->add_track($wholeseq,
676
38 -glyph => 'arrow',
681
42 $panel->add_track($wholeseq,
682
43 -glyph => 'generic',
683
44 -bgcolor => 'blue',
688
48 if ($sorted_features{CDS}) {
689
49 $panel->add_track($sorted_features{CDS},
690
50 -glyph => 'transcript2',
691
51 -bgcolor => 'orange',
692
52 -fgcolor => 'black',
693
53 -font2color => 'red',
697
57 -label => \&gene_label,
698
58 -description=> \&gene_description,
700
60 delete $sorted_features{'CDS'};
703
62 if ($sorted_features{tRNA}) {
704
63 $panel->add_track($sorted_features{tRNA},
705
64 -glyph => 'transcript2',
706
65 -bgcolor => 'red',
707
66 -fgcolor => 'black',
708
67 -font2color => 'red',
709
68 -key => 'tRNAs',
712
71 -label => \&gene_label,
714
73 delete $sorted_features{tRNA};
718
76 my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
720
78 for my $tag (sort keys %sorted_features) {
721
79 my $features = $sorted_features{$tag};
722
80 $panel->add_track($features,
723
81 -glyph => 'generic',
724
82 -bgcolor => $colors[$idx++ % @colors],
725
83 -fgcolor => 'black',
726
84 -font2color => 'red',
727
85 -key => "${tag}s",
730
88 -description => \&generic_description
734
91 print $panel->png;
738
94 my $feature = shift;
740
96 foreach (qw(product gene)) {
741
97 next unless $feature->has_tag($_);
742
98 @notes = $feature->each_tag_value($_);
748
103 sub gene_description {
749
104 my $feature = shift;
751
106 foreach (qw(note)) {
752
107 next unless $feature->has_tag($_);
753
108 @notes = $feature->each_tag_value($_);
756
111 return unless @notes;
757
112 substr($notes[0],30) = '...' if length $notes[0] > 30;
761
115 sub generic_description {
762
116 my $feature = shift;
764
118 foreach ($feature->all_tags) {
765
119 my @values = $feature->each_tag_value($_);
766
120 $description .= $_ eq 'note' ? "@values" : "$_=@values; ";
768
122 $description =~ s/; $//; # get rid of last
773
At 124 lines, this is the longest example in this HOWTO, but the
774
changes are straightforward. The major difference occurs in lines
775
47-61 and 62-74, where we handle two special cases: "CDS" records and
776
"tRNAs". For these two feature types we would like to draw the
777
features like genes using the "transcript2" glyph. This glyph draws
778
inverted V's for introns, if there are any, and will turn the last (or
779
only) exon into an arrow to indicate the direction of transcription.
781
First we look to see whether there are any features with the primary
782
tag of "CDS" (lines 47-61). If so, we create a track for them using
783
the desired glyph. Line 49 shows how to add several features to a
784
track at creation time. If the first argument to
785
<tt>add_track()</tt> is an array reference, all the
786
features contained in the array will be incorporated into the track.
787
We provide custom code references for the
788
<i><tt>-label</tt></i> and <i><tt>-description</tt></i>
789
options. As we shall see later, the subroutines these code references
790
point to are responsible for extracting names and descriptions for the
791
coding regions. After we handle this special case, we remove the CDS
792
feature type from the <tt>%sorted_features</tt> array.
794
We do the same thing for tRNA features, but with a different color
795
scheme (lines 62-74).
797
Having dealt with the special cases, we render the remaining feature
798
types using the same code we used earlier. The only change is that
799
instead of allowing <tt>Bio::Graphics::Panel</tt> to
800
guess at the description from the feature's source tag, we use the
801
<tt>-description</tt> option to point to a subroutine
802
that will generate more informative description strings.
804
The <tt>gene_label()</tt> (lines 93-102) and
805
<tt>gene_description()</tt> (lines 103-114) subroutines
806
are simple. The first one searches the feature for the tags "product"
807
and/or "gene" and uses the first one it finds as the label for the
808
feature. The <tt>gene_description()</tt> subroutine is
809
similar, except that it returns the value of the first tag named
810
"note". If the description is over 30 characters long, it is
813
The <tt>generic_description()</tt> (lines 115-124) is
814
invoked to generate descriptions of all non-gene features. We simply
815
concatenate together the names and values of tags. For example the
817
<pre class="programlisting">
819
/db_xref="taxon:9606"
820
/organism="Homo sapiens"
823
will be turned into the description string "db_xref=taxon:9606;
824
organism=Homo Sapiens; map=13q34".
826
After adding all the feature types, we call the panel's
827
<tt>png()</tt> method to generate a graphic file, which we
830
Figure 6 shows an example of the output of this script.
832
<div class="figure"><a id="d3e238"/><div class="mediaobject"><img src="../figs/graphics/fig6.png"/></div><p class="title"><b>Figure�7.�The embl2picture.pl script</b></p></div>
833
</p></div><div class="section"><div class="titlepage"><div><h2 class="title" style="clear: both"><a id="summary"/>9.�Summary</h2></div></div><p>
834
In summary, we have seen how to use the
835
<tt>Bio::Graphics</tt> module to generate
836
representations of sequence features as horizontal maps. We applied
837
these techniques to two common problems: rendering the output of a
838
BLAST run, and rendering the feature table of a GenBank/EMBL entry.
840
The graphics module is quite flexible. In addition to the options
841
that we have seen, there are glyphs for generating point-like features
842
such as SNPs, specialized glyphs that draw GC content and open reading
843
frames, and glyphs that generate histograms, bar charts and other
844
types of graphs. <tt>Bio::Graphics</tt> has been used
845
to represent physical (clone) maps, radiation hybrid maps, EST
846
clusters, cytogenetic maps, restriction maps, and much more.
848
Although we haven't shown it, <tt>Bio::Graphics</tt>
849
provides support for generating HTML image maps. The <a href="http://www.gmod.org" target="_top">Generic Genome Browser</a> uses this
850
facility to generate clickable, browsable images of the genome from a
851
variety of genome databases.
853
Another application you should investigate is the render_sequence.pl
854
script. This script uses the BioFetch interface to fetch
855
GenBank/EMBL/SwissProt entries dynamically from the web before
856
rendering them into PNG images.
858
Finally, if you find yourself constantly tweaking the graphic options,
859
you might be interested in
860
<tt>Bio::Graphics::FeatureFile</tt>, a utility module
861
for interpreting and rendering a simple tab-delimited format for
862
sequence features. feature_draw.PLS is a Perl script built on top of
863
this module, which you can find in the scripts/graphics directory in
864
the Bioperl distribution.
865
</p></div><div class="footnotes"><br/><hr width="100" align="left"/><div class="footnote"><p><sup>[<a id="ftn.d3e99" href="#d3e99">1</a>] </sup>Obtain the list of glyphs by running perldoc on
866
Bio::Graphics::Glyph. Obtain a description of the glyph options by
867
running perldoc on individual glyphs, for example "perldoc Bio::Graphics::Glyph::arrow."
868
</p></div></div></div></body></html>
b'\\ No newline at end of file'