# BioPerl module for Bio::DB::Taxonomy::silva
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Copyright Florent Angly
#
# You may distribute this module under the same terms as perl itself
Bio::DB::Taxonomy::silva - Use the Silva taxonomy

  use Bio::DB::Taxonomy;

  my $db = Bio::DB::Taxonomy->new(
     -source   => 'silva',
     -taxofile => 'SSURef_108_tax_silva_trunc.fasta',
  );
This is an implementation of Bio::DB::Taxonomy which stores and accesses the
Silva taxonomy. Internally, Bio::DB::Taxonomy::silva keeps the taxonomy
in memory by using Bio::DB::Taxonomy::list. As a consequence, note that the
IDs assigned to the taxonomy nodes, e.g. sv72, are arbitrary, contrary to the
pre-defined IDs that NCBI assigns to taxa. Note also that no rank names or
common names are assigned to the taxa of Bio::DB::Taxonomy::silva.

The 2011 release of the Silva taxonomy contains about 126,000 taxa and occupies
about 124 MB of memory once parsed into a Bio::DB::Taxonomy::silva object.
Loading it can therefore take a little while.

The taxonomy file SSURef_108_tax_silva_trunc.fasta that this module uses is
available from L<http://www.arb-silva.de/no_cache/download/archive/release_108/Exports/>.
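
The way a description line of that file turns into a list of taxon names can
be sketched in plain Perl, without loading BioPerl. The sketch below reuses the
pattern and the common-name stripping found in this module's
C<_build_taxonomy>; the helper name C<lineage_from_header> and the sequence ID
C<AB000001.1.1500> are made up for illustration only.

```perl
use strict;
use warnings;

# Same pattern as the module: capture everything after the sequence ID.
my $desc_re = qr/^>\S+?(?:\s+(.+))?$/;

# Hypothetical helper (not part of the module): return the taxon names found
# in one FASTA header line, or an empty list if there is no taxonomy string.
sub lineage_from_header {
    my ($line) = @_;
    chomp $line;
    return () unless $line =~ $desc_re;
    my $taxo_string = $1;
    return () unless $taxo_string;
    $taxo_string =~ s/ \(.*\)$//;    # strip a trailing common name, if any
    return split /;/, $taxo_string;
}

# The sequence ID below is invented; the lineage is the example quoted in
# the module's own comments.
my @names = lineage_from_header(
    '>AB000001.1.1500 Bacteria;Firmicutes;Bacilli;Lactobacillales;'
  . 'Enterococcaceae;Enterococcus;Enterococcus faecium DO'
);
print join(' > ', @names), "\n";
```

Note that an empty level in the middle of a lineage (as in C<Rattus;;Rattus norvegicus>)
is kept by C<split> as an empty string, which is one reason the number of levels
differs between entries.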
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
the web:

  https://redmine.open-bio.org/projects/bioperl/
=head1 AUTHOR - Florent Angly

florent.angly@gmail.com

The rest of the documentation details each of the object methods.
Internal methods are usually prefixed with an underscore (_).
package Bio::DB::Taxonomy::silva;

use base qw(Bio::DB::Taxonomy Bio::DB::Taxonomy::list);

$Bio::DB::Taxonomy::list::prefix = 'sv';
 Usage   : my $obj = Bio::DB::Taxonomy::silva->new();
 Function: Builds a new Bio::DB::Taxonomy::silva object
 Returns : an instance of Bio::DB::Taxonomy::silva
 Args    : -taxofile => name of the FASTA file containing the taxonomic information,
                        typically 'SSURef_108_tax_silva_trunc.fasta' (mandatory)
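
Once an object has been built, it can be queried through the generic
Bio::DB::Taxonomy interface. The following is only a sketch under a few
assumptions: the helper C<lineage_names> is hypothetical and not part of this
module, the Silva export is assumed to sit in the current directory, and
'Enterococcus' is simply a name taken from the example lineage in the source.
The script skips the lookup when BioPerl or the file is not available.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): collect the scientific names
# from a named taxon up to the top of the tree.
sub lineage_names {
    my ($db, $name) = @_;
    my ($id) = $db->get_taxonids($name);    # arbitrary ID, e.g. 'sv72'
    return () unless defined $id;
    my @names;
    my $taxon = $db->get_taxon(-taxonid => $id);
    while (defined $taxon) {
        push @names, $taxon->scientific_name;
        $taxon = $db->ancestor($taxon);     # undef once the top is reached
    }
    return @names;
}

# Assumption: the Silva export was downloaded to the current directory.
my $file = 'SSURef_108_tax_silva_trunc.fasta';

if (-e $file && eval { require Bio::DB::Taxonomy; 1 }) {
    my $db = Bio::DB::Taxonomy->new(
        -source   => 'silva',
        -taxofile => $file,
    );
    print join(' < ', lineage_names($db, 'Enterococcus')), "\n";
} else {
    print "Skipping: BioPerl or '$file' is not available\n";
}
```

Because the node IDs are arbitrary, the sketch looks taxa up by name first and
never hard-codes an ID.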
    # Override Bio::DB::Taxonomy
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($taxofile) = $self->_rearrange([qw(TAXOFILE)], @args);

    $self = $self->_build_taxonomy($taxofile);
117
sub _build_taxonomy {
118
my ($self, $taxofile) = @_;
120
my $taxonomy = Bio::DB::Taxonomy::list->new();
122
my $desc_re = qr/^>\S+?(?:\s+(.+))?$/;
124
# One could open the file using Bio::SeqIO::fasta, but it is slower and we
125
# only need the sequence descriptions
127
open my $in, '<', $taxofile or $self->throw("Could not read file '$taxofile': $!\n");

    # Populate taxonomy with taxonomy obtained from sequence description
    while (my $line = <$in>) {
        next if $line !~ $desc_re;
        my $taxo_string = $1;
        next if not $taxo_string;

        # Example of taxonomy string:
        # 1/ Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;Enterococcus faecium DO
        # 2/ Eukaryota;Metazoa;Chordata;Craniata;Vertebrata;Euteleostomi;Mammalia;Eutheria;Euarchontoglires;Glires;
        #    Rodentia;Sciurognathi;Muroidea;Muridae;Murinae;Rattus;;Rattus norvegicus (Norway rat)

        # Skip taxonomy strings that have already been seen
        next if exists $taxas{$taxo_string};
        $taxas{$taxo_string} = undef;

        # Strip the common name (could save it if Bio::DB::Taxonomy::list supported it)
        $taxo_string =~ s/ \(.*\)$//;

        # Unfortunately, we cannot easily add ranks: the number of levels in a
        # lineage varies from 2 to 23 between entries
        my @names = split /;/, $taxo_string;
        $taxonomy->add_lineage(