46
47
A methods are also provided for storing common
47
48
names, and subspecies.
54
User feedback is an integral part of the evolution of this and other
55
Bioperl modules. Send your comments and suggestions preferably to
56
the Bioperl mailing list. Your participation is much appreciated.
58
bioperl-l@bioperl.org - General discussion
59
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
63
Report bugs to the Bioperl bug tracking system to help us keep track
64
of the bugs and their resolution. Bug reports can be submitted via the
67
http://bugzilla.open-bio.org/
51
71
James Gilbert email B<jgrg@sanger.ac.uk>
75
Sendu Bala, bix@sendu.me.uk
55
79
The rest of the documentation details each of the object
80
100
Args : -ncbi_taxid => NCBI taxonomic ID (optional)
81
101
-classification => arrayref of classification
87
my($class,@args) = @_;
89
my $self = $class->SUPER::new(@args);
91
my ($id, $cn,$div,$org,$sp,$var,
92
$classification) = $self->_rearrange([qw(NCBI_TAXID
106
my($class, @args) = @_;
108
my $self = $class->SUPER::new(@args);
110
my ($org, $sp, $var, $classification) =
111
$self->_rearrange([qw(ORGANELLE
98
114
CLASSIFICATION)], @args);
99
if( defined $classification &&
100
(ref($classification) eq "ARRAY") ) {
101
$self->classification(@$classification);
103
defined $id && $self->ncbi_taxid($id);
104
defined $div && $self->division($div);
105
defined $cn && $self->common_name($cn);
106
defined $org && $self->organelle($org);
107
defined $sp && $self->sub_species($sp);
108
defined $var && $self->variant($var);
116
if (defined $classification && ref($classification) eq "ARRAY" && @{$classification}) {
117
$self->classification(@$classification);
120
# store a tree on ourselves so we can use Tree methods
121
$self->{tree} = new Bio::Tree::Tree();
123
# some things want to freeze/thaw Bio::Species objects, but
124
# _root_cleanup_methods contains a CODE ref, delete it.
125
# delete $self->{tree}->{_root_cleanup_methods};
128
defined $org && $self->organelle($org);
129
defined $sp && $self->sub_species($sp);
130
defined $var && $self->variant($var);
113
135
=head2 classification
115
137
Title : classification
116
138
Usage : $self->classification(@class_array);
117
139
@classification = $self->classification();
118
Function: Fills or returns the classification list in
119
the object. The array provided must be in
120
the order SPECIES, GENUS ---> KINGDOM.
121
Checks are made that species is in lower case,
122
Example : $obj->classification(qw( sapiens Homo Hominidae
140
Function: Get/set the lineage of this species. The array provided must be in
141
the order ... ---> SPECIES, GENUS ---> KINGDOM ---> etc.
142
Example : $obj->classification(qw( 'Homo sapiens' Homo Hominidae
123
143
Catarrhini Primates Eutheria Mammalia Vertebrata
124
144
Chordata Metazoa Eukaryota));
125
145
Returns : Classification array
128
148
A reference to the classification array. In the latter case
129
149
if there is a second argument and it evaluates to true,
130
names will not be validated.
150
names will not be validated. NB: in any case, names are never
136
155
sub classification {
137
my ($self,@args) = @_;
141
my ($classif,$force);
143
$classif = shift(@args);
144
$force = shift(@args);
149
# Check the names supplied in the classification string
150
# Species should be in lower case
152
$self->validate_species_name($classif->[0]);
153
# All other names must be in title case
154
#foreach (@$classif) {
155
# $self->validate_name( $_ );
158
# Store classification
159
$self->{'_classification'} = $classif;
161
return @{$self->{'_classification'} || []};
156
my ($self, @vals) = @_;
159
if (ref($vals[0]) eq 'ARRAY') {
163
# make sure the lineage contains us as first or second element
164
# (lineage may have subspecies, species, genus ...)
165
my $name = $self->node_name;
166
if ($name && ($name ne $vals[0] && $name ne $vals[1]) && $name ne "$vals[1] $vals[0]") {
167
if ($name =~ /^$vals[1] $vals[0]\s*(.+)/) {
168
# just assume the problem is someone tried to make a Bio::Species starting at subspecies
169
#*** no idea if this is appropriate! just a possible fix related to bug 2092
170
$self->sub_species($1);
171
$name = $self->node_name("$vals[1] $vals[0]");
174
$self->throw("The supplied lineage does not start near '$name' (I was supplied '".join(" | ", @vals)."')");
178
# create a lineage for ourselves
179
my $db = Bio::DB::Taxonomy->new(-source => 'list', -names => [reverse @vals]);
180
unless ($self->scientific_name) {
181
# assume we're supposed to be the leaf of the supplied lineage
182
$self->scientific_name($vals[0]);
184
unless ($self->rank) {
185
# and that we are rank species
186
$self->rank('species');
189
$self->db_handle($db);
191
$self->{tree} = Bio::Tree::Tree->new(-node => $self);
192
# some things want to freeze/thaw Bio::Species objects, but tree's
193
# _root_cleanup_methods contains a CODE ref, delete it.
194
#*** even if we don't delete the cleanup methods, we still get memory
195
# leak-like symtoms, and the actual cleanup causes a mass of
196
# warnings... needs investigation!
197
delete $self->{tree}->{_root_cleanup_methods};
201
foreach my $node ($self->{tree}->get_lineage_nodes($self), $self) {
202
unshift(@vals, $node->scientific_name || next);
204
weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
211
Usage : $obj->ncbi_taxid($newval)
212
Function: Get/set the NCBI Taxon ID
213
Returns : the NCBI Taxon ID as a string
214
Args : newvalue to set or undef to unset (optional)
164
218
=head2 common_name
166
220
Title : common_name
179
return $self->{'_common_name'} = shift if @_;
180
return $self->{'_common_name'};
186
Usage : $obj->variant($newval)
187
Function: Get/set variant information for this species object (strain,
190
Returns : value of variant (a scalar)
191
Args : new value (a scalar or undef, optional)
199
return $self->{'_variant'} = shift if @_;
200
return $self->{'_variant'};
206
Usage : $self->organelle( $organelle );
207
$organelle = $self->organelle();
208
Function: Get or set the organelle name
209
Example : $self->organelle('Chloroplast')
210
Returns : The organelle name in a string
211
Args : String, which is the organelle name
217
return $self->{'_organelle'} = shift if @_;
218
return $self->{'_organelle'};
233
Usage : $obj->division($newval)
234
Function: Genbank Division for a species
235
Returns : value of division (a scalar)
236
Args : value of division (a scalar)
224
243
Usage : $self->species( $species );
225
244
$species = $self->species();
226
Function: Get or set the scientific species name. The species
227
name must be in lower case.
228
Example : $self->species( 'sapiens' );
245
Function: Get or set the scientific species name.
246
Example : $self->species('Homo sapiens');
229
247
Returns : Scientific species name as string
230
248
Args : Scientific species name as string
236
my($self, $species) = @_;
238
if (defined $species) {
239
$self->validate_species_name( $species );
240
$self->{'_classification'}[0] = $species;
242
return $self->{'_classification'}[0];
253
my ($self, $species) = @_;
256
$self->{_species} = $species;
259
unless (defined $self->{_species}) {
260
# work it out from our nodes
261
my $species_taxon = $self->{tree}->find_node(-rank => 'species');
262
unless ($species_taxon) {
263
# just assume we are rank species
264
$species_taxon = $self;
267
$species = $species_taxon->scientific_name;
270
# munge it like the Bio::SeqIO modules used to do
271
# (more or less copy/pasted from old Bio::SeqIO::genbank, hence comments
272
# referring to 'ORGANISM' etc.)
275
my $root = $self->{tree}->get_root_node;
277
$self->{tree} = new Bio::Tree::Tree(-node => $species_taxon);
278
delete $self->{tree}->{_root_cleanup_methods};
279
$root = $self->{tree}->get_root_node;
282
my @spflds = split(' ', $species);
283
if (@spflds > 1 && $root->node_name ne 'Viruses') {
286
# does the next term start with uppercase?
287
# yes: valid genus; no then unconventional
288
# e.g. leaf litter basidiomycete sp. Collb2-39
290
if ($spflds[0] =~ m/^[A-Z]/) {
291
$genus = shift(@spflds);
299
while (my $fld = shift @spflds) {
301
# does it have subspecies or varieties?
302
last if ($fld =~ m/(sp\.|var\.)/);
304
chop $species; # last space
305
$sub_species = join ' ',@spflds if(@spflds);
311
# does ORGANISM start with any words which make its genus undefined?
312
# these are in @unkn_genus
313
# this in case species starts with uppercase so isn't caught above.
314
# alter common name if required
315
my $unconv = 0; # is it unconventional species name?
316
my @unkn_genus = ('unknown','unclassified','uncultured','unidentified');
317
foreach (@unkn_genus) {
318
if ($genus && $genus =~ m/$_/i) {
319
$species = $genus . " " . $species;
324
elsif ($species =~ m/$_/i) {
329
if (!$unconv && !$sub_species && $species =~ s/^(\w+)\s(\w+)$/$1/) {
330
# need to extract subspecies from conventional ORGANISM format.
331
# Will the 'word' in a two element species name
332
# e.g. $species = 'thummi thummi' => $species='thummi' &
333
# $sub_species='thummi'
337
$self->genus($genus) if $genus;
338
$self->sub_species($sub_species) if $sub_species;
341
$self->{_species} = $species;
344
return $self->{_species};
248
350
Usage : $self->genus( $genus );
249
351
$genus = $self->genus();
250
Function: Get or set the scientific genus name. The genus
251
must be in title case.
252
Example : $self->genus( 'Homo' );
352
Function: Get or set the scientific genus name.
353
Example : $self->genus('Homo');
253
354
Returns : Scientific genus name as string
254
355
Args : Scientific genus name as string
260
my($self, $genus) = @_;
262
if (defined $genus) {
263
#$self->validate_name( $genus );
264
$self->{'_classification'}[1] = $genus;
360
my ($self, $genus) = @_;
363
$self->{_genus} = $genus;
266
return $self->{'_classification'}[1];
366
unless (defined $self->{_genus}) {
367
my $genus_taxon = $self->{tree}->find_node(-rank => 'genus');
368
unless ($genus_taxon) {
369
# just assume our ancestor is rank genus
370
$genus_taxon = $self->ancestor;
373
$self->{_genus} = $genus_taxon->scientific_name if $genus_taxon;
376
return $self->{_genus};
269
379
=head2 sub_species
271
381
Title : sub_species
272
382
Usage : $obj->sub_species($newval)
383
Function: Get or set the scientific subspecies name.
274
384
Returns : value of sub_species
275
385
Args : newvalue (optional)
280
389
sub sub_species {
282
return $self->{'_sub_species'} = shift if @_;
390
my ($self, $sub) = @_;
392
unless (defined $self->{'_sub_species'}) {
393
my $ss_taxon = $self->{tree}->find_node(-rank => 'subspecies');
396
$ss_taxon->scientific_name($sub);
398
return $ss_taxon->scientific_name;
402
# fall back to direct storage on self
403
$self->{'_sub_species'} = $sub if $sub;
283
404
return $self->{'_sub_species'};
410
Usage : $obj->variant($newval)
411
Function: Get/set variant information for this species object (strain,
414
Returns : value of variant (a scalar)
415
Args : new value (a scalar or undef, optional)
420
my ($self, $var) = @_;
422
unless (defined $self->{'_variant'}) {
423
my $var_taxon = $self->{tree}->find_node(-rank => 'variant');
426
$var_taxon->scientific_name($var);
428
return $var_taxon->scientific_name;
432
# fall back to direct storage on self
433
$self->{'_variant'} = $var if $var;
434
return $self->{'_variant'};
300
my( $self, $full ) = @_;
302
my( $species, $genus ) = $self->classification();
303
unless( defined $species) {
305
$self->warn("requested binomial but classification was not set");
450
my ($self, $full) = @_;
451
my $rank = $self->rank || 'no rank';
453
my ($species, $genus) = ($self->species, $self->genus);
454
unless (defined $species) {
456
$self->warn("requested binomial but classification was not set");
307
$genus = '' unless( defined $genus);
458
$genus = '' unless( defined $genus);
460
$species =~ s/$genus\s+//;
308
462
my $bi = "$genus $species";
309
if (defined($full) && ((uc $full) eq 'FULL')) {
310
my $ssp = $self->sub_species;
311
$bi .= " $ssp" if $ssp;
463
if (defined($full) && $full =~ /full/i) {
464
my $ssp = $self->sub_species;
467
$ssp =~ s/$species\s+//;
474
=head2 validate_species_name
476
Title : validate_species_name
477
Usage : $result = $self->validate_species_name($string);
478
Function: Validate the species portion of the binomial
480
Notes : The string following the "genus name" in the NCBI binomial
481
is so variable that it's not clear that this is a useful
482
function. Consider the binomials
483
"Simian 11 rotavirus (serotype 3 / strain SA11-Patton)",
484
or "St. Thomas 3 rotavirus", straight from GenBank.
485
This is particularly problematic in microbes and viruses.
486
As such, this isn't actually used automatically by any Bio::Species
316
490
sub validate_species_name {
317
491
my( $self, $string ) = @_;
319
493
return 1 if $string eq "sp.";
320
return 1 if $string =~ /^[a-z][\w\s]+$/i;
494
return 1 if $string =~ /strain/;
495
return 1 if $string =~ /^[a-z][\w\s-]+$/i;
321
496
$self->throw("Invalid species name '$string'");
324
499
sub validate_name {
325
return 1; # checking is disabled as there is really not much we can
326
# enforce HL 2002/10/03
327
# my( $self, $string ) = @_;
329
# return 1 if $string =~ /^[\w\s\-\,\.]+$/ or
330
# $self->throw("Invalid name '$string'");
336
Usage : $obj->ncbi_taxid($newval)
337
Function: Get/set the NCBI Taxon ID
338
Returns : the NCBI Taxon ID as a string
339
Args : newvalue to set or undef to unset (optional)
347
return $self->{'_ncbi_taxid'} = shift if @_;
348
return $self->{'_ncbi_taxid'};
354
Usage : $obj->division($newval)
355
Function: Genbank Division for a species
356
Returns : value of division (a scalar)
357
Args : value of division (a scalar)
365
return $self->{'_division'} = shift if @_;
366
return $self->{'_division'};
506
Usage : $self->organelle( $organelle );
507
$organelle = $self->organelle();
508
Function: Get or set the organelle name
509
Example : $self->organelle('Chloroplast')
510
Returns : The organelle name in a string
511
Args : String, which is the organelle name
517
return $self->{'_organelle'} = shift if @_;
518
return $self->{'_organelle'};
523
$self->{tree}->cleanup_tree if $self->{tree};
524
delete $self->{tree};