This module takes advantage of SAX, a stream-based XML parser
technology, to keep memory usage as small as possible.
InterProHandler in the same directory uses another scheme that keeps all
InterPro records in the engine and the ontology data in memory,
which makes it almost impossible to operate!
This module parses an InterPro XML file and persists the
resulting terms to a BioSQL database as soon as each term is complete,
as signaled by the appropriate XML tag. This parser takes advantage of
SAX, a stream-based XML parser technology, to keep memory usage as
small as possible. The alternative parser for InterPro, module
InterProHandler, builds up the entire ontology in memory, which given
the size of the latest InterPro releases requires a huge amount of
This module takes the following non-standard arguments upon
-db the adaptor factory as returned by a call to
-version the InterPro version (not available as property!)
-term_factory the object factory to use for creating terms
Note that there are two alternatives for how to persist the terms and
relationships to the database. The default is using the adaptor
factory passed as -db or set as a property to create persistent
objects and store them in the database. The alternative is to specify
a term persistence and a relationship persistence handler; if one or
both have been set, the respective handler will be called with each
term and relationship that is to be stored. See properties
persist_term_handler and persist_relationship_handler.
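The choice between the two persistence paths can be sketched in plain Perl without BioPerl installed. This is a minimal illustration, not the module's implementation: the package `My::Loader`, its `persist_term` method, the `-prefix` argument, and the plain-hash terms are hypothetical stand-ins. Only the convention itself (a closure followed by constant arguments, called with the term as `-term`) is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser; not part of BioPerl.
package My::Loader;

sub new {
    my ($class) = @_;
    return bless { stored => [] }, $class;
}

# Stores the handler list: closure first, constant arguments after it.
sub persist_term_handler {
    my $self = shift;
    return $self->{persist_term_handler} = [@_] if @_;
    return $self->{persist_term_handler} || [];
}

# Sends a finished term to the handler if one is set, else to the default path.
sub persist_term {
    my ($self, $term) = @_;
    my ($handler, @args) = @{ $self->persist_term_handler };
    if ($handler) {
        # the term is passed first as a named parameter, then the constants
        return $handler->(-term => $term, @args);
    }
    # default path: stands in for storing through the adaptor factory
    push @{ $self->{stored} }, $term;
    return $term;
}

package main;

my @seen;
my $loader = My::Loader->new;
$loader->persist_term({ name => 'Family' });    # no handler yet: default path

# -prefix is an assumed, made-up configuration argument
$loader->persist_term_handler(
    sub { my %p = @_; push @seen, $p{'-prefix'} . $p{'-term'}{name}; },
    -prefix => 'IPR:',
);
$loader->persist_term({ name => 'Domain' });    # now goes to the handler
```

Because the constant arguments travel with the closure, the handler can be configured once at setup time and needs no global state.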
Juguang Xiao, juguang@tll.org.sg
Hilmar Lapp, hlapp at gmx.net
The rest of the documentation details each of the object methods.
my($self,@args)=@_;
$self->SUPER::_initialize(@args);
my ($db, $version) = $self->_rearrange(
[qw(DB VERSION)], @args);
defined $db or $self->throw('db must be assigned');
my $ontology = Bio::Ontology::Ontology->new(
-definition => "InterPro, $version"
my ($db, $version, $fact) = $self->_rearrange(
[qw(DB VERSION TERM_FACTORY)], @args);
$self->db($db) if $db; # this is now a property and may be set later
$fact = Bio::Ontology::TermFactory->new(-type=>"Bio::Ontology::Term");
$self->term_factory($fact);
my $ontology = Bio::Ontology::Ontology->new(-name => 'InterPro');
if (defined($version)) {
$version = "InterPro version $version";
$ontology->definition($version);
$self->_ontology($ontology);
$is_a_rel = Bio::Ontology::RelationshipType->get_instance('IS_A');
$is_a_rel->ontology($ontology);
Usage : $obj->term_factory($newval)
Function: Get/set the ontology term factory to use.
As a user of this module it is not necessary to call this
method as there will be a default. In order to change the
default, the easiest way is to instantiate
L<Bio::Ontology::TermFactory> with the proper -type
argument. Most if not all parsers will actually use this
very implementation, so even easier than the aforementioned
way is to simply call
$ontio->term_factory->type("Bio::Ontology::MyTerm").
Returns : value of term_factory (a Bio::Factory::ObjectFactoryI object)
Args : on set, new value (a Bio::Factory::ObjectFactoryI object, optional)
return $self->{'term_factory'} = shift if @_;
return $self->{'term_factory'};
Usage : $obj->db($newval)
Function: Sets or retrieves the database adaptor factory.
The adaptor factory is a Bio::DB::DBAdaptorI compliant
object and will be used to obtain the persistence adaptors
necessary to serialize terms and relationships to the
Usually, you will obtain such an object from a call to
Bio::DB::BioDB. You *must* set this property before
Note that this property is immutable once set, except that
you may set it to undef. Therefore, be careful not to set
it to undef before setting the desired real value.
Returns : value of db (a Bio::DB::DBAdaptorI compliant object)
Args : on set, new value (a Bio::DB::DBAdaptorI compliant object
if ($db && exists($self->{_db}) && ($self->{_db} != $db)) {
$self->throw('db may not be modified once set');
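The set-once rule enforced by the check above can be tried out in isolation. The following is a sketch under assumptions: the class `My::Holder` is hypothetical, plain hash references stand in for adaptor factory objects, and `die` replaces `throw`. The condition mirrors the one shown in the diff, with an explicit `defined` test added only to avoid an uninitialized-value warning.

```perl
use strict;
use warnings;

# Hypothetical class, not part of BioPerl.
package My::Holder;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub db {
    my $self = shift;
    if (@_) {
        my $db = shift;
        # A true new value is rejected once the slot exists, unless it is
        # the very same object. A slot holding undef still counts as set.
        if ($db && exists $self->{_db}
            && (!defined $self->{_db} || $self->{_db} != $db)) {
            die "db may not be modified once set\n";
        }
        $self->{_db} = $db;
    }
    return $self->{_db};
}

package main;

# Plain hash references stand in for adaptor factory objects.
my ($db1, $db2) = ({ name => 'first' }, { name => 'second' });

my $h = My::Holder->new;
$h->db($db1);
$h->db($db1);                               # same object again: allowed
my $ok_other = eval { $h->db($db2); 1 };    # different object: dies
my $err      = $@;
my $kept     = $h->db;                      # still the first object

$h->db(undef);                              # resetting to undef is allowed
my $ok_after_undef = eval { $h->db($db2); 1 };   # but this now dies too
```

The last two lines show why the documentation warns against setting the property to undef first: the slot then exists, so a later real value is refused.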
=head2 persist_term_handler
Title : persist_term_handler
Usage : $obj->persist_term_handler($handler,@args)
Function: Sets or retrieves the persistence handler for terms along
with the constant set of arguments to be passed to the
If set, the first argument will be treated as a closure and
be called for each term to persist to the database. The
term will be passed as a named parameter (-term), followed
by the other arguments passed to this setter. Note that
this allows passing an arbitrary configuration to the
If not set, terms will be persisted along with their
relationships using the respective persistence adaptor
returned by the adaptor factory (see property db).
Returns : an array reference with the values passed on set, or an empty
Args : On set, an array of values. The first value is the handler
as a closure; all other values will be passed to the handler
as constant arguments.
sub persist_term_handler{
return $self->{'persist_term_handler'} = [@_] if @_;
return $self->{'persist_term_handler'} || [];
=head2 persist_relationship_handler
Title : persist_relationship_handler
Usage : $obj->persist_relationship_handler($handler,@args)
Function: Sets or retrieves the persistence handler for relationships
along with the constant set of arguments to be passed to
If set, the first argument will be treated as a closure and
be called for each relationship to persist to the database. The
relationship will be passed as a named parameter (-rel), followed
by the other arguments passed to this setter. Note that
this allows passing an arbitrary configuration to the
If not set, relationships will be persisted
using the respective persistence adaptor
returned by the adaptor factory (see property db).
Returns : an array reference with the values passed on set, or an empty
Args : On set, an array of values. The first value is the handler
as a closure; all other values will be passed to the handler
as constant arguments.
sub persist_relationship_handler{
return $self->{'persist_relationship_handler'} = [@_] if @_;
return $self->{'persist_relationship_handler'} || [];
Title : _persist_term
Function: Persists a term to the database, using either a previously
set persistence handler, or the adaptor factory directly.
Args : the ontology term to persist
my ($handler,@args) = @{$self->persist_term_handler};
&$handler('-term' => $term, @args);
# no handler; we'll do this ourselves straight and simple
my $db = $self->db();
my $pterm = $db->create_persistent($term);
$self->warn("failed to store term '".$term->name."': ".$@);
=head2 _persist_relationship
Title : _persist_relationship
Function: Persists a relationship to the database, using either a
previously set persistence handler, or the adaptor factory
Args : the term relationship to persist
sub _persist_relationship {
my ($handler,@args) = @{$self->persist_relationship_handler};
&$handler('-rel' => $rel, @args);
# no handler; we'll do this ourselves straight and simple
my $db = $self->db();
my $prel = $db->create_persistent($rel);
$self->warn("failed to store relationship of subject '"
.$rel->subject_term->name."' to object '"
.$rel->object_term->name."': ".$@);
=head2 _persist_ontology
Title : _persist_ontology
Function: Persists the ontology itself to the database, by either
inserting or updating it.
Note that this will only create or update the ontology as
an entity, not any of its terms, relationships, or
Returns : the ontology as a persistent object with primary key
Args : the ontology to persist as a Bio::Ontology::OntologyI
sub _persist_ontology{
my $db = $self->db();
# do a lookup first; chances are we have this already in the database
my $adp = $db->get_object_adaptor($ont);
# to avoid clobbering this ontology's properties with possibly older ones
# from the database we'll need an object factory
Bio::Factory::ObjectFactory->new(-type=>"Bio::Ontology::Ontology");
my $found = $adp->find_by_unique_key($ont, '-obj_factory' => $ontfact);
# make a persistent object of the ontology
$ont = $db->create_persistent($ont);
# transfer primary key if found in the lookup
$ont->primary_key($found->primary_key) if $found;
$result = $ont->store();
if ($@ || !$result) {
$self->throw("failed to update ontology '"
.$ont->name."' in database".($@ ? ": $@" : ""));
# done - we don't commit here
return ref($result) ? $result : $ont;
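The lookup-then-store sequence above (find by unique key, carry over only the primary key, then store) can be illustrated with an in-memory stand-in for the database. Everything here is hypothetical: the `%table` hash, the helper names `find_by_name`, `store`, and `persist_ontology`, the plain-hash ontologies, and the sample definitions. It is a sketch of the pattern, not of the BioPerl-DB adaptor API.

```perl
use strict;
use warnings;

# Hypothetical in-memory "table" keyed by ontology name.
my %table;          # name => { pk => ..., definition => ... }
my $next_pk = 1;

# Lookup by unique key (the name); returns the stored row or undef.
sub find_by_name {
    my ($name) = @_;
    return $table{$name};
}

# Inserts when the object has no primary key yet, updates otherwise.
sub store {
    my ($ont) = @_;
    $ont->{pk} = $next_pk++ unless defined $ont->{pk};
    $table{ $ont->{name} } = {
        pk         => $ont->{pk},
        definition => $ont->{definition},
    };
    return $ont;
}

sub persist_ontology {
    my ($ont) = @_;
    my $found = find_by_name($ont->{name});
    # carry over only the key, so the newer properties in $ont win
    $ont->{pk} = $found->{pk} if $found;
    my $result = eval { store($ont) };
    if ($@ || !$result) {
        die "failed to update ontology '" . $ont->{name} . "'"
            . ($@ ? ": $@" : "") . "\n";
    }
    return $result;
}

# Sample values are made up for illustration.
my $first  = persist_ontology({ name => 'InterPro', definition => 'older release' });
my $second = persist_ontology({ name => 'InterPro', definition => 'newer release' });
```

Copying only the key from the lookup result, rather than the whole found row, is what keeps possibly older stored properties from overwriting the ones just parsed.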
sub start_document {
my $ont = $self->_ontology;
$ont->add_term($self->create_term(-identifier=>'Family', -name=>'Family'));
$ont->add_term($self->create_term(-identifier=>'Domain', -name=>'Domain'));
$ont->add_term($self->create_term(-identifier=>'Repeat', -name=>'Repeat'));
$ont->add_term($self->create_term(-identifier=>'PTM',
-name=>'post-translational modification'));
$ont->add_term($self->create_term(
-identifier=>'Active_site', -name=>'Active_site'));
$ont->add_term($self->create_term(
-identifier=>'Binding_site', -name=>'Binding_site'));
$self->create_term(-identifier=>'IPR:Family',
$self->create_term(-identifier=>'IPR:Domain',
$self->create_term(-identifier=>'IPR:Repeat',
$self->create_term(-identifier=>'IPR:PTM',
-name=>'post-translational modification',
$self->create_term(-identifier=>'IPR:Active_site',
-name=>'Active_site',
$self->create_term(-identifier=>'IPR:Binding_site',
-name=>'Binding_site',
foreach my $iprtype (@iprtypes) {
$self->_persist_term($iprtype);
$ont->add_term($iprtype);
sub start_element {
sub _create_publication {
my $publication = $self->_current_hash->{publication};
my $author = $self->_current_hash->{author};
my $journal = $self->_current_hash->{journal};
my $year = $self->_current_hash->{year};
my $page_location = $self->_current_hash->{page_location};
my $volumn = $self->_current_hash->{volumn};
$publication->authors($author);
$publication->location("$journal, $year, V $volumn, $page_location");
my $title = $self->_current_hash->{title};
$publication->title($title);
my $medline = $self->_current_hash->{medline};
$publication->medline($medline);
my $publ = $self->_current_hash->{publication};
my $journal = $self->_current_hash->{journal} || '<no journal>';
my $year = $self->_current_hash->{year} || '<no year>';
my $page_location = $self->_current_hash->{page_location} || '<no pages>';
my $volumn = $self->_current_hash->{volumn} || '<no volume>';
$self->_current_hash->{medline} || $self->_current_hash->{pubmed};
$publ->authors($self->_current_hash->{author});
$publ->location("$journal, $year, V $volumn, $page_location");
$publ->title($self->_current_hash->{title});
$publ->medline($medline);
if ($self->_current_hash->{pubmed}
&& ($self->_current_hash->{pubmed} != $medline)) {
$publ->pubmed($self->_current_hash->{pubmed});
# Clear the above in current hash
$self->_current_hash->{publication} = undef;
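The revised `_create_publication` logic (placeholder defaults for missing fields, MEDLINE falling back to PubMed, and a separate PubMed ID only when it differs) can be exercised on its own. This is a sketch under assumptions: the helper `publication_fields` and its returned hash are hypothetical, and the sample journal name and ID numbers are made up. The field names, including the `volumn` key, follow the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the collected fields as a plain hash reference
# and returns the values that would be set on the publication object.
sub publication_fields {
    my ($h) = @_;
    my $journal = $h->{journal}       || '<no journal>';
    my $year    = $h->{year}          || '<no year>';
    my $pages   = $h->{page_location} || '<no pages>';
    my $volume  = $h->{volumn}        || '<no volume>';

    # fall back to the PubMed ID when no MEDLINE ID was given
    my $medline = $h->{medline} || $h->{pubmed};

    my %out = (
        location => "$journal, $year, V $volume, $pages",
        medline  => $medline,
    );
    # record a separate PubMed ID only if it differs from the MEDLINE value
    $out{pubmed} = $h->{pubmed}
        if $h->{pubmed} && $h->{pubmed} != $medline;
    return \%out;
}

# Sample values are made up for illustration.
my $only_pubmed = publication_fields({ journal => 'J. Example', pubmed => 111 });
my $both        = publication_fields({ medline => 222, pubmed => 333, year => 2003 });
```

With only a PubMed ID present, the same number ends up in the MEDLINE slot and no separate PubMed entry is recorded.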