1
#-----------------------------------------------------------------------------
2
# PACKAGE : Bio::Root::Utilities.pm
3
# PURPOSE : Provides general-purpose utilities of potential interest to any Perl script.
4
# AUTHOR : Steve Chervitz (sac@bioperl.org)
6
# REVISION: $Id: Utilities.pm,v 1.22 2003/06/04 08:36:42 heikki Exp $
9
# This module manages file compression and uncompression using gzip or
10
# the UNIX compress programs (see the compress() and uncompress() methods).
11
# Also, it can create filehandles from gzipped files. If you want to use a
12
# different compression utility (such as zip, pkzip, stuffit, etc.) you
15
# If you manage to incorporate an alternate compression utility into this
16
# module, please post a note to the bio.perl.org mailing list
17
# bioperl-l@bioperl.org
19
# TODO : Configure $GNU_PATH during installation.
20
# Improve documentation (POD).
21
# Make use of Date::Manip and/or Date::DateCalc as appropriate.
23
# MODIFICATIONS: See bottom of file.
25
# Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved.
26
# This module is free software; you can redistribute it and/or
27
# modify it under the same terms as Perl itself.
29
#-----------------------------------------------------------------------------
31
package Bio::Root::Utilities;
35
use vars qw($Loaded_POSIX $Loaded_IOScalar);
37
unless( eval "require POSIX" ) {
42
use Bio::Root::Global qw(:data :std $TIMEOUT_SECS);
43
use Bio::Root::Object ();
46
#*AUTOLOAD = \&AutoLoader::AUTOLOAD;
48
use vars qw( @ISA @EXPORT_OK %EXPORT_TAGS );
49
@ISA = qw( Bio::Root::Root Exporter);
50
@EXPORT_OK = qw($Util);
51
%EXPORT_TAGS = ( obj => [qw($Util)],
54
use vars qw($ID $Util $GNU_PATH $DEFAULT_NEWLINE);
56
$ID = 'Bio::Root::Utilities';
58
# $GNU_PATH points to the directory containing the gzip and gunzip
59
# executables. It may be required for executing gzip/gunzip
60
# in some situations (e.g., when $ENV{PATH} doesn't contain this dir.
61
# Customize $GNU_PATH for your site if the compress() or
62
# uncompress() functions are generating exceptions.
64
#$GNU_PATH = '/tools/gnu/bin/';
66
$DEFAULT_NEWLINE = "\012"; # \n (used if get_newline() fails for some reason)
68
## Static UTIL object.
71
$Util->{'_name'} = 'Static Utilities object';
77
Bio::Root::Utilities - General-purpose utility module
81
=head2 Object Creation
83
use Bio::Root::Utilities qw(:obj);
85
There is no need to create a new Bio::Root::Utilities.pm object when
86
the C<:obj> tag is used. This tag will import the static $Util
87
object created by Bio::Root::Utilities.pm into your name space. This
88
saves you from having to call C<new Bio::Root::Utilities>.
90
You are free to not use the :obj tag and create the object as you
91
like, but a Bio::Root::Utilities object is not configurable; any given
92
script only needs a single copy.
94
$date_stamp = $Util->date_format('yyy-mm-dd');
96
$clean = $Util->untaint($dirty);
98
$Util->mail_authority("Something you should know about...");
100
...and other methods. See below.
104
This module is included with the central Bioperl distribution:
106
http://bio.perl.org/Core/Latest
107
ftp://bio.perl.org/pub/DIST
109
Follow the installation instructions included in the README file.
113
Provides general-purpose utilities of potential interest to any Perl script.
114
Scripts and modules are expected to use the static $Util object exported by
115
this package with the C<:obj> tag.
119
B<Bio::Root::Utilities.pm> inherits from B<Bio::Root::Object.pm>.
120
It also relies on the GNU gzip program for file compression/uncompression.
124
Bio::Root::Object.pm - Core object
125
Bio::Root::Global.pm - Manages global variables/constants
127
http://bio.perl.org/Projects/modules.html - Online module documentation
128
http://bio.perl.org/ - Bioperl Project Homepage
130
FileHandle.pm (included in the Perl distribution or CPAN).
136
User feedback is an integral part of the evolution of this and other Bioperl modules.
137
Send your comments and suggestions preferably to one of the Bioperl mailing lists.
138
Your participation is much appreciated.
140
bioperl-l@bioperl.org - General discussion
141
http://bioperl.org/MailList.shtml - About the mailing lists
143
=head2 Reporting Bugs
145
Report bugs to the Bioperl bug tracking system to help us keep track the bugs and
146
their resolution. Bug reports can be submitted via email or the web:
148
bioperl-bugs@bio.perl.org
149
http://bugzilla.bioperl.org/
153
Steve Chervitz E<lt>sac@bioperl.orgE<gt>
155
See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
159
Bio::Root::Utilities.pm, 0.042
161
=head1 ACKNOWLEDGEMENTS
163
This module was developed under the auspices of the Saccharomyces Genome
165
http://genome-www.stanford.edu/Saccharomyces
169
Copyright (c) 1997-98 Steve Chervitz. All Rights Reserved.
170
This module is free software; you can redistribute it and/or
171
modify it under the same terms as Perl itself.
178
#### END of main POD documentation.
186
Methods beginning with a leading underscore are considered private
187
and are intended for internal use by this module. They are
188
B<not> considered part of the public interface and are described here
189
for documentation purposes only.
194
############################################################################
195
## INSTANCE METHODS ##
196
############################################################################
201
Usage : $Util->date_format( [FMT], [DATE])
202
Purpose : -- Get a string containing the formated date or time
203
: taken when this routine is invoked.
204
: -- Provides a way to avoid using `date`.
205
: -- Provides an interface to localtime().
206
: -- Interconverts some date formats.
208
: (For additional functionality, use Date::Manip or
209
: Date::DateCalc available from CPAN).
210
Example : $Util->date_format();
211
: $date = $Util->date_format('yyyy-mmm-dd', '11/22/92');
212
Returns : String (unless 'list' is provided as argument, see below)
214
: 'yyyy-mm-dd' = 1996-05-03 # default format.
215
: 'yyyy-dd-mm' = 1996-03-05
216
: 'yyyy-mmm-dd' = 1996-May-03
217
: 'd-m-y' = 3-May-1996
218
: 'd m y' = 3 May 1996
220
: 'mdy' = May 3, 1996
224
: 'hms' = 23:01:59 # 'hms' can be tacked on to any of the above options
225
: # to add the time stamp: eg 'dmyhms'
226
: 'full' | 'unix' = UNIX-style date: Tue May 5 22:00:00 1998
227
: 'list' = the contents of localtime(time) in an array.
228
Argument : (all are optional)
229
: FMT = yyyy-mm-dd | yyyy-dd-mm | yyyy-mmm-dd |
230
: mdy | ymd | md | d-m-y | hms | hm
231
: ('hms' may be appended to any of these to
234
: DATE = String containing date to be converted.
235
: Acceptable input formats:
236
: 12/1/97 (for 1 December 1997)
240
Comments : Relies on the $BASE_YEAR constant exported by Bio:Root::Global.pm.
242
: If you don't care about formatting or using backticks, you can
243
: always use: $date = `date`;
245
: For more features, use Date::Manip.pm, (which I should
246
: probably switch to...)
248
See Also : L<file_date>(), L<month2num>()
257
my $date = shift; # optional date to be converted.
259
my($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);
261
$option ||= 'yyyy-mm-dd';
263
my ($month_txt, $day_txt, $month_num, $fullYear);
266
# Load a supplied date for conversion:
267
if(defined($date) && ($date =~ /[\D-]+/)) {
269
($mon,$mday,$year) = split(/\//, $date);
270
} elsif($date =~ /(\d{4})-(\d{1,2})-(\d{1,2})/) {
271
($year,$mon,$mday) = ($1, $2, $3);
272
} elsif($date =~ /(\d{4})-(\w{3,})-(\d{1,2})/) {
273
($year,$mon,$mday) = ($1, $2, $3);
274
$mon = $self->month2num($2);
276
print STDERR "\n*** Unsupported input date format: $date\n";
278
if(length($year) == 4) { $year = substr $year, 2; }
281
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = @date =
282
localtime(($date ? $date : time()));
283
return @date if $option =~ /list/i;
285
$month_txt = $MONTHS[$mon];
286
$day_txt = $DAYS[$wday] if defined $wday;
288
$fullYear = $BASE_YEAR+$year;
290
# print "sec: $sec, min: $min, hour: $hour, month: $mon, m-day: $mday, year: $year\nwday: $wday, yday: $yday, dst: $isdst";<STDIN>;
292
if( $option =~ /yyyy-mm-dd/i ) {
293
$date = sprintf "%4d-%02d-%02d",$fullYear,$month_num,$mday;
294
} elsif( $option =~ /yyyy-dd-mm/i ) {
295
$date = sprintf "%4d-%02d-%02d",$fullYear,$mday,$month_num;
296
} elsif( $option =~ /yyyy-mmm-dd/i ) {
297
$date = sprintf "%4d-%3s-%02d",$fullYear,$month_txt,$mday;
298
} elsif( $option =~ /full|unix/i ) {
299
$date = sprintf "%3s %3s %2d %02d:%02d:%02d %d",$day_txt, $month_txt, $mday, $hour, $min, $sec, $fullYear;
300
} elsif( $option =~ /mdy/i ) {
301
$date = "$month_txt $mday, $fullYear";
302
} elsif( $option =~ /ymd/i ) {
303
$date = $year."\l$month_txt$mday";
304
} elsif( $option =~ /dmy/i ) {
305
$date = $mday."\l$month_txt$year";
306
} elsif( $option =~ /md/i ) {
307
$date = "\l$month_txt$mday";
308
} elsif( $option =~ /d-m-y/i ) {
309
$date = "$mday-$month_txt-$fullYear";
310
} elsif( $option =~ /d m y/i ) {
311
$date = "$mday $month_txt $fullYear";
312
} elsif( $option =~ /year/i ) {
314
} elsif( $option =~ /dmy/i ) {
315
$date = $mday.'-'.$month_txt.'-'.$fullYear;
316
} elsif($option and $option !~ /hms/i) {
317
print STDERR "\n*** Unrecognized date format request: $option\n";
320
if( $option =~ /hms/i) {
321
$date .= " $hour:$min:$sec" if $date;
322
$date ||= "$hour:$min:$sec";
325
return $date || join(" ", @date);
332
Purpose : Converts a string containing a name of a month to integer
333
: representing the number of the month in the year.
334
Example : $Util->month2num("march"); # returns 3
335
Argument : The string argument must contain at least the first
336
: three characters of the month's name. Case insensitive.
337
Throws : Exception if the conversion fails.
345
my ($self, $str) = @_;
347
# Get string in proper format for conversion.
348
$str = substr($str, 0, 3);
350
return $_+1 if $str =~ /$MONTHS[$_]/i;
352
$self->throw("Invalid month name: $str");
358
Purpose : Does the opposite of month2num.
359
: Converts a number into a string containing a name of a month.
360
Example : $Util->num2month(3); # returns 'Mar'
361
Throws : Exception if supplied number is out of range.
368
my ($self, $num) = @_;
370
$self->throw("Month out of range: $num") if $num < 1 or $num > 12;
371
return $MONTHS[$num];
377
Usage : $Util->compress(filename, [tmp]);
378
Purpose : Compress a file to conserve disk space.
379
Example : $Util->compress("/usr/people/me/data.txt");
380
Returns : String (name of compressed file, full path).
381
Argument : filename = String (name of file to be compressed, full path).
382
: If the supplied filename ends with '.gz' or '.Z',
383
: that extension will be removed before attempting to compress.
385
: If true, (or if user is not the owner of the file)
386
: the file is compressed to a tmp file
387
: If false, file is clobbered with the compressed version.
388
Throws : Exception if file cannot be compressed
389
: If user is not owner of the file, generates a warning
390
: and compresses to a tmp file.
391
: To avoid this warning, use the -o file test operator
392
: and call this function with a true second argument.
393
Comments : Attempts to compress using gzip (default compression level).
394
: If that fails, will attempt to use compress.
395
: In some situations, the full path to the gzip executable
396
: may be required. This can be specified with the $GNU_PATH
397
: package global variable. When installed, $GNU_PATH is an
400
See Also : L<uncompress>()
408
my $fileName = shift;
409
my $tmp = shift || 0;
411
if($fileName =~ /(\.gz|\.Z)$/) { $fileName =~ s/$1$//; };
412
$DEBUG && print STDERR "gzipping file $fileName";
414
my ($compressed, @args);
416
if($tmp or not -o $fileName) {
418
$compressed = POSIX::tmpnam;
420
$compressed = _get_pseudo_tmpnam();
422
$compressed .= ".tmp.bioperl";
423
$compressed .= '.gz';
424
@args = ($GNU_PATH."gzip -f < $fileName > $compressed");
426
$self->warn("Not owner of file $fileName\nCompressing to tmp file $compressed.");
429
$compressed = "$fileName.gz";
430
@args = ($GNU_PATH.'gzip', '-f', $fileName);
433
if(system(@args) != 0) {
434
# gzip may not be present. Try compress.
435
$compressed = "$fileName.Z";
437
@args = ("/usr/bin/compress -f < $fileName > $compressed");
439
@args = ('/usr/bin/compress', '-f', $fileName);
441
system(@args) == 0 or
442
$self->throw("Failed to gzip/compress file $fileName: $!",
443
"Confirm current \$GNU_PATH: $GNU_PATH",
444
"Edit \$GNU_PATH in Bio::Root::Utilities.pm if necessary.");
454
Usage : $Util->uncompress(filename, [tmp]);
455
Purpose : Uncompress a file.
456
Example : $Util->uncompress("/usr/people/me/data.txt.gz");
457
Returns : String (name of uncompressed file, full path).
458
Argument : filename = String (name of file to be uncompressed, full path).
459
: If the supplied filename does not end with '.gz' or '.Z'
460
: a '.gz' will be appended before attempting to uncompress.
462
: If true, (or if user is not the owner of the file)
463
: the file is uncompressed to a tmp file
464
: If false, file is clobbered with the uncompressed version.
465
Throws : Exception if file cannot be uncompressed
466
: If user is not owner of the file, generates a warning
467
: and uncompresses to a tmp file.
468
: To avoid this warning, use the -o file test operator
469
: and call this function with a true second argument.
470
Comments : Attempts to uncompress using gunzip.
471
: If that fails, will use uncompress.
472
: In some situations, the full path to the gzip executable
473
: may be required. This can be specified with the $GNU_PATH
474
: package global variable. When installed, $GNU_PATH is an
477
See Also : L<compress>()
485
my $fileName = shift;
486
my $tmp = shift || 0;
488
if(not $fileName =~ /(\.gz|\.Z)$/) { $fileName .= '.gz'; }
489
$DEBUG && print STDERR "gunzipping file $fileName";
491
my($uncompressed, @args);
493
if($tmp or not -o $fileName) {
495
$uncompressed = POSIX::tmpnam;
497
$uncompressed = _get_pseudo_tmpnam();
499
$uncompressed .= ".tmp.bioperl";
500
@args = ($GNU_PATH."gunzip -f < $fileName > $uncompressed");
501
not $tmp and $self->verbose > 0 and
502
$self->warn("Not owner of file $fileName\nUncompressing to tmp file $uncompressed.");
505
@args = ($GNU_PATH.'gunzip', '-f', $fileName);
506
($uncompressed = $fileName) =~ s/(\.gz|\.Z)$//;
509
# $ENV{'PATH'} = '/tools/gnu/bin';
511
if(system(@args) != 0) {
512
# gunzip may not be present. Try uncompress.
513
($uncompressed = $fileName) =~ s/(\.gz|\.Z)$//;
515
@args = ("/usr/bin/uncompress -f < $fileName > $uncompressed");
517
@args = ('/usr/bin/uncompress', '-f', $fileName);
519
system(@args) == 0 or
520
$self->throw("Failed to gunzip/uncompress file $fileName: $!",
521
"Confirm current \$GNU_PATH: $GNU_PATH",
522
"Edit \$GNU_PATH in Bio::Root::Utilities.pm if necessary.");
525
return $uncompressed;
532
Usage : $Util->file_date( filename [,date_format])
533
Purpose : Obtains the date of a given file.
534
: Provides flexible formatting via date_format().
535
Returns : String = date of the file as: yyyy-mm-dd (e.g., 1997-10-15)
536
Argument : filename = string, full path name for file
537
: date_format = string, desired format for date (see date_format()).
538
: Default = yyyy-mm-dd
539
Thows : Exception if no file is provided or does not exist.
540
Comments : Uses the mtime field as obtained by stat().
547
my ($self, $file, $fmt) = @_;
549
$self->throw("No such file: $file") if not $file or not -e $file;
551
$fmt ||= 'yyyy-mm-dd';
553
my @file_data = stat($file);
554
return $self->date_format($fmt, $file_data[9]); # mtime field
561
Purpose : To remove nasty shell characters from untrusted data
562
: and allow a script to run with the -T switch.
563
: Potentially dangerous shell meta characters: &;`'\"|*?!~<>^()[]{}$\n\r
564
: Accept only the first block of contiguous characters:
565
: Default allowed chars = "-\w.', ()"
566
: If $relax is true = "-\w.', ()\/=%:^<>*"
567
Usage : $Util->untaint($value, $relax)
568
Returns : String containing the untained data.
569
Argument: $value = string
572
This general untaint() function may not be appropriate for every situation.
573
To allow only a more restricted subset of special characters
574
(for example, untainting a regular expression), then using a custom
575
untainting mechanism would permit more control.
577
Note that special trusted vars (like $0) require untainting.
584
my($self,$value,$relax) = @_;
588
$DEBUG and print STDERR "\nUNTAINT: $value\n";
590
defined $value || return;
593
$value =~ /([-\w.\', ()\/=%:^<>*]+)/;
595
# } elsif( $relax == 2 ) { # Could have several degrees of relax.
596
# $value =~ /([-\w.\', ()\/=%:^<>*]+)/;
599
$value =~ /([-\w.\', ()]+)/;
603
$DEBUG and print STDERR "UNTAINTED: $untainted\n";
612
Usage : ($mean, $stdev) = $Util->mean_stdev( @data )
613
Purpose : Calculates the mean and standard deviation given a list of numbers.
614
Returns : 2-element list (mean, stdev)
615
Argument : list of numbers (ints or floats)
623
my ($self, @data) = @_;
625
foreach (@data) { $mean += $_; }
626
$mean /= scalar @data;
627
my $sum_diff_sqd = 0;
628
foreach (@data) { $sum_diff_sqd += ($mean - $_) * ($mean - $_); }
629
my $stdev = sqrt(abs($sum_diff_sqd/(scalar @data)-1));
630
return ($mean, $stdev);
637
Purpose : Counts the number of files/directories within a given directory.
638
: Also reports the number of text and binary files in the dir
639
: as well as names of these files and directories.
640
Usage : count_files(\%data)
641
: $data{-DIR} is the directory to be analyzed. Default is ./
642
: $data{-PRINT} = 0|1; if 1, prints results to STDOUT, (default=0).
643
Argument : Hash reference (empty)
645
: Modifies the hash ref passed in as the sole argument.
646
: $$href{-TOTAL} scalar
647
: $$href{-NUM_TEXT_FILES} scalar
648
: $$href{-NUM_BINARY_FILES} scalar
649
: $$href{-NUM_DIRS} scalar
650
: $$href{-T_FILE_NAMES} array ref
651
: $$href{-B_FILE_NAMES} array ref
652
: $$href{-DIRNAMES} array ref
660
my $href = shift; # Reference to an empty hash.
661
my( $name, @fileLine);
662
my $dir = $$href{-DIR} || './';
663
my $print = $$href{-PRINT} || 0;
665
### Make sure $dir ends with /
666
$dir !~ /\/$/ and do{ $dir .= '/'; $$href{-DIR} = $dir; };
668
open ( PIPE, "ls -1 $dir |" ) || $self->throw("Can't open input pipe: $!");
670
### Initialize the hash data.
672
$$href{-NUM_TEXT_FILES} = $$href{-NUM_BINARY_FILES} = $$href{-NUM_DIRS} = 0;
673
$$href{-T_FILE_NAMES} = [];
674
$$href{-B_FILE_NAMES} = [];
675
$$href{-DIR_NAMES} = [];
680
$$href{-NUM_TEXT_FILES}++; push @{$$href{-T_FILE_NAMES}}, $_; }
681
if( -B $dir.$_ and not -d $dir.$_) {
682
$$href{-NUM_BINARY_FILES}++; push @{$$href{-B_FILE_NAMES}}, $_; }
684
$$href{-NUM_DIRS}++; push @{$$href{-DIR_NAMES}}, $_; }
689
printf( "\n%4d %s\n", $$href{-TOTAL}, "total files+dirs in $dir");
690
printf( "%4d %s\n", $$href{-NUM_TEXT_FILES}, "text files");
691
printf( "%4d %s\n", $$href{-NUM_BINARY_FILES}, "binary files");
692
printf( "%4d %s\n", $$href{-NUM_DIRS}, "directories");
700
# Purpose : Obtains a variety of date for a given file.
701
# : Provides an interface to Perl's stat().
702
# Status : Under development. Not ready. Don't use!
709
my ($self, %param) = @_;
710
my ($file, $get, $fmt) = $self->_rearrange([qw(FILE GET FMT)], %param);
712
$fmt ||= 'yyyy-mm-dd';
714
my($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
715
$atime, $mtime, $ctime, $blksize, $blocks) = stat $file;
717
if($get =~ /date/i) {
718
## I can get the elapsed time since the file was modified but
719
## it's not so straightforward to get the date in a nice format...
720
## Think about using a standard CPAN module for this, like
721
## Date::Manip or Date::DateCalc.
724
my $elsec = time - $mtime;
725
printf "\nFile age: %.0f sec %.0f hrs %.0f days", $elsec, $elsec/3600, $elsec/(3600*24);<STDIN>;
726
my $days = sprintf "%.0f", $elsec/(3600*24);
727
} elsif($get eq 'all') {
737
my $fileName = shift;
738
if(not -e $fileName) {
739
$self->throw("Can't delete file $fileName: Does not exist.");
740
} elsif(not -o $fileName) {
741
$self->throw("Can't delete file $fileName: Not owner.");
743
my $ulval = unlink($fileName) > 0 or
744
$self->throw("Failed to delete file $fileName: $!");
748
=head2 create_filehandle
750
Usage : $object->create_filehandle(<named parameters>);
751
Purpose : Create a FileHandle object from a file or STDIN.
752
: Mainly used as a helper method by read() and get_newline().
753
Example : $data = $object->create_filehandle(-FILE =>'usr/people/me/data.txt')
754
Argument : Named parameters (case-insensitive):
756
: -CLIENT => object reference for the object submitting
757
: the request. This facilitates use by
758
: Bio::Root::IOManager::read(). Default = $Util.
759
: -FILE => string (full path to file) or a reference
760
: to a FileHandle object or typeglob. This is an
761
: optional parameter (if not defined, STDIN is used).
762
Returns : Reference to a FileHandle object.
763
Throws : Exception if cannot open a supplied file or if supplied with a
764
: reference that is not a FileHandle ref.
765
Comments : If given a FileHandle reference, this method simply returns it.
766
: This method assumes the user wants to read ascii data. So, if
767
: the file is binary, it will be treated as a compressed (gzipped)
768
: file and access it using gzip -ce. The problem here is that not
769
: all binary files are necessarily compressed. Therefore,
770
: this method should probably have a -mode parameter to
771
: specify ascii or binary.
773
See Also : L<get_newline>(), L<Bio::Root::IOManager::read>(),
777
#---------------------
778
sub create_filehandle {
779
#---------------------
780
my($self, @param) = @_;
781
my($client, $file, $handle) =
782
$self->_rearrange([qw( CLIENT FILE HANDLE )], @param);
784
if(not ref $client) { $client = $self; }
786
if( $client->can('file')) {
787
$file = $client->file($file);
790
my $FH; # = new FileHandle;
794
if($handle_ref = ref($file)) {
795
if($handle_ref eq 'FileHandle') {
797
$client->{'_input_type'} = "FileHandle";
798
} elsif($handle_ref eq 'GLOB') {
800
$client->{'_input_type'} = "Glob";
802
$self->throw("Can't read from $file: Not a FileHandle or GLOB ref.");
804
$self->verbose > 0 and printf STDERR "$ID: reading data from FileHandle\n";
807
$client->{'_input_type'} = "FileHandle for $file";
809
# Use gzip -cd to access compressed data.
811
$client->{'_input_type'} .= " (compressed)";
812
$file = "${GNU_PATH}gzip -cd $file |"
815
$FH = new FileHandle;
816
open ($FH, $file) || $self->throw("Can't access data file: $file",
818
$self->verbose > 0 and printf STDERR "$ID: reading data from file $file\n";
823
$self->verbose > 0 and printf STDERR "$ID: reading data from STDIN\n";
824
$client->{'_input_type'} = "STDIN";
832
Usage : $object->get_newline(<named parameters>);
833
Purpose : Determine the character(s) used for newlines in a given file or
834
: input stream. Delegates to Bio::Root::Utilities::get_newline()
835
Example : $data = $object->get_newline(-CLIENT => $anObj,
836
: -FILE =>'usr/people/me/data.txt')
837
Argument : Same arguemnts as for create_filehandle().
838
Returns : Reference to a FileHandle object.
839
Throws : Propogates and exceptions thrown by Bio::Root::Utilities::get_newline().
841
See Also : L<taste_file>(), L<create_filehandle>()
848
my($self, @param) = @_;
850
return $NEWLINE if defined $NEWLINE;
853
$self->_rearrange([qw( CLIENT )], @param);
855
my $FH = $self->create_filehandle(@param);
857
if(not ref $client) { $client = $self; }
859
if($client->{'_input_type'} =~ /STDIN|Glob|compressed/) {
860
# Can't taste from STDIN since we can't seek 0 on it.
861
# Are other non special Glob refs seek-able?
862
# Attempt to guess newline based on platform.
863
# Not robust since we could be reading Unix files on a Mac, e.g.
864
if(defined $ENV{'MACPERL'}) {
865
$NEWLINE = "\015"; # \r
867
$NEWLINE = "\012"; # \n
870
$NEWLINE = $self->taste_file($FH);
873
close ($FH) unless ($client->{'_input_type'} eq 'STDIN' ||
874
$client->{'_input_type'} eq 'FileHandle' ||
875
$client->{'_input_type'} eq 'Glob' );
877
delete $client->{'_input_type'};
879
return $NEWLINE || $DEFAULT_NEWLINE;
885
Usage : $object->taste_file( <FileHandle> );
886
: Mainly a utility method for get_newline().
887
Purpose : Sample a filehandle to determine the character(s) used for a newline.
888
Example : $char = $Util->taste_file($FH)
889
Argument : Reference to a FileHandle object.
890
Returns : String containing an octal represenation of the newline character string.
891
: Unix = "\012" ("\n")
892
: Win32 = "\012\015" ("\r\n")
893
: Mac = "\015" ("\r")
894
Throws : Exception if no input is read within $TIMEOUT_SECS seconds.
895
: Exception if argument is not FileHandle object reference.
896
: Warning if cannot determine neewline char(s).
897
Comments : Based on code submitted by Vicki Brown (vlb@deltagen.com).
899
See Also : L<get_newline>()
906
my ($self, $FH) = @_;
907
my $BUFSIZ = 256; # Number of bytes read from the file handle.
908
my ($buffer, $octal, $str, $irs, $i);
909
my $wait = $TIMEOUT_SECS;
911
ref($FH) eq 'FileHandle' or $self->throw("Can't taste file: not a FileHandle ref");
915
# this is a quick hack to check for availability of alarm(); just copied
916
# from Bio/Root/IOManager.pm HL 02/19/01
917
my $alarm_available = 1;
922
# alarm() not available (ActiveState perl for win32 doesn't have it.
923
# See jitterbug PR#98)
924
$alarm_available = 0;
926
$SIG{ALRM} = sub { die "Timed out!"; };
929
$alarm_available && alarm( $wait );
930
$result = read($FH, $buffer, $BUFSIZ); # read the $BUFSIZ characters of file
931
$alarm_available && alarm(0);
933
if($@ =~ /Timed out!/) {
934
$self->throw("Timed out while waiting for input.",
935
"Timeout period = $wait seconds.\nFor longer time before timing out, edit \$TIMEOUT_SECS in Bio::Root::Global.pm.");
937
} elsif(not $result) {
939
$self->throw("read taste failed to read from FileHandle.", $err);
941
} elsif($@ =~ /\S/) {
943
$self->throw("Unexpected error during read: $err");
946
seek($FH, 0, 0) or $self->throw("seek failed to seek 0 on FileHandle.");
948
my @chars = split(//, $buffer);
950
for ($i = 0; $i <$BUFSIZ; $i++) {
951
if (($chars[$i] eq "\012")) {
952
unless ($chars[$i-1] eq "\015") {
959
} elsif (($chars[$i] eq "\015") && ($chars[$i+1] eq "\012")) {
965
} elsif (($chars[$i] eq "\015")) {
974
$self->warn("Could not determine newline char. Using '\012'");
977
# print STDERR "NEWLINE CHAR = $irs\n";
982
######################################
983
##### Mail Functions ########
984
######################################
986
=head2 mail_authority
988
Title : mail_authority
989
Usage : $Util->mail_authority( $message )
990
Purpose : Syntactic sugar to send email to $Bio::Root::Global::AUTHORITY
992
See Also : L<send_mail>()
998
my( $self, $message ) = @_;
999
my $script = $self->untaint($0,1);
1001
send_mail( -TO=>$AUTHORITY, -SUBJ=>$script, -MSG=>$message);
1009
Usage : $Util->send_mail( named_parameters )
1010
Purpose : Provides an interface to /usr/lib/sendmail
1012
Argument : Named parameters: (case-insensitive)
1013
: -TO => e-mail address to send to
1014
: -SUBJ => subject for message (optional)
1015
: -MSG => message to be sent (optional)
1016
: -CC => cc: e-mail address (optional)
1017
Thows : Exception if TO: address appears bad or is missing
1018
Comments : Based on TomC's tip at:
1019
: http://www.perl.com/CPAN-local/doc/FMTEYEWTK/safe_shellings
1021
: Using default 'From:' information.
1022
: sendmail options used:
1023
: -t: ignore the address given on the command line and
1024
: get To:address from the e-mail header.
1025
: -oi: prevents send_mail from ending the message if it
1026
: finds a period at the start of a line.
1028
See Also : L<mail_authority>()
1036
my( $self, @param) = @_;
1037
my($recipient,$subj,$message,$cc) = $self->_rearrange([qw(TO SUBJ MSG CC)],@param);
1039
$self->throw("Invalid or missing e-mail address: $recipient")
1040
if not $recipient =~ /\S+\@\S+/;
1042
$cc ||= ''; $subj ||= ''; $message ||= '';
1044
open (SENDMAIL, "|/usr/lib/sendmail -oi -t") ||
1045
$self->throw("Can't send mail: sendmail cannot fork: $!");
1047
print SENDMAIL <<QQ_EOF_QQ;
1057
if ($?) { warn "sendmail didn't exit nicely: $?" }
1061
######################################
1062
### Interactive Functions #####
1063
######################################
1069
Usage : $Util->yes_reply( [query_string]);
1070
Purpose : To test an STDIN input value for affirmation.
1071
Example : print +( $Util->yes_reply('Are you ok') ? "great!\n" : "sorry.\n" );
1072
: $Util->yes_reply('Continue') || die;
1073
Returns : Boolean, true (1) if input string begins with 'y' or 'Y'
1074
Argument: query_string = string to be used to prompt user (optional)
1075
: If not provided, 'Yes or no' will be used.
1076
: Question mark is automatically appended.
1086
$query ||= 'Yes or no';
1087
print "\n$query? (y/n) [n] ";
1088
chomp( $reply = <STDIN> );
1096
Title : request_data()
1097
Usage : $Util->request_data( [value_name]);
1098
Purpose : To request data from a user to be entered via keyboard (STDIN).
1099
Example : $name = $Util->request_data('Name');
1100
: # User will see: % Enter Name:
1101
Returns : String, (data entered from keyboard, sans terminal newline.)
1102
Argument: value_name = string to be used to prompt user.
1103
: If not provided, 'data' will be used, (not very helpful).
1104
: Question mark is automatically appended.
1112
my $data = shift || 'data';
1113
print "Enter $data: ";
1114
# Remove the terminal newline char.
1115
chomp($data = <STDIN>);
1120
# Not much used since you can use request_data()
1121
# and test for an empty string.
1124
chop( $reply = <STDIN> );
1129
=head2 verify_version
1131
Purpose : Checks the version of Perl used to invoke the script.
1132
: Aborts program if version is less than the given argument.
1133
Usage : verify_version('5.000')
1138
sub verify_version {
1141
my $reqVersion = shift;
1143
$] < $reqVersion and do {
1144
printf STDERR ( "\a\n%s %0.3f.\n", "** Sorry. This Perl script requires at least version", $reqVersion);
1145
printf STDERR ( "%s %0.3f %s\n\n", "You are running Perl version", $], "Please update your Perl!\n\n" );
1150
# Purpose : Returns a string that can be used as a temporary file name.
1151
# Based on localtime.
1152
# This is used if POSIX is not available.
1154
sub _get_pseudo_tmpnam {
1156
my $date = localtime(time());
1158
my $tmpnam = 'tmpnam';
1160
if( $date =~ /([\d:]+)\s+(\d+)\s*$/ ) {
1161
$tmpnam = $2. '_' . $1;
1172
---------------------
1175
* Using global $TIMEOUT_SECS in taste_file().
1178
* Renamed get_newline_char() to get_newline() since it could be >1 char.
1181
* Added three new methods: create_filehandle, get_newline_char, taste_file.
1182
create_filehandle represents functionality that was formerly buried
1183
within Bio::Root::IOManager::read().
1186
* Removed autoloading code.
1187
* Modified compress(), uncompress(), and delete() to properly
1188
deal with file ownership issues.
1191
* Improved file_date() to be less reliant on the output of ls.
1192
(Note the word 'less'; it still relies on ls).
1195
* compress() & uncompress() will write files to a temporary location
1196
if the first attempt to compress/uncompress fails.
1197
This allows users to access compressed files in directories in which they
1198
lack write permission.