2
2
PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
4
4
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
5
<title>PyTables User's Guide</title><meta http-equiv="Content-Type" content="text/html"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="generator" content="The tbook system at http://tbookdtd.sourceforge.net"><meta name="robots" content="index"><meta name="DC.Title" content="PyTables User's Guide"><meta name="DC.Description" content="PyTables User's Guide Hierarchical datasets in Python Release 0.9.1"><meta name="DC.Creator" content="Francesc Altet"><meta name="Author" content="Francesc Altet"><meta name="DC.Creator" content="Scott Prater"><meta name="Author" content="Scott Prater"><meta name="DC.Creator" content="Ivan Vilata"><meta name="Author" content="Ivan Vilata"><meta name="DC.Creator" content="Tom Hedley"><meta name="Author" content="Tom Hedley"><meta name="DC.Date" content="2004-12-02T12:15:05+01:00"><meta name="Date" content="2004-12-02T12:15:05+01:00"><meta name="DC.Rights" content="(c) 2002, 2003, 2004 Francesc Altet"><meta name="Copyright" content="(c) 2002, 2003, 2004 Francesc Altet"><meta name="DC.Type" content="Text"><meta name="DC.Format" content="text/html"><meta name="DC.Language" scheme="rfc3066" content="en"><meta name="Language" content="en"><meta http-equiv="Content-Style-Type" content="text/css"><style type="text/css">
5
<title>PyTables User's Guide</title><meta http-equiv="Content-Type" content="text/html"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="generator" content="The tbook system at http://tbookdtd.sourceforge.net"><meta name="robots" content="index"><meta name="DC.Title" content="PyTables User's Guide"><meta name="DC.Description" content="PyTables User's Guide Hierarchical datasets in Python Release 1.1.1"><meta name="DC.Creator" content="Francesc Altet"><meta name="Author" content="Francesc Altet"><meta name="DC.Creator" content="Scott Prater"><meta name="Author" content="Scott Prater"><meta name="DC.Creator" content="Ivan Vilata"><meta name="Author" content="Ivan Vilata"><meta name="DC.Creator" content="Vicent Mas"><meta name="Author" content="Vicent Mas"><meta name="DC.Creator" content="Tom Hedley"><meta name="Author" content="Tom Hedley"><meta name="DC.Creator" content="Antonio Valentino"><meta name="Author" content="Antonio Valentino"><meta name="DC.Date" content="2005-09-13T14:21:55+02:00"><meta name="Date" content="2005-09-13T14:21:55+02:00"><meta name="DC.Rights" content="(c) 2002, 2003, 2004, 2005 Francesc AltetCopyright Notice and Statement for PyTables Software Library and Utilities Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this lis"><meta name="Copyright" content="(c) 2002, 2003, 2004, 2005 Francesc AltetCopyright Notice and Statement for PyTables Software Library and Utilities Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. 
Redistributions of source code must retain the above copyright notice, this lis"><meta name="DC.Type" content="Text"><meta name="DC.Format" content="text/html"><meta name="DC.Language" scheme="rfc3066" content="en"><meta name="Language" content="en"><meta http-equiv="Content-Style-Type" content="text/css"><style type="text/css">
7
7
font-family: serif; text-align: justify;
8
8
margin: 0pt; line-height: 1.3; background-color: white; color: black }
146
146
div.subtitle { font-size: x-large; color: olive }
147
147
div.title-article { font-size: x-large }
148
148
div.partheadline { font-size: x-large }
149
</style></head><body><div class="speedbar-top"><table class="speedbar"><tbody><tr><td style="text-align: left; width: 15%"><a href="usersguide5.html">previous</a></td><td style="text-align: center"><a href="usersguide.html#tb:table-of-contents">Table of Contents</a></td><td style="text-align: right; width: 15%"><a href="usersguide7.html">next</a></td></tr><tr><td colspan="3"> </td></tr></tbody></table><hr class="speedbar"></div><div class="document"><div id="optimizationTips"><a name="optimizationTips"></a>
149
</style></head><body><div class="speedbar-top"><table class="speedbar"><tbody><tr><td style="text-align: left; width: 15%"><a href="usersguide5.html">previous</a></td><td style="text-align: center"><a href="usersguide.html#tb:table-of-contents">Table of Contents</a> — <a href="usersguide11.html">References</a></td><td style="text-align: right; width: 15%"><a href="usersguide7.html">next</a></td></tr><tr><td colspan="3"> </td></tr></tbody></table><hr class="speedbar"></div><div class="document"><div id="optimizationTips"><a name="optimizationTips"></a>
150
150
<h1 id="chapter6"><a name="chapter6"></a>Chapter 6: Optimization tips</h1>
152
152
<div class="aphorism">... durch planmässiges
233
233
<div class="p-first"><tt class="verb">PyTables</tt> provides a way to accelerate data
234
234
selections when they are simple, i.e. only one column is
235
235
involved in the selection process, through the use of the
236
<tt class="verb">where</tt> iterator (see <a href="usersguide4.html#whereTableDescr">4.5.2</a>). We will call this mode of
236
<tt class="verb">where</tt> iterator (see <a href="usersguide4.html#Table.where">4.6.2</a>). We will call this mode of
237
237
selecting data <em>in-kernel</em>. Let's see an example
238
238
of <em>in-kernel</em> selection based on the
239
239
<em>standard</em> selection mentioned above:
251
251
<div class="figure" id="searchTimes-int"><a name="searchTimes-int"></a>
252
<img class="graphics" alt="Times for different selection modes over Int32 values. Benchmark made ..." src="searchTimes-int-itanium-web.png">
253
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.1:</span> Times for different selection modes over <tt>Int32</tt> values. Benchmark made on a
252
<img class="graphics" width="375" height="262" alt="Times for different selection modes over Int32 values. Benchmark made ..." src="searchTimes-int-itanium-web.png">
253
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.1:</span> Times for different selection modes over <tt>Int32</tt> values. Benchmark made on a
254
254
machine with Itanium (IA64) @ 900 MHz processors with
255
255
SCSI disk @ 10K RPM.
259
259
<div class="figure" id="searchTimes-float"><a name="searchTimes-float"></a>
260
<img class="graphics" alt="Times for different selection modes over Float64 values. Benchmark mad..." src="searchTimes-float-itanium-web.png">
261
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.2:</span> Times for different selection modes over <tt>Float64</tt> values. Benchmark made on
260
<img class="graphics" width="375" height="262" alt="Times for different selection modes over Float64 values. Benchmark mad..." src="searchTimes-float-itanium-web.png">
261
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.2:</span> Times for different selection modes over <tt>Float64</tt> values. Benchmark made on
262
262
a machine with Itanium (IA64) @ 900 MHz processors
263
263
with SCSI disk @ 10K RPM.
282
282
<div class="p">You should note, however, that currently the
283
283
<tt class="verb">where</tt> method only accepts conditions along a
284
single column<a href="#footnote8" id="footnoteback8"><sup title="Although this may change in the future">8)</sup></a>. Fortunately, you can mix the
284
single column<a href="#footnote10" id="footnoteback10"><sup title="Although this may change in the future">10)</sup></a>. Fortunately, you can mix the
285
285
<em>in-kernel</em> and <em>standard</em> selection modes
286
286
for evaluating arbitrarily complex conditions along several
287
287
columns at once. Look at this example:
403
403
<em>linearly</em>. In particular, the time to index a
404
404
couple of columns with 1 billion rows each is 40
405
405
min. (roughly 20 min. each), which is quite a reasonable
406
figure. This is because <tt class="verb">PyTables</tt> has choosed
406
figure. This is because <tt class="verb">PyTables</tt> has chosen
407
407
an algorithm that does a <em>partial</em> sorting of the
408
408
columns in order to ensure that the indexing time grows
409
409
<em>linearly</em>. In contrast, most relational
428
428
<div class="figure" id="indexTimes"><a name="indexTimes"></a>
429
<img class="graphics" alt="Times for indexing a couple of columns of
 datatypes Int32 and
 Floa..." src="indexTimes-itanium-web.png">
430
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.3:</span> Times for indexing a couple of columns of
431
datatypes <tt>Int32</tt> and
429
<img class="graphics" width="375" height="262" alt="Times for indexing a couple of columns of
 data type Int32 and
 Floa..." src="indexTimes-itanium-web.png">
430
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.3:</span> Times for indexing a couple of columns of
431
data type <tt>Int32</tt> and
432
432
<tt>Float64</tt>. Benchmark made
433
433
on a machine with Itanium (IA64) @ 900 MHz processors
434
434
with SCSI disk @ 10K RPM.
442
442
<h2 id="section6.3"><span class="headlinenumber"><a name="section6.3"></a>6.3 </span>Compression issues</h2>
444
444
<p class="first">One of the beauties of <tt class="verb">PyTables</tt> is that it
445
supports compression on tables and arrays<a href="#footnote9" id="footnoteback9"><sup title="More precisely, it is supported in EArray and VLArray objects, not in Array objects itself.">9)</sup></a>, although it is disabled by
446
default. Compression of big amounts of data might be a bit
447
controversial feature, because compression has a legend of
448
being a very big CPU time resources consumer. However, if
449
you are willing to check if compression can help not only
450
reducing your dataset file size but <b>also</b> improving your I/O efficiency,
445
supports compression on tables and arrays<a href="#footnote11" id="footnoteback11"><sup title="More precisely, it is supported in CArray, EArray and VLArray objects, not in Array objects themselves.">11)</sup></a>, although it
446
is disabled by default. Compressing large amounts of data
447
might be a somewhat controversial feature, because compression
448
has a reputation for being a very heavy CPU time
449
consumer. However, if you are willing to check if
450
compression can help not only by reducing your dataset file
451
size but <b>also</b> by improving your
452
I/O efficiency, keep reading.
454
455
<p>There is a common scenario where users need to save
475
476
scenarios compression use is convenient or not).
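To get a rough feel for whether compression pays off on your own data, you can time a compressor on a sample buffer before committing to it. The following is only a minimal sketch: it uses the <tt class="verb">zlib</tt> module from the Python standard library directly rather than PyTables, and the function name <tt class="verb">compression_report</tt> and the sample data are hypothetical, made up for illustration.

```python
import struct
import time
import zlib


def compression_report(payload, levels=(1, 6, 9)):
    """Return {level: (compression_ratio, seconds)} for zlib at each level."""
    report = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Check that the data survives a round trip unchanged.
        if zlib.decompress(packed) != payload:
            raise ValueError("round trip failed at level %d" % level)
        report[level] = (len(payload) / len(packed), elapsed)
    return report


# Hypothetical sample: 100000 little-endian Int32 values with many repeats.
sample = struct.pack("<100000i", *(i % 50 for i in range(100000)))
report = compression_report(sample)

for level in sorted(report):
    ratio, seconds = report[level]
    print("level %d: ratio %.1f, %.4f s" % (level, ratio, seconds))
```

The numbers depend heavily on the data and the machine, so treat the output as a hint about your own dataset rather than as a benchmark.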
478
<p>The compression library used by default is the <b>Zlib</b> (see <a href="#zlibRef"></a>), and as HDF5 <em>requires</em>
479
<p>The compression library used by default is <b>Zlib</b> (see <a href="usersguide11.html#zlibRef"></a>), and as HDF5 <em>requires</em>
479
480
it, you can safely use it and expect that your HDF5 files
480
481
will be readable on any other platform that has HDF5 libraries
481
482
installed. Zlib provides a good compression ratio, although
490
491
compression or more CPU wasted on compression, as we will
491
492
see soon). This is why support for two additional
492
493
compressors has been added to PyTables: LZO and UCL (see
493
<a href="#lzouclRef"></a>). Following his author (and
494
<a href="usersguide11.html#lzouclRef"></a>). According to its author (and
494
495
checked by the author of this manual), LZO offers pretty
495
496
fast compression (although with a low compression ratio) and
496
497
extremely fast decompression while UCL achieves an excellent
505
506
large amounts of data.
508
<p>Be aware that the LZO and UCL support in PyTables is not
509
<p>Be aware that the LZO, UCL and bzip2 support in PyTables is not
509
510
standard on HDF5, so if you are going to use your PyTables
510
511
files in contexts other than PyTables you will not
511
be able to read them. Still, see the <a href="usersguide8.html#ptrepackDescr">appendix B.2</a> where the
512
be able to read them. Still, see the <a href="usersguide9.html#ptrepackDescr">appendix C.2</a> where the
512
513
<tt class="verb">ptrepack</tt> utility is described to find a way to
513
free your files from LZO or UCL dependencies, so that you
514
free your files from LZO, UCL or bzip2 dependencies, so that you
514
515
can use these compressors locally with the guarantee that you
515
516
can replace them with Zlib (or even remove compression
516
517
completely) if you want to export the files to other HDF5
560
561
<div class="figure" id="lzozlibuclWriteComparison"><a name="lzozlibuclWriteComparison"></a>
561
<img class="graphics" alt="Writing tables with several compressors.
 " src="write-medium-lzo-zlib-ucl-comparison-web.png">
562
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.4:</span> Writing tables with several compressors.
562
<img class="graphics" width="375" height="262" alt="Writing tables with several compressors.
 " src="write-medium-lzo-zlib-ucl-comparison-web.png">
563
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.4:</span> Writing tables with several compressors.
566
567
<div class="figure" id="lzozlibuclReadComparison"><a name="lzozlibuclReadComparison"></a>
567
<img class="graphics" alt="Reading tables with several compressors.
 " src="read-medium-lzo-zlib-ucl-comparison-web.png">
568
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.5:</span> Reading tables with several compressors.
568
<img class="graphics" width="375" height="262" alt="Reading tables with several compressors.
 " src="read-medium-lzo-zlib-ucl-comparison-web.png">
569
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.5:</span> Reading tables with several compressors.
572
573
<div class="figure" id="psycolzozlibuclWriteComparison"><a name="psycolzozlibuclWriteComparison"></a>
573
<img class="graphics" alt="Writing tables with several compressors and Psyco.
 " src="write-medium-psyco-lzo-zlib-ucl-comparison-web.png">
574
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.6:</span> Writing tables with several compressors and Psyco.
574
<img class="graphics" width="375" height="262" alt="Writing tables with several compressors and Psyco.
 " src="write-medium-psyco-lzo-zlib-ucl-comparison-web.png">
575
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.6:</span> Writing tables with several compressors and Psyco.
578
579
<div class="figure" id="psycolzozlibuclReadComparison"><a name="psycolzozlibuclReadComparison"></a>
579
<img class="graphics" alt="Reading tables with several compressors and Psyco.
 " src="read-medium-psyco-lzo-zlib-ucl-comparison-web.png">
580
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.7:</span> Reading tables with several compressors and Psyco.
580
<img class="graphics" width="375" height="262" alt="Reading tables with several compressors and Psyco.
 " src="read-medium-psyco-lzo-zlib-ucl-comparison-web.png">
581
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.7:</span> Reading tables with several compressors and Psyco.
585
586
By looking at the graphs, you can expect that, generally
586
587
speaking, LZO would be the fastest at both compressing and
587
588
decompressing, but the one that achieves the worst
588
compression ratio (although that may be just ok for many
589
compression ratio (although that may be just OK for many
589
590
situations). UCL is the slowest when compressing, but is
590
591
faster than Zlib when decompressing, and, besides, it
591
592
achieves very good compression ratios (generally better than
616
617
<p> You can select the compression library and level by
617
618
setting the <tt class="verb">complib</tt> and <tt class="verb">compress</tt>
618
keywords in the <tt class="verb">Filters</tt> class (see <a href="usersguide4.html#FiltersClassDescr">4.13.1</a>). A compression level of 0
619
keywords in the <tt class="verb">Filters</tt> class (see <a href="usersguide4.html#FiltersClassDescr">4.17.1</a>). A compression level of 0
619
620
will completely disable compression (the default), 1 is the
620
621
least CPU time demanding level, while 9 is the maximum level
621
622
and most CPU intensive. Finally, keep in mind that LZO is
633
634
<p class="first">The <tt class="verb">HDF5</tt> library provides an interesting
634
635
filter that can improve the results of your favorite
635
636
compressor. Its name is <em>shuffle</em>, and because it can
636
greatly benefit compression and it doesn't take many CPU
637
greatly benefit compression and it does not take many CPU
637
638
resources, it is active by <em>default</em> in
638
639
<tt class="verb">PyTables</tt> whenever compression is activated
639
640
(independently of the chosen compressor). It is of course
710
711
<h2 id="section6.5"><span class="headlinenumber"><a name="section6.5"></a>6.5 </span>Taking advantage of Psyco</h2>
712
<p class="first">Psyco (see <a href="#psycoRef"></a>) is a kind of
713
<p class="first">Psyco (see <a href="usersguide11.html#psycoRef"></a>) is a kind of
713
714
specialized compiler for Python that typically accelerates
714
715
Python applications with no change in source code. You can
715
716
think of Psyco as a kind of just-in-time (JIT) compiler, a
778
779
<div class="figure" id="psycoWriteComparison"><a name="psycoWriteComparison"></a>
779
<img class="graphics" alt="Writing tables with/without Psyco.
 " src="write-medium-psyco-nopsyco-comparison-web.png">
780
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.8:</span> Writing tables with/without Psyco.
780
<img class="graphics" width="375" height="262" alt="Writing tables with/without Psyco.
 " src="write-medium-psyco-nopsyco-comparison-web.png">
781
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.8:</span> Writing tables with/without Psyco.
784
785
<div class="figure" id="psycoReadComparison"><a name="psycoReadComparison"></a>
785
<img class="graphics" alt="Reading tables with/without Psyco.
 " src="read-medium-psyco-nopsyco-comparison-web.png">
786
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.9:</span> Reading tables with/without Psyco.
786
<img class="graphics" width="375" height="262" alt="Reading tables with/without Psyco.
 " src="read-medium-psyco-nopsyco-comparison-web.png">
787
<div class="caption" style="width: 375px"><div class="caption-text"><span class="captionlabel">Figure 6.9:</span> Reading tables with/without Psyco.
827
828
<div class="figure" id="rootUEPfig1"><a name="rootUEPfig1"></a>
828
<img class="graphics" alt="Complete tree in file test.h5, and subtree of interest for
 the us..." src="rootUEP1-web.png">
829
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.10:</span> Complete tree in file <tt>test.h5</tt>, and subtree of interest for
829
<img class="graphics" width="518" height="288" alt="Complete tree in file test.h5, and subtree of interest for
 the us..." src="rootUEP1-web.png">
830
<div class="caption" style="width: 518px"><div class="caption-text"><span class="captionlabel">Figure 6.10:</span> Complete tree in file <tt>test.h5</tt>, and subtree of interest for
834
835
<div class="figure" id="rootUEPfig2"><a name="rootUEPfig2"></a>
835
<img class="graphics" alt="Resulting object tree derived from the use of the
 rootUEP paramet..." src="rootUEP2-web.png">
836
<div class="caption" style="width: 200px"><div class="caption-text"><span class="captionlabel">Figure 6.11:</span> Resulting object tree derived from the use of the
836
<img class="graphics" width="304" height="94" alt="Resulting object tree derived from the use of the
 rootUEP paramet..." src="rootUEP2-web.png">
837
<div class="caption" style="width: 304px"><div class="caption-text"><span class="captionlabel">Figure 6.11:</span> Resulting object tree derived from the use of the
837
838
<tt>rootUEP</tt> parameter.
848
849
<p class="first">Let's suppose that you have a file on which you have made a
849
850
lot of row deletions on one or more tables, or deleted many
850
leaves or even entire subtrees. These operations migth leave
851
leaves or even entire subtrees. These operations might leave
851
852
<em>holes</em> (i.e. space that is not used anymore) in your
852
853
files, that may potentially affect not only the size of the
853
854
files but, more importantly, the performance of I/O. This is
855
856
is not automatically recovered on the fly. In addition,
856
857
if you add many more rows to a table than specified in the
857
858
<tt class="verb">expectedrows</tt> keyword at creation time, this may
858
affect performace as well, as explained in <a href="#expectedRowsOptim">section 6.1</a>.
859
affect performance as well, as explained in <a href="#expectedRowsOptim">section 6.1</a>.
861
862
<p>In order to cope with these issues, you should be aware
864
865
compact your already existing <em>leaky</em> files, but also
865
866
to adjust some internal parameters (both in memory and in
866
867
file) in order to create adequate buffer sizes and chunk
867
sizes for optimum I/O speed. Please, check the <a href="usersguide8.html#ptrepackDescr">appendix B.2</a> for a brief tutorial on
868
sizes for optimum I/O speed. Please check the <a href="usersguide9.html#ptrepackDescr">appendix C.2</a> for a brief tutorial on
884
</div><hr class="footnoterule"><div class="footnote"><a id="footnote8" href="#footnoteback8"><sup>8)</sup></a> Although this may change in the
885
future</div><div class="footnote"><a id="footnote9" href="#footnoteback9"><sup>9)</sup></a> More
886
precisely, it is supported in <tt class="verb">EArray</tt> and
887
<tt class="verb">VLArray</tt> objects, not in <tt class="verb">Array</tt>
888
objects itself.</div></div><div class="speedbar-bottom"><hr class="speedbar"><table class="speedbar"><tbody><tr><td style="text-align: left; width: 15%"><a href="usersguide5.html">previous</a></td><td style="text-align: center"><a href="usersguide.html#tb:table-of-contents">Table of Contents</a></td><td style="text-align: right; width: 15%"><a href="usersguide7.html">next</a></td></tr></tbody></table></div></body></html>
\ No newline at end of file
885
</div><hr class="footnoterule"><div class="footnote"><a id="footnote10" href="#footnoteback10"><sup>10)</sup></a> Although this may change in the
886
future</div><div class="footnote"><a id="footnote11" href="#footnoteback11"><sup>11)</sup></a> More
887
precisely, it is supported in <tt class="verb">CArray</tt>,
888
<tt class="verb">EArray</tt> and <tt class="verb">VLArray</tt> objects, not in
889
<tt class="verb">Array</tt> objects themselves.</div></div><div class="speedbar-bottom"><hr class="speedbar"><table class="speedbar"><tbody><tr><td style="text-align: left; width: 15%"><a href="usersguide5.html">previous</a></td><td style="text-align: center"><a href="usersguide.html#tb:table-of-contents">Table of Contents</a> — <a href="usersguide11.html">References</a></td><td style="text-align: right; width: 15%"><a href="usersguide7.html">next</a></td></tr></tbody></table></div></body></html>
\ No newline at end of file