<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
TITLE="PyTables User's Guide"
HREF="index.html"><LINK
TITLE="The PyTables Core
TITLE="Binary installation (Windows)"
HREF="x457.html"><LINK
TITLE="Browsing the object tree
HREF="x762.html"></HEAD
SUMMARY="Header navigation table"
> User's Guide: Hierarchical datasets in Python - Release 1.3.2</TH
>Chapter 3. Tutorials</H1
>You will be the key that opens every lock,
you will be the light, the boundless light, you will be the border where
the dawn begins, you will be wheat, an illuminated stairway!</I
>—M'aclame a tu Lyrics: Vicent
Andrés i Estellés Music: Ovidi Montllor</SPAN
>This chapter consists of a series of simple yet comprehensive
tutorials that will enable you to understand
CLASS="computeroutput"
>' main features. If you would like more
information about some particular instance variable, global
function, or method, look at the doc strings or go to the
library reference in <A
HREF="c1381.html#libraryReference"
>. If you are reading
this in PDF or HTML formats, follow the corresponding
hyperlink near each newly introduced entity.
>Please note that throughout this document the terms
interchangeably, as will the terms <SPAN
>3.1. Getting started</A
>In this section, we will see how to define our own records
in Python and save collections of them (i.e. a <SPAN
>) into a file. Then we will select
some of the data in the table using Python cuts and create
CLASS="computeroutput"
> arrays to store this selection as
separate objects in a tree.
>examples/tutorial1-1.py</I
working version of all the code in this
section. Nonetheless, this tutorial series has been written
to allow you to reproduce it in a Python interactive console. I
encourage you to do parallel testing and inspect the created
objects (variables, docs, children objects, etc.) during the
course of the tutorial!
NAME="subsection3.1.1"
>3.1.1. Importing <SPAN
>Before starting, you need to import the
public objects in the <SAMP
CLASS="computeroutput"
normally do that by executing:
> >>> import tables
>This is the recommended way to import <SAMP
CLASS="computeroutput"
if you don't want to pollute your namespace. However,
CLASS="computeroutput"
> has a very small set of
first-level primitives, so you may consider using the
> >>> from tables import *
>which will export the following functions into the namespace
of your calling application: <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
>isPyTablesFile()</SAMP
CLASS="computeroutput"
>whichLibVersion()</SAMP
>. This is a rather small set
of functions, and for convenience, we will use this
technique to access them.
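Python decides what `from module import *` copies into your namespace by looking at the module's `__all__` list when one is defined (otherwise every name that does not start with an underscore is copied). The sketch below shows that mechanism on a small hypothetical module, `fake_tables`, built on the fly; whether PyTables limits its exports exactly this way is an assumption, and the code does not import PyTables at all.

```python
import sys
import types

# A throwaway module called "fake_tables" (hypothetical, not PyTables).
fake = types.ModuleType("fake_tables")


def openFile(name, mode="r"):
    """Stub standing in for a first-level function such as openFile()."""
    return (name, mode)


def _helper():
    """A private helper that is not meant to be exported."""
    return None


fake.openFile = openFile
fake._helper = _helper
fake.Internal = object  # a public-looking name deliberately left out of __all__
# Only the names listed in __all__ are copied by "from fake_tables import *".
fake.__all__ = ["openFile"]
sys.modules["fake_tables"] = fake

# Run the star import in a fresh namespace dict so we can see exactly
# which names it brings in.
namespace = {}
exec("from fake_tables import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")
print(exported)

# The plain "import" form keeps every name behind the module prefix instead.
import fake_tables
print(fake_tables.Internal is object)
```

With `import tables` nothing but the module name enters your namespace, which is why the guide recommends it; the star form trades that for shorter names.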
>If you are going to work with <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
normally, you will), you will also need to import functions
from them. So most <SAMP
CLASS="computeroutput"
> >>> import tables # but in this tutorial we use "from tables import *"
>>> import numarray # or "import numpy" or "import Numeric"
NAME="subsection3.1.2"
>3.1.2. Declaring a Column Descriptor</A
>Now, imagine that we have a particle detector and we want
to create a table object in order to save data
retrieved from it. First you need to define the table: the
number of columns it has, what kind of object is contained
in each column, and so on.
>Our particle detector has a TDC (Time to Digital
Converter) counter with a dynamic range of 8 bits and an
ADC (Analog to Digital Converter) with a range of 16
bits. For these values, we will define two fields in our
record object called <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
>. We also want to save the grid
position in which the particle has been detected, so we
will add two new fields called <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
>. Our instrumentation can also obtain
the pressure and energy of the particle. The resolution of
the pressure gauge allows us to use a single-precision
CLASS="computeroutput"
> readings, while the
CLASS="computeroutput"
> value will need a double-precision
float. Finally, to track the particle we want to assign it
a name to identify what kind of particle it is and a
unique numeric identifier. So we will add two more fields:
CLASS="computeroutput"
> will be a string of up to 16 characters,
CLASS="computeroutput"
> will be an integer of 64 bits
(to allow us to store records for extremely large numbers
>Having determined our columns and their types, we can now
CLASS="computeroutput"
contain all this information:
> >>> class Particle(IsDescription):
...     name = StringCol(16)        # 16-character String
...     idnumber = Int64Col()       # Signed 64-bit integer
...     ADCcount = UInt16Col()      # Unsigned short integer
...     TDCcount = UInt8Col()       # unsigned byte
...     grid_i = Int32Col()         # integer
...     grid_j = IntCol()           # integer (equivalent to Int32Col)
...     pressure = Float32Col()     # float (single-precision)
...     energy = FloatCol()         # double (double-precision)
>This class definition is self-explanatory. Basically,
you declare a class variable for each field you need. As
its value you assign an instance of the appropriate
CLASS="computeroutput"
> subclass, according to the kind of column
defined (the data type, the length, the shape, etc.). See
HREF="x4389.html#ColClassDescr"
complete description of these subclasses. See also <A
HREF="a6585.html#datatypesSupported"
data types supported by the <SAMP
CLASS="computeroutput"
>From now on, we can use <SAMP
CLASS="computeroutput"
as a descriptor for our detector data table. We will see
later on how to pass this object to construct the table.
But first, we must create a file where all the actual data
pushed into our table will be saved.
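The bit widths above map directly onto the value ranges of the chosen column types. Here is a quick plain-Python check (no PyTables needed; the helper names are made up for this sketch) of what each width can hold, and of the modulo arithmetic that keeps a raw count inside an unsigned range:

```python
def unsigned_range(bits):
    """Smallest and largest value an unsigned integer of `bits` bits can hold."""
    return 0, (1 << bits) - 1


def signed_range(bits):
    """Smallest and largest value of a two's-complement signed integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


# Bit widths matching the column types chosen in the Particle descriptor.
ranges = {
    "TDCcount (UInt8)": unsigned_range(8),
    "ADCcount (UInt16)": unsigned_range(16),
    "grid_i (Int32)": signed_range(32),
    "idnumber (Int64)": signed_range(64),
}
for column in sorted(ranges):
    print("%-18s %r" % (column, ranges[column]))

# Reducing a raw count modulo 2**bits keeps it inside an unsigned range,
# as the row-filling loop later in this tutorial does for the counters.
raw = 300
tdc = raw % 256
adc = (raw * 256) % (1 << 16)
print("%d %d" % (tdc, adc))

# The largest idnumber written later (9 * 2**34) overflows Int32 but fits Int64.
largest_id = 9 * (2 ** 34)
print("%s %s" % (largest_id > signed_range(32)[1],
                 largest_id <= signed_range(64)[1]))
```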
NAME="subsection3.1.3"
>3.1.3. Creating a <SPAN
> file from scratch</A
>Use the first-level <SAMP
CLASS="computeroutput"
HREF="c1381.html#openFileDescr"
>) function to create a
CLASS="computeroutput"
> >>> h5file = openFile("tutorial1.h5", mode = "w", title = "Test file")
CLASS="computeroutput"
HREF="c1381.html#openFileDescr"
>) is one of the objects
imported by the "<SAMP
CLASS="computeroutput"
>from tables import *</SAMP
statement. Here, we are saying that we want to create a
new file in the current working directory called
CLASS="computeroutput"
CLASS="computeroutput"
and with a descriptive title string ("<SAMP
CLASS="computeroutput"
>"). This function attempts to open the file,
and if successful, returns the <SAMP
CLASS="computeroutput"
HREF="x1533.html#FileClassDescr"
CLASS="computeroutput"
>. The root of the object tree is
specified in the instance's <SAMP
CLASS="computeroutput"
NAME="subsection3.1.4"
>3.1.4. Creating a new group</A
>Now, to better organize our data, we will create a group
> that branches from the root
node. We will save our particle data table in this group.
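To picture how groups and leaves form a tree addressed by slash-separated paths, here is a toy in-memory model in plain Python. The `Node` class and its `add_child()` and `get_node()` methods are hypothetical; they only mimic the idea behind `createGroup()`, path lookup and attribute-style access such as `h5file.root.detector`, and are not the PyTables API.

```python
class Node(object):
    """Toy tree node (hypothetical): a named group or leaf with children."""

    def __init__(self, name, title, parent=None):
        self._name = name
        self._title = title
        self._parent = parent
        self._children = {}

    def path(self):
        """Absolute slash-separated path of this node, e.g. '/detector'."""
        if self._parent is None:
            return "/"
        return self._parent.path().rstrip("/") + "/" + self._name

    def add_child(self, name, title):
        """Create a node under this one, loosely like creating a group."""
        node = Node(name, title, parent=self)
        self._children[name] = node
        return node

    def get_node(self, path):
        """Walk an absolute path such as '/detector/readout' down from here."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node._children[part]
        return node

    def __getattr__(self, name):
        # Only called when normal lookup fails: treat the name as a child,
        # in the spirit of natural naming (root.detector.readout).
        children = self.__dict__.get("_children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)


root = Node("", "Test file")
detector = root.add_child("detector", "Detector information")
readout = detector.add_child("readout", "Readout example")

print(readout.path())
print(root.get_node("/detector/readout") is readout)
print(root.detector.readout is readout)
```

Both lookup styles end at the same object, which is the property that lets the tutorial switch freely between path strings and attribute access.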
> >>> group = h5file.createGroup("/", 'detector', 'Detector information')
>Here, we have taken the <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
HREF="x1533.html#createGroupDescr"
>) to create a new group
> branching from "<SPAN
(another way to refer to the <SAMP
CLASS="computeroutput"
object we mentioned above). This will create a new
CLASS="computeroutput"
HREF="x2546.html#GroupClassDescr"
>) object instance that will
be assigned to the variable <SAMP
CLASS="computeroutput"
NAME="subsection3.1.5"
>3.1.5. Creating a new table</A
>Let's now create a <SAMP
CLASS="computeroutput"
HREF="x2981.html#TableClassDescr"
>) object as a branch off the
newly-created group. We do that by calling the
CLASS="computeroutput"
HREF="x1533.html#createTableDescr"
CLASS="computeroutput"
> >>> table = h5file.createTable(group, 'readout', Particle, "Readout example")
CLASS="computeroutput"
CLASS="computeroutput"
>. We assign this table the node name
CLASS="computeroutput"
declared before is the <SPAN
define the columns of the table) and finally we set
CLASS="computeroutput"
title. With all this information, a new <SAMP
CLASS="computeroutput"
instance is created and assigned to the variable
>If you are curious about how the object tree looks right
CLASS="computeroutput"
CLASS="computeroutput"
instance variable <SPAN
>, and examine the output:
> >>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
>As you can see, a dump of the object tree is displayed.
It's easy to see the <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
> objects we have just created. If you
want more information, just type the variable containing the
CLASS="computeroutput"
> >>> h5file
File(filename='tutorial1.h5', title='Test file', mode='w', trMap={}, rootUEP='/')
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
"ADCcount": Col('UInt16', shape=1, itemsize=2, dflt=0),
"TDCcount": Col('UInt8', shape=1, itemsize=1, dflt=0),
"energy": Col('Float64', shape=1, itemsize=8, dflt=0.0),
"grid_i": Col('Int32', shape=1, itemsize=4, dflt=0),
"grid_j": Col('Int32', shape=1, itemsize=4, dflt=0),
"idnumber": Col('Int64', shape=1, itemsize=8, dflt=0),
"name": Col('CharType', shape=1, itemsize=16, dflt=None),
"pressure": Col('Float32', shape=1, itemsize=4, dflt=0.0) }
>More detailed information is displayed about each object
in the tree. Note how <SAMP
CLASS="computeroutput"
descriptor class, is printed as part of the
> table description information. In
general, you can obtain much more information about the
objects and their children by just printing them. That
introspection capability is very useful, and I recommend
that you use it extensively.
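The `itemsize` values in the description above add up to 47 bytes of data per row. As a plain-Python sketch, assuming a packed little-endian record with the columns in the alphabetical order shown above (an illustration of the arithmetic only, not a statement about how PyTables or HDF5 lays the row out on disk), the standard `struct` module can pack one row and confirm that total:

```python
import struct

# (column, struct format code, itemsize shown in the description above)
columns = [
    ("ADCcount", "H", 2),     # UInt16
    ("TDCcount", "B", 1),     # UInt8
    ("energy", "d", 8),       # Float64
    ("grid_i", "i", 4),       # Int32
    ("grid_j", "i", 4),       # Int32
    ("idnumber", "q", 8),     # Int64
    ("name", "16s", 16),      # 16-character string
    ("pressure", "f", 4),     # Float32
]

# "<" means little-endian, standard sizes, and no alignment padding.
row_format = "<" + "".join(code for _, code, _ in columns)
row_size = struct.calcsize(row_format)
print("%s -> %d bytes" % (row_format, row_size))
print(sum(size for _, _, size in columns) == row_size)

# Pack the values the row-filling loop of this tutorial produces for i = 3.
record = struct.pack(row_format, 768, 3, 6561.0, 3, 7, 3 * 2 ** 34,
                     b"Particle:      3", 9.0)
values = struct.unpack(row_format, record)
print(len(record))
print(values[5])
```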
>The time has come to fill this table with some
values. First we will get a reference to the
CLASS="computeroutput"
HREF="x2981.html#RowClassDescr"
instance of this <SAMP
CLASS="computeroutput"
> >>> particle = table.row
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
> instance that will be used
to write data rows into the table. We write data simply by
CLASS="computeroutput"
> instance the values for
each row as if it were a dictionary (although it is
>Below is an example of how to write rows:
> >>> for i in xrange(10):
...     particle['name'] = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
>This code should be easy to understand. The lines inside
the loop just assign values to the different columns in
the Row instance <SAMP
CLASS="computeroutput"
HREF="x2981.html#RowClassDescr"
CLASS="computeroutput"
information to the <SAMP
CLASS="computeroutput"
>After we have processed all our data, we should flush the
table's I/O buffer if we want to write all
this data to disk. We achieve that by calling the
CLASS="computeroutput"
> >>> table.flush()
NAME="subsection3.1.6"
>3.1.6. Reading (and selecting) data in a table</A
NAME="readingAndSelectingUsage"
>OK. We have our data on disk, and now we need to access
it and select the values we are interested in from specific
columns. See the example below:
> >>> table = h5file.root.detector.readout
>>> pressure = [ x['pressure'] for x in table.iterrows()
... if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> pressure
>The first line creates a "shortcut"
> table deeper in the
object tree. As you can see, we use the <SPAN
it. We could also have used the
CLASS="computeroutput"
>h5file.getNode()</SAMP
> method, as we will do
>You will recognize the next two lines as a Python list
comprehension. It loops over the rows in <SPAN
they are provided by the <SAMP
CLASS="computeroutput"
>table.iterrows()</SAMP
HREF="x2981.html#Table.iterrows"
iterator returns values until all the data in the table is
exhausted. These rows are filtered using the expression:
> x['TDCcount'] > 3 and 20 <= x['pressure'] < 50
We select the value of the <SAMP
CLASS="computeroutput"
filtered records to create the final list and assign it to
CLASS="computeroutput"
>We could have used a normal <SAMP
CLASS="computeroutput"
accomplish the same purpose, but I find the comprehension
syntax to be more compact and elegant.
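To check the selection logic without opening a file, the sketch below rebuilds the ten rows written earlier as plain Python dictionaries (a stand-in for the table's rows, not PyTables objects) and applies the same cut twice, once with a list comprehension and once with an explicit `for` loop:

```python
# Rebuild the ten rows written by the row-filling loop as plain dictionaries.
rows = []
for i in range(10):
    pressure = float(i * i)
    rows.append({
        "name": "Particle: %6d" % i,
        "TDCcount": i % 256,
        "ADCcount": (i * 256) % (1 << 16),
        "grid_i": i,
        "grid_j": 10 - i,
        "pressure": pressure,
        "energy": float(pressure ** 4),
        "idnumber": i * (2 ** 34),
    })


def passes_cut(row):
    """The selection used in this section."""
    return row["TDCcount"] > 3 and 20 <= row["pressure"] < 50


# List comprehension version.
pressure_lc = [row["pressure"] for row in rows if passes_cut(row)]

# The same selection written as an explicit for loop.
pressure_loop = []
for row in rows:
    if passes_cut(row):
        pressure_loop.append(row["pressure"])

names = [row["name"] for row in rows if passes_cut(row)]
print(pressure_lc)
print(pressure_lc == pressure_loop)
print(names)
```

Only particles 5, 6 and 7 pass: particle 4 has a pressure of 16, below the lower bound, and particle 8 has 64, above the upper one.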
>Let's select the <SAMP
CLASS="computeroutput"
> column for the same
> >>> names=[ x['name'] for x in table if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> names
['Particle:      5', 'Particle:      6', 'Particle:      7']
>Note how we have omitted the <SAMP
CLASS="computeroutput"
in the list comprehension. The <SAMP
CLASS="computeroutput"
has an implementation of the special method
CLASS="computeroutput"
> that iterates over all the rows in
the table. In fact, <SAMP
CLASS="computeroutput"
calls this special <SAMP
CLASS="computeroutput"
Accessing all the rows in a table using this method is
very convenient, especially when working with the data
>That's enough about selections. The next section will show
you how to save these selected results to a file.
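The iteration protocol described above can be sketched with a toy class. `ToyTable` is hypothetical; in it, `__iter__()` simply delegates to `iterrows()`, which is enough to make `for x in table` and `for x in table.iterrows()` equivalent. The sketch makes no claim about which of the two methods calls the other inside PyTables.

```python
class ToyTable(object):
    """Hypothetical stand-in for a table that stores its rows as dicts."""

    def __init__(self, rows):
        self._rows = list(rows)

    def iterrows(self):
        """Yield the stored rows one at a time."""
        for row in self._rows:
            yield row

    def __iter__(self):
        # "for x in table" ends up here; delegating to iterrows() makes it
        # equivalent to "for x in table.iterrows()".
        return self.iterrows()


table = ToyTable({"name": "Particle: %6d" % i,
                  "TDCcount": i % 256,
                  "pressure": float(i * i)} for i in range(10))

explicit = [x["name"] for x in table.iterrows()
            if x["TDCcount"] > 3 and 20 <= x["pressure"] < 50]
implicit = [x["name"] for x in table
            if x["TDCcount"] > 3 and 20 <= x["pressure"] < 50]
print(explicit == implicit)
```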
NAME="subsection3.1.7"
>3.1.7. Creating new array objects</A
>In order to separate the selected data from the mass of
detector data, we will create a new group
CLASS="computeroutput"
> branching off the root
group. Afterwards, under this group, we will create two
arrays that will contain the selected data. First, we
> >>> gcolumns = h5file.createGroup(h5file.root, "columns", "Pressure and Name")
>Note that this time we have specified the first parameter
CLASS="computeroutput"
>) instead of with an absolute
>Now, create the first of the two <SAMP
CLASS="computeroutput"
objects we've just mentioned:
> >>> h5file.createArray(gcolumns, 'pressure', numarray.array(pressure),
... "Pressure column selection")
/columns/pressure (Array(3,)) 'Pressure column selection'
byteorder = 'little'
>We already know the first two parameters of the
CLASS="computeroutput"
HREF="x1533.html#createArrayDescr"
>) method (these are the
same as the first two in <SAMP
CLASS="computeroutput"
are the parent group <SPAN
CLASS="computeroutput"
will be created and the <SAMP
CLASS="computeroutput"
>. The third parameter is the <SPAN
we want to save to disk. In this case, it is a
CLASS="computeroutput"
> array that is built from the
selection list we created before. The fourth parameter is
>Now, we will save the second array. It contains the list
of strings we selected before: we save this object as-is,
with no further conversion.
> >>> h5file.createArray(gcolumns, 'name', names, "Name column selection")
/columns/name (Array(3,)) 'Name column selection'
byteorder = 'little'
>As you can see, <SAMP
CLASS="computeroutput"
>createArray()</SAMP
> (which is a regular Python list) as an
> parameter. Actually, it accepts a variety
of different regular objects (see <A
HREF="x1533.html#createArrayDescr"
>) as parameters. The
CLASS="computeroutput"
> attribute (see the output above) saves
the original kind of object that was saved. Based on this
CLASS="computeroutput"
retrieve exactly the same object from disk later on.
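The idea of storing the original kind of object next to the data, so that the same kind can be rebuilt on reading, can be sketched in plain Python. The `save_with_kind()` and `load_with_kind()` functions, the `"kind"` attribute and the dictionary standing in for the file are all hypothetical; this shows the round trip only, not how PyTables records it.

```python
storage = {}  # stand-in for the file: path -> (stored values, attributes)


def save_with_kind(path, obj):
    """Store the values as a tuple and record what kind of object they came from."""
    if isinstance(obj, list):
        kind = "List"
    elif isinstance(obj, tuple):
        kind = "Tuple"
    else:
        raise TypeError("unsupported object: %r" % (obj,))
    storage[path] = (tuple(obj), {"kind": kind})


def load_with_kind(path):
    """Rebuild an object of the same kind as the one that was saved."""
    values, attrs = storage[path]
    if attrs["kind"] == "List":
        return list(values)
    return tuple(values)


names = ["Particle:      5", "Particle:      6", "Particle:      7"]
save_with_kind("/columns/name", names)
restored = load_with_kind("/columns/name")
print(storage["/columns/name"][1]["kind"])
print(restored == names)
print(type(restored).__name__)
```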
>Note that in these examples, the <SAMP
CLASS="computeroutput"
method returns an <SAMP
CLASS="computeroutput"
> instance that is not
assigned to any variable. This is intentional: it lets us
see the kind of object we have created by displaying
its representation. The <SAMP
CLASS="computeroutput"
been attached to the object tree and saved to disk, as you
can see if you print the complete object tree:
> >>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
NAME="subsection3.1.8"
>3.1.8. Closing the file and looking at its content</A
>To finish this first tutorial, we use the
CLASS="computeroutput"
> method of the h5file <SAMP
CLASS="computeroutput"
object to close the file before exiting Python:
> >>> h5file.close()
>You have now created your first <SAMP
CLASS="computeroutput"
file with a table and two arrays. You can examine it with
any generic HDF5 tool, such as <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
> looks like when read with the
CLASS="computeroutput"
> $ h5ls -rd tutorial1.h5
/columns/name Dataset {3}
(0) "Particle:      5", "Particle:      6", "Particle:      7"
/columns/pressure Dataset {3}
/detector/readout Dataset {10/Inf}
(0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
(1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
(2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
(3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
(4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
(5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
(6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
(7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
(8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
(9) {2304, 9, 43046721, 9, 1, 154618822656, "Particle:      9", 81}
>Here is the output as displayed by the "ptdump"
CLASS="computeroutput"
> utility (located in
CLASS="computeroutput"
> $ ptdump tutorial1.h5
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
>You can pass the <SAMP
CLASS="computeroutput"
CLASS="computeroutput"
CLASS="computeroutput"
verbosity. Try them out!
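The listings above print the tree depth-first, and in them each group's children appear in alphabetical order. Here is a hypothetical plain-Python sketch that walks a nested structure (standing in for the file's object tree; the kinds and titles are copied from the listing above) and produces lines in the same layout:

```python
# Hypothetical stand-in for the object tree: (kind, title, {child name: node}).
tree = ("Group", "Test file", {
    "detector": ("Group", "Detector information", {
        "readout": ("Table(10,)", "Readout example", {}),
    }),
    "columns": ("Group", "Pressure and Name", {
        "pressure": ("Array(3,)", "Pressure column selection", {}),
        "name": ("Array(3,)", "Name column selection", {}),
    }),
})


def dump(node, path="/"):
    """Return one line per node, depth-first, children in alphabetical order."""
    kind, title, children = node
    lines = ["%s (%s) '%s'" % (path, kind, title)]
    for name in sorted(children):
        child_path = path.rstrip("/") + "/" + name
        lines.extend(dump(children[name], child_path))
    return lines


for line in dump(tree):
    print(line)
```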
HREF="c514.html#tutorial1-1-tableview"
you can admire what the <SAMP
CLASS="computeroutput"
HREF="http://www.carabos.com/products/vitables.html"
> graphical interface.
NAME="tutorial1-1-tableview"
>Figure 3.1. The initial version of the data file for tutorial 1,
with a view of the data objects.
SRC="tutorial1-1-tableview.png"></P
SUMMARY="Footer navigation table"
>Binary installation (Windows)</TD