<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE>Getting the most from the node LRU cache</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79">
<LINK TITLE="PyTables User's Guide" HREF="index.html">
<LINK TITLE="Optimization tips" HREF="c5270.html">
<LINK HREF="x5552.html">
<LINK TITLE="Selecting a User Entry Point (UEP) in your PyTables file" HREF="x5654.html">
</HEAD>
<BODY>
<TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0">
<TR>
<TH ALIGN="center">PyTables User's Guide: Hierarchical datasets in Python - Release 1.3.2</TH>
</TR>
<TR>
<TD>Chapter 5. Optimization tips</TD>
</TR>
</TABLE>
<H1 CLASS="sect1">5.6. Getting the most from the node LRU cache</H1>
<P>Starting with PyTables 1.2, a new LRU cache has been introduced that avoids loading all the nodes of the object tree in memory. This cache is responsible for keeping no more than a certain number of nodes loaded, discarding the least recently used ones when new ones have to be brought in. This represents a big advantage over the old scheme, especially in terms of memory usage (as there is no need to keep every node in memory), but it also adds very convenient optimizations for working interactively, like speeding up the opening of files with lots of nodes: almost any kind of file can typically be opened in less than a tenth of a second (compare this with the more than 10 seconds needed for files with more than 10000 nodes in the pre-1.2 era of PyTables). See [<SPAN CLASS="citation">...</SPAN>] for more information on the advantages (and also drawbacks) of this approach.</P>
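<P>The eviction policy described above can be sketched in a few lines of pure Python. This is only an illustrative model of an LRU cache (the class name and the loading callback are made up here; this is not the actual PyTables implementation):</P>

```python
from collections import OrderedDict

class NodeLRUCache:
    """Toy LRU cache: holds at most `size` nodes and evicts the least
    recently used one whenever a new node has to be loaded."""

    def __init__(self, size):
        self.size = size
        self._nodes = OrderedDict()  # key -> node, least recently used first

    def get(self, key, load_node):
        if key in self._nodes:
            self._nodes.move_to_end(key)      # cache hit: mark as most recent
            return self._nodes[key]
        node = load_node(key)                 # cache miss: load it (from disk)
        self._nodes[key] = node
        if len(self._nodes) > self.size:
            self._nodes.popitem(last=False)   # evict least recently used
        return node
```

<P>For example, with <SAMP CLASS="computeroutput">size=2</SAMP>, loading nodes a, b and then c evicts a, while touching a again before loading c would evict b instead.</P>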
<P>One thing that deserves some discussion is the choice of the parameter that sets the maximum number of nodes to be held in memory at any given time. As PyTables is meant to be deployed on machines that may have little memory, the default is quite conservative (you can look at its actual value in the <SAMP CLASS="computeroutput">NODE_CACHE_SIZE</SAMP> parameter in the <SAMP CLASS="computeroutput">tables/constants.py</SAMP> module). However, if you usually have to deal with files that have many more nodes than the default maximum, and you have plenty of free memory in your system, then you may want to experiment to find the value of <SAMP CLASS="computeroutput">NODE_CACHE_SIZE</SAMP> that best fits your needs.</P>
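<P>As a rough starting point for such experiments, one could derive a candidate cache size from the number of nodes in the file. The helper below is only a sketch: the rule of thumb, the function name and the bounds are assumptions, not part of PyTables:</P>

```python
def suggest_cache_size(n_nodes, default=256, ceiling=4096):
    """Hypothetical rule of thumb for picking NODE_CACHE_SIZE: keep the
    conservative default when it already fits all nodes; otherwise grow
    the cache to fit them, but never beyond a memory-driven ceiling."""
    if n_nodes <= default:
        return default          # the default already holds everything
    return min(n_nodes, ceiling)  # fit all nodes, up to the ceiling
```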
<P>As an example, look at the next code:</P>
<PRE CLASS="programlisting">
def browse_tables(filename):
    fileh = openFile(filename, 'a')        # open the file in append mode
    group = fileh.root.newgroup
    for tt in fileh.walkNodes(group, "Table"):
        title = tt.attrs.TITLE             # access each table's TITLE attribute
    fileh.close()
</PRE>
<P>We will run the code above against a couple of files containing 100 tables and 1000 tables respectively, repeating this small benchmark for different values of the LRU cache size, namely 256 and 1024. You can see the results in <A HREF="x5576.html#LRUTblComparison">Table 5.1</A>.</P>
<A NAME="LRUTblComparison"></A>
<P><B>Table 5.1. Dependency of retrieval speed and memory consumption on the number of nodes in the LRU cache.</B></P>
<TH>Node comes from...</TH>
<P>Looking at <A HREF="x5576.html#LRUTblComparison">Table 5.1</A>, one can see that, when the number of objects you are dealing with fits in the cache, you get better access times to them. Also, increasing the node cache size effectively consumes more memory <SPAN CLASS="emphasis"><I CLASS="emphasis">only</I></SPAN> when the number of nodes exceeds the slots in the cache; otherwise the memory consumption remains the same. It is also worth noting that enlarging the node cache so that all your nodes fit in it does not take much more memory than keeping it very conservative. On the other hand, the speed-up you can achieve by allocating more slots in your cache may not be worth the extra memory consumed.</P>
<P>Anyway, if you feel that this issue is important to you, set up your own experiments and fine-tune the <SAMP CLASS="computeroutput">NODE_CACHE_SIZE</SAMP> parameter.</P>
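<P>Such experiments can be organized as a simple parameter sweep. The sketch below is generic Python: the <SAMP CLASS="computeroutput">configure</SAMP> hook is hypothetical and stands in for whatever mechanism you use to change <SAMP CLASS="computeroutput">NODE_CACHE_SIZE</SAMP> between runs (for example, editing <SAMP CLASS="computeroutput">tables/constants.py</SAMP>):</P>

```python
import time

def sweep_cache_sizes(benchmark, sizes, configure):
    """Time one `benchmark()` run per candidate cache size.
    `configure(size)` applies the size before each run (hypothetical
    hook: PyTables itself does not provide such a function)."""
    results = {}
    for size in sizes:
        configure(size)                     # apply the candidate cache size
        start = time.time()
        benchmark()                         # e.g. a browse_tables()-style run
        results[size] = time.time() - start # wall-clock seconds for this size
    return results
```

<P>Feeding it a benchmark such as the <SAMP CLASS="computeroutput">browse_tables()</SAMP> function above and the candidate sizes you care about yields a timing per size, which you can then weigh against the memory consumed.</P>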
</BODY>
</HTML>