________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).
pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).
pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.
Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.
You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web-browsers, email clients, RSS
readers, music players, backup programs, etc.
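A repeatability check along these lines might look as follows (the
options are the ones documented below; the file names are
illustrative):

```shell
# Run the suite twice, saving each run to a file.
python pybench.py -f run1.pybench
python pybench.py -f run2.pybench

# Compare the two runs; per-test differences should stay well
# below 10% on a machine that is suitable for benchmarking.
python pybench.py -s run1.pybench -c run2.pybench
```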
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage
The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
  ...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------
Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64

Implementation: CPython
Executable:     /usr/local/bin/python
Compiler:       GCC 3.3.4 (pre 3.3.5 20040809)
Build:          Oct  1 2005 15:24:35 (#1)
Test                        minimum  average  operation  overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:         126ms    145ms     0.28us   0.274ms
BuiltinMethodLookup:          124ms    130ms     0.12us   0.316ms
CompareFloats:                109ms    110ms     0.09us   0.361ms
CompareFloatsIntegers:        100ms    104ms     0.12us   0.271ms
CompareIntegers:              137ms    138ms     0.08us   0.542ms
CompareInternedStrings:       124ms    127ms     0.08us   1.367ms
CompareLongs:                 100ms    104ms     0.10us   0.316ms
CompareStrings:               111ms    115ms     0.12us   0.929ms
CompareUnicode:               108ms    128ms     0.17us   0.693ms
ConcatStrings:                142ms    155ms     0.31us   0.562ms
ConcatUnicode:                119ms    127ms     0.42us   0.384ms
CreateInstances:              123ms    128ms     1.14us   0.367ms
CreateNewInstances:           121ms    126ms     1.49us   0.335ms
CreateStringsWithConcat:      130ms    135ms     0.14us   0.916ms
CreateUnicodeWithConcat:      130ms    135ms     0.34us   0.361ms
DictCreation:                 108ms    109ms     0.27us   0.361ms
DictWithFloatKeys:            149ms    153ms     0.17us   0.678ms
DictWithIntegerKeys:          124ms    126ms     0.11us   0.915ms
DictWithStringKeys:           114ms    117ms     0.10us   0.905ms
ForLoops:                     110ms    111ms     4.46us   0.063ms
IfThenElse:                   118ms    119ms     0.09us   0.685ms
ListSlicing:                  116ms    120ms     8.59us   0.103ms
NestedForLoops:               125ms    137ms     0.09us   0.019ms
NormalClassAttribute:         124ms    136ms     0.11us   0.457ms
NormalInstanceAttribute:      110ms    117ms     0.10us   0.454ms
PythonFunctionCalls:          107ms    113ms     0.34us   0.271ms
PythonMethodCalls:            140ms    149ms     0.66us   0.141ms
Recursion:                    156ms    166ms     3.32us   0.452ms
SecondImport:                 112ms    118ms     1.18us   0.180ms
SecondPackageImport:          118ms    127ms     1.27us   0.180ms
SecondSubmoduleImport:        140ms    151ms     1.51us   0.180ms
SimpleComplexArithmetic:      128ms    139ms     0.16us   0.361ms
SimpleDictManipulation:       134ms    136ms     0.11us   0.452ms
SimpleFloatArithmetic:        110ms    113ms     0.09us   0.571ms
SimpleIntFloatArithmetic:     106ms    111ms     0.08us   0.548ms
SimpleIntegerArithmetic:      106ms    109ms     0.08us   0.544ms
SimpleListManipulation:       103ms    113ms     0.10us   0.587ms
SimpleLongArithmetic:         112ms    118ms     0.18us   0.271ms
SmallLists:                   105ms    116ms     0.17us   0.366ms
SmallTuples:                  108ms    128ms     0.24us   0.406ms
SpecialClassAttribute:        119ms    136ms     0.11us   0.453ms
SpecialInstanceAttribute:     143ms    155ms     0.13us   0.454ms
StringMappings:               115ms    121ms     0.48us   0.405ms
StringPredicates:             120ms    129ms     0.18us   2.064ms
StringSlicing:                111ms    127ms     0.23us   0.781ms
TryExcept:                    125ms    126ms     0.06us   0.681ms
TryRaiseExcept:               133ms    137ms     2.14us   0.361ms
TupleSlicing:                 117ms    120ms     0.46us   0.066ms
UnicodeMappings:              156ms    160ms     4.44us   0.429ms
UnicodePredicates:            117ms    121ms     0.22us   2.487ms
UnicodeProperties:            115ms    153ms     0.38us   2.070ms
UnicodeSlicing:               126ms    129ms     0.26us   0.689ms
-------------------------------------------------------------------------------
Totals:                      6283ms   6673ms
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds test rounds of .operations operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.
Example:

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 2.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):
        """ The test to run.

            The test needs to run self.rounds executing
            self.operations number of operations each.
        """
        a = 1
        for i in range(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1; a += 1; a += 1; a += 1; a += 1
            a += 1; a += 1; a += 1; a += 1; a += 1
            a += 1; a += 1; a += 1; a += 1; a += 1
            a += 1; a += 1; a += 1; a += 1; a += 1

    def calibrate(self):
        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.
        """
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in range(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass
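The measure-then-subtract idea behind .calibrate() can be illustrated
standalone, without pybench itself (all names in this sketch are
illustrative, not pybench APIs):

```python
import time

ROUNDS = 10_000
OPERATIONS = 20  # abstract operations per test round

def test_round():
    # The measured workload: 20 increment operations, mirroring the
    # IntegerCounting example above.
    a = 1
    a += 1; a += 1; a += 1; a += 1; a += 1
    a += 1; a += 1; a += 1; a += 1; a += 1
    a += 1; a += 1; a += 1; a += 1; a += 1
    a += 1; a += 1; a += 1; a += 1; a += 1

def empty_round():
    # Calibration round: same call/loop administration, no workload.
    pass

def timed(func, rounds):
    # Time `rounds` executions of func, returning elapsed seconds.
    t0 = time.perf_counter()
    for _ in range(rounds):
        func()
    return time.perf_counter() - t0

# Like pybench 2.0, take min() over several runs as the estimator.
test_time = min(timed(test_round, ROUNDS) for _ in range(3))
overhead = min(timed(empty_round, ROUNDS) for _ in range(3))

# Subtract the administration overhead before dividing down to the
# per-operation cost - this is what the calibration step is for.
per_operation = (test_time - overhead) / (ROUNDS * OPERATIONS)
```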
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
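Concretely, if the new test class lives in a module called MyTests.py
(a hypothetical name), registration amounts to a single import added
to pybench's Setup.py:

```python
# In pybench's Setup.py; 'MyTests' is a hypothetical module name.
# pybench scans the symbols imported here for pybench.Test
# subclasses and adds each one to the suite automatically.
from MyTests import *
```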
Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.
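For instance, if the IntegerCounting example above were changed to do
a different number of operations per round, its version number would
be bumped (a sketch, assuming the class from the previous section):

```python
from pybench import Test

class IntegerCounting(Test):

    # The workload changed, so results are no longer comparable
    # with runs of the old test; bumping the version makes pybench
    # list "n/a" when comparing against older result files.
    version = 2.1

    ...  # operations, rounds, .test() and .calibrate() as before
```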
2.1: made some minor changes for compatibility with Python 3.0:
     - replaced cmp with divmod and range with max in Calls.py
       (cmp no longer exists in 3.0, and range is a list in
       Python 2.x and an iterator in Python 3.x)
2.0: rewrote parts of pybench which resulted in more repeatable
     timings:
     - made timer a parameter
     - changed the platform default timer to use high-resolution
       timers rather than process timers (which have a much lower
       resolution)
     - added option to select timer
     - added process time timer (using systimes.py)
     - changed to use min() as timing estimator (average
       is still taken as well to provide an idea of the difference)
     - garbage collection is turned off per default
     - sys check interval is set to the highest possible value
     - calibration is now a separate step and done using
       a different strategy that allows measuring the test
       overhead more accurately
     - modified the tests to each give a run-time of between
       100-200ms using warp 10
     - changed default warp factor to 10 (from 20)
     - compared results with timeit.py and confirmed measurements
     - bumped all test versions to 2.0
     - updated platform.py to the latest version
     - changed the output format a bit to make it look
       nicer
     - refactored the APIs somewhat

1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a long
      discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0

1.3: initial checkin into the Python SVN repository