2
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
3
<!-- Created on January, 5 2002 by texi2html 1.64 -->
5
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
6
Karl Berry <karl@freefriends.org>
7
Olaf Bachmann <obachman@mathematik.uni-kl.de>
9
Maintained by: Olaf Bachmann <obachman@mathematik.uni-kl.de>
10
Send bugs and suggestions to <texi2html@mathematik.uni-kl.de>
14
<TITLE>Untitled Document: 4. Miscellanea</TITLE>
16
<META NAME="description" CONTENT="Untitled Document: 4. Miscellanea">
17
<META NAME="keywords" CONTENT="Untitled Document: 4. Miscellanea">
18
<META NAME="resource-type" CONTENT="document">
19
<META NAME="distribution" CONTENT="global">
20
<META NAME="Generator" CONTENT="texi2html 1.64">
24
<BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
27
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
28
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_3.html#SEC42"> < </A>]</TD>
29
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC44"> > </A>]</TD>
30
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
31
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
32
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
33
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
34
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
35
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
36
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
38
<H1> 4. Miscellanea </H1>
42
These are just some random thoughts of mine. Your mileage may
48
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
49
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC43"> < </A>]</TD>
50
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC45"> > </A>]</TD>
51
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
52
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
53
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
54
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
55
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
56
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
57
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
59
<H2> 4.1 Limitations of the compressed file format </H2>
61
<CODE>bzip2-1.0</CODE>, <CODE>0.9.5</CODE> and <CODE>0.9.0</CODE>
62
use exactly the same file format as the previous
63
version, <CODE>bzip2-0.1</CODE>. This decision was made in the interests of
64
stability. Creating yet another incompatible compressed file format
65
would create further confusion and disruption for users.
68
Nevertheless, this is not a painless decision. Development
69
work since the release of <CODE>bzip2-0.1</CODE> in August 1997
70
has shown complexities in the file format which slow down
71
decompression and, in retrospect, are unnecessary. These are:
73
<LI>The run-length encoder, which is the first of the
74
compression transformations, is entirely irrelevant.
75
The original purpose was to protect the sorting algorithm
76
from the very worst case input: a string of repeated
77
symbols. But algorithm steps Q6a and Q6b in the original
78
Burrows-Wheeler technical report (SRC-124) show how
79
repeats can be handled without difficulty in block
81
<LI>The randomisation mechanism doesn't really need to be
82
there. Udi Manber and Gene Myers published a suffix
83
array construction algorithm a few years back, which
84
can be employed to sort any block, no matter how
85
repetitive, in O(N log N) time. Subsequent work by
86
Kunihiko Sadakane has produced a derivative O(N (log N)^2)
87
algorithm which usually outperforms the Manber-Myers
91
I could have changed to Sadakane's algorithm, but I find
92
it to be slower than <CODE>bzip2</CODE>'s existing algorithm for
93
most inputs, and the randomisation mechanism protects
94
adequately against bad cases. I didn't think it was
95
a good tradeoff to make. Partly this is due to the fact
96
that I was not flooded with email complaints about
97
<CODE>bzip2-0.1</CODE>'s performance on repetitive data, so
98
perhaps it isn't a problem for real inputs.
101
Probably the best long-term solution,
102
and the one I have incorporated into 0.9.5 and above,
103
is to use the existing sorting
104
algorithm initially, and fall back to a O(N (log N)^2)
105
algorithm if the standard algorithm gets into difficulties.
106
<LI>The compressed file format was never designed to be
107
handled by a library, and I have had to jump though
108
some hoops to produce an efficient implementation of
109
decompression. It's a bit hairy. Try passing
110
<CODE>decompress.c</CODE> through the C preprocessor
111
and you'll see what I mean. Much of this complexity
112
could have been avoided if the compressed size of
113
each block of data was recorded in the data stream.
114
<LI>An Adler-32 checksum, rather than a CRC32 checksum,
115
would be faster to compute.
117
It would be fair to say that the <CODE>bzip2</CODE> format was frozen
118
before I properly and fully understood the performance
119
consequences of doing so.
122
Improvements which I was able to incorporate into
123
0.9.0, despite using the same file format, are:
125
<LI>Single array implementation of the inverse BWT. This
126
significantly speeds up decompression, presumably
127
because it reduces the number of cache misses.
128
<LI>Faster inverse MTF transform for large MTF values. The
129
new implementation is based on the notion of sliding blocks
131
<LI><CODE>bzip2-0.9.0</CODE> now reads and writes files with <CODE>fread</CODE>
132
and <CODE>fwrite</CODE>; version 0.1 used <CODE>putc</CODE> and <CODE>getc</CODE>.
133
Duh! Well, you live and learn.
137
Further ahead, it would be nice
138
to be able to do random access into files. This will
139
require some careful design of compressed file formats.
144
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
145
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC44"> < </A>]</TD>
146
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC46"> > </A>]</TD>
147
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
148
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
149
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
150
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
151
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
152
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
153
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
155
<H2> 4.2 Portability issues </H2>
156
<!--docid::SEC45::-->
157
After some consideration, I have decided not to use
158
GNU <CODE>autoconf</CODE> to configure 0.9.5 or 1.0.
161
<CODE>autoconf</CODE>, admirable and wonderful though it is,
162
mainly assists with portability problems between Unix-like
163
platforms. But <CODE>bzip2</CODE> doesn't have much in the way
164
of portability problems on Unix; most of the difficulties appear
165
when porting to the Mac, or to Microsoft's operating systems.
166
<CODE>autoconf</CODE> doesn't help in those cases, and brings in a
167
whole load of new complexity.
170
Most people should be able to compile the library and program
171
under Unix straight out-of-the-box, so to speak, especially
172
if you have a version of GNU C available.
175
There are a couple of <CODE>__inline__</CODE> directives in the code. GNU C
176
(<CODE>gcc</CODE>) should be able to handle them. If you're not using
177
GNU C, your C compiler shouldn't see them at all.
178
If your compiler does, for some reason, see them and doesn't
179
like them, just <CODE>#define</CODE> <CODE>__inline__</CODE> to be <CODE>/* */</CODE>. One
180
easy way to do this is to compile with the flag <CODE>-D__inline__=</CODE>,
181
which should be understood by most Unix compilers.
184
If you still have difficulties, try compiling with the macro
185
<CODE>BZ_STRICT_ANSI</CODE> defined. This should enable you to build the
186
library in a strictly ANSI compliant environment. Building the program
187
itself like this is dangerous and not supported, since you remove
188
<CODE>bzip2</CODE>'s checks against compressing directories, symbolic links,
189
devices, and other not-really-a-file entities. This could cause
190
filesystem corruption!
193
One other thing: if you create a <CODE>bzip2</CODE> binary for public
194
distribution, please try and link it statically (<CODE>gcc -s</CODE>). This
195
avoids all sorts of library-version issues that others may encounter
199
If you build <CODE>bzip2</CODE> on Win32, you must set <CODE>BZ_UNIX</CODE> to 0 and
200
<CODE>BZ_LCCWIN32</CODE> to 1, in the file <CODE>bzip2.c</CODE>, before compiling.
201
Otherwise the resulting binary won't work correctly.
206
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
207
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC45"> < </A>]</TD>
208
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC47"> > </A>]</TD>
209
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
210
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
211
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
212
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
213
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
214
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
215
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
217
<H2> 4.3 Reporting bugs </H2>
218
<!--docid::SEC46::-->
219
I tried pretty hard to make sure <CODE>bzip2</CODE> is
220
bug free, both by design and by testing. Hopefully
221
you'll never need to read this section for real.
224
Nevertheless, if <CODE>bzip2</CODE> dies with a segmentation
225
fault, a bus error or an internal assertion failure, it
226
will ask you to email me a bug report. Experience with
227
version 0.1 shows that almost all these problems can
228
be traced to either compiler bugs or hardware problems.
231
Recompile the program with no optimisation, and see if it
232
works. And/or try a different compiler.
233
I heard all sorts of stories about various flavours
234
of GNU C (and other compilers) generating bad code for
235
<CODE>bzip2</CODE>, and I've run across two such examples myself.
238
2.7.X versions of GNU C are known to generate bad code from
239
time to time, at high optimisation levels.
240
If you get problems, try using the flags
241
<CODE>-O2</CODE> <CODE>-fomit-frame-pointer</CODE> <CODE>-fno-strength-reduce</CODE>.
242
You should specifically <EM>not</EM> use <CODE>-funroll-loops</CODE>.
245
You may notice that the Makefile runs six tests as part of
246
the build process. If the program passes all of these, it's
247
a pretty good (but not 100%) indication that the compiler has
248
done its job correctly.
250
If <CODE>bzip2</CODE> crashes randomly, and the crashes are not
251
repeatable, you may have a flaky memory subsystem. <CODE>bzip2</CODE>
252
really hammers your memory hierarchy, and if it's a bit marginal,
253
you may get these problems. Ditto if your disk or I/O subsystem
254
is slowly failing. Yup, this really does happen.
257
Try using a different machine of the same type, and see if
258
you can repeat the problem.
259
<LI>This isn't really a bug, but ... If <CODE>bzip2</CODE> tells
260
you your file is corrupted on decompression, and you
261
obtained the file via FTP, there is a possibility that you
262
forgot to tell FTP to do a binary mode transfer. That absolutely
263
will cause the file to be non-decompressible. You'll have to transfer
268
If you've incorporated <CODE>libbzip2</CODE> into your own program
269
and are getting problems, please, please, please, check that the
270
parameters you are passing in calls to the library, are
271
correct, and in accordance with what the documentation says
272
is allowable. I have tried to make the library robust against
273
such problems, but I'm sure I haven't succeeded.
276
Finally, if the above comments don't help, you'll have to send
277
me a bug report. Now, it's just amazing how many people will
278
send me a bug report saying something like
279
<TABLE><tr><td> </td><td class=display><pre style="font-family: serif"> bzip2 crashed with segmentation fault on my machine
280
</pre></td></tr></table>and absolutely nothing else. Needless to say, a such a report
281
is <EM>totally, utterly, completely and comprehensively 100% useless;
282
a waste of your time, my time, and net bandwidth</EM>.
283
With no details at all, there's no way I can possibly begin
284
to figure out what the problem is.
287
The rules of the game are: facts, facts, facts. Don't omit
288
them because "oh, they won't be relevant". At the bare
290
<TABLE><tr><td> </td><td class=display><pre style="font-family: serif"> Machine type. Operating system version.
291
Exact version of <CODE>bzip2</CODE> (do <CODE>bzip2 -V</CODE>).
292
Exact version of the compiler used.
293
Flags passed to the compiler.
294
</pre></td></tr></table>However, the most important single thing that will help me is
295
the file that you were trying to compress or decompress at the
296
time the problem happened. Without that, my ability to do anything
297
more than speculate about the cause, is limited.
300
Please remember that I connect to the Internet with a modem, so
301
you should contact me before mailing me huge files.
306
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
307
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC46"> < </A>]</TD>
308
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC48"> > </A>]</TD>
309
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
310
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
311
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
312
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
313
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
314
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
315
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
317
<H2> 4.4 Did you get the right package? </H2>
318
<!--docid::SEC47::-->
321
<CODE>bzip2</CODE> is a resource hog. It soaks up large amounts of CPU cycles
322
and memory. Also, it gives very large latencies. In the worst case, you
323
can feed many megabytes of uncompressed data into the library before
324
getting any compressed output, so this probably rules out applications
325
requiring interactive behaviour.
328
These aren't faults of my implementation, I hope, but more
329
an intrinsic property of the Burrows-Wheeler transform (unfortunately).
330
Maybe this isn't what you want.
333
If you want a compressor and/or library which is faster, uses less
334
memory but gets pretty good compression, and has minimal latency,
336
Gailly's and Mark Adler's work, <CODE>zlib-1.1.3</CODE> and
337
<CODE>gzip-1.2.4</CODE>. Look for them at
340
<CODE>http://www.zlib.org</CODE> and
341
<CODE>http://www.gzip.org</CODE> respectively.
344
For something faster and lighter still, you might try Markus F X J
345
Oberhumer's <CODE>LZO</CODE> real-time compression/decompression library, at
346
<BR> <CODE>http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html</CODE>.
349
If you want to use the <CODE>bzip2</CODE> algorithms to compress small blocks
350
of data, 64k bytes or smaller, for example on an on-the-fly disk
351
compressor, you'd be well advised not to use this library. Instead,
352
I've made a special library tuned for that kind of use. It's part of
353
<CODE>e2compr-0.40</CODE>, an on-the-fly disk compressor for the Linux
354
<CODE>ext2</CODE> filesystem. Look at
355
<CODE>http://www.netspace.net.au/~reiter/e2compr</CODE>.
360
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
361
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC47"> < </A>]</TD>
362
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC49"> > </A>]</TD>
363
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
364
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
365
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
366
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
367
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
368
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
369
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
371
<H2> 4.5 Testing </H2>
372
<!--docid::SEC48::-->
375
A record of the tests I've done.
378
First, some data sets:
380
<LI>B: a directory containing 6001 files, one for every length in the
381
range 0 to 6000 bytes. The files contain random lowercase
382
letters. 18.7 megabytes.
383
<LI>H: my home directory tree. Documents, source code, mail files,
384
compressed data. H contains B, and also a directory of
385
files designed as boundary cases for the sorting; mostly very
386
repetitive, nasty files. 565 megabytes.
387
<LI>A: directory tree holding various applications built from source:
388
<CODE>egcs</CODE>, <CODE>gcc-2.8.1</CODE>, KDE, GTK, Octave, etc.
391
The tests conducted are as follows. Each test means compressing
392
(a copy of) each file in the data set, decompressing it and
393
comparing it against the original.
396
First, a bunch of tests with block sizes and internal buffer
397
sizes set very small,
398
to detect any problems with the
399
blocking and buffering mechanisms.
400
This required modifying the source code so as to try to
404
buffer size of 1 byte, and block size of 23 bytes.
405
<LI>Data set B, buffer sizes 1 byte, block size 1 byte.
406
<LI>As (2) but small-mode decompression.
407
<LI>As (2) with block size 2 bytes.
408
<LI>As (2) with block size 3 bytes.
409
<LI>As (2) with block size 4 bytes.
410
<LI>As (2) with block size 5 bytes.
411
<LI>As (2) with block size 6 bytes and small-mode decompression.
412
<LI>H with buffer size of 1 byte, but normal block
413
size (up to 900000 bytes).
415
Then some tests with unmodified source code.
417
<LI>H, all settings normal.
418
<LI>As (1), with small-mode decompress.
419
<LI>H, compress with flag <CODE>-1</CODE>.
420
<LI>H, compress with flag <CODE>-s</CODE>, decompress with flag <CODE>-s</CODE>.
421
<LI>Forwards compatibility: H, <CODE>bzip2-0.1pl2</CODE> compressing,
422
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
423
<LI>Backwards compatibility: H, <CODE>bzip2-0.9.5</CODE> compressing,
424
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
425
<LI>Bigger tests: A, all settings normal.
426
<LI>As (7), using the fallback (Sadakane-like) sorting algorithm.
427
<LI>As (8), compress with flag <CODE>-1</CODE>, decompress with flag
429
<LI>H, using the fallback sorting algorithm.
430
<LI>Forwards compatibility: A, <CODE>bzip2-0.1pl2</CODE> compressing,
431
<CODE>bzip2-0.9.5</CODE> decompressing, all settings normal.
432
<LI>Backwards compatibility: A, <CODE>bzip2-0.9.5</CODE> compressing,
433
<CODE>bzip2-0.1pl2</CODE> decompressing, all settings normal.
434
<LI>Misc test: about 400 megabytes of <CODE>.tar</CODE> files with
435
<CODE>bzip2</CODE> compiled with Checker (a memory access error
436
detector, like Purify).
437
<LI>Misc tests to make sure it builds and runs ok on non-Linux/x86
440
These tests were conducted on a 225 MHz IDT WinChip machine, running
441
Linux 2.0.36. They represent nearly a week of continuous computation.
442
All tests completed successfully.
447
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
448
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_4.html#SEC48"> < </A>]</TD>
449
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ > ]</TD>
450
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
451
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top"> Up </A>]</TD>
452
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
453
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
454
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
455
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
456
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
458
<H2> 4.6 Further reading </H2>
459
<!--docid::SEC49::-->
460
<CODE>bzip2</CODE> is not research work, in the sense that it doesn't present
461
any new ideas. Rather, it's an engineering exercise based on existing
465
Four documents describe essentially all the ideas behind <CODE>bzip2</CODE>:
466
<TABLE><tr><td> </td><td class=example><pre>Michael Burrows and D. J. Wheeler:
467
"A block-sorting lossless data compression algorithm"
469
Digital SRC Research Report 124.
470
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
471
If you have trouble finding it, try searching at the
472
New Zealand Digital Library, http://www.nzdl.org.
474
Daniel S. Hirschberg and Debra A. LeLewer
475
"Efficient Decoding of Prefix Codes"
476
Communications of the ACM, April 1990, Vol 33, Number 4.
477
You might be able to get an electronic copy of this
478
from the ACM Digital Library.
481
Program bred3.c and accompanying document bred3.ps.
482
This contains the idea behind the multi-table Huffman
484
ftp://ftp.cl.cam.ac.uk/users/djw3/
486
Jon L. Bentley and Robert Sedgewick
487
"Fast Algorithms for Sorting and Searching Strings"
488
Available from Sedgewick's web page,
489
www.cs.princeton.edu/~rs
490
</pre></td></tr></table>The following paper gives valuable additional insights into the
491
algorithm, but is not immediately the basis of any code
493
<TABLE><tr><td> </td><td class=example><pre>Peter Fenwick:
494
Block Sorting Text Compression
495
Proceedings of the 19th Australasian Computer Science Conference,
496
Melbourne, Australia. Jan 31 - Feb 2, 1996.
497
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
498
</pre></td></tr></table>Kunihiko Sadakane's sorting algorithm, mentioned above,
500
<TABLE><tr><td> </td><td class=example><pre>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
501
</pre></td></tr></table>The Manber-Myers suffix array construction
502
algorithm is described in a paper
504
<TABLE><tr><td> </td><td class=example><pre>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
505
</pre></td></tr></table>Finally, the following paper documents some recent investigations
506
I made into the performance of sorting algorithms:
507
<TABLE><tr><td> </td><td class=example><pre>Julian Seward:
508
On the Performance of BWT Sorting Algorithms
509
Proceedings of the IEEE Data Compression Conference 2000
510
Snowbird, Utah. 28-30 March 2000.
511
</pre></td></tr></table></P><P>
514
<TABLE CELLPADDING=1 CELLSPACING=1 BORDER=0>
515
<TR><TD VALIGN="MIDDLE" ALIGN="LEFT">[ << ]</TD>
516
<TD VALIGN="MIDDLE" ALIGN="LEFT">[ >> ]</TD>
517
<TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT"> <TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual.html#SEC_Top">Top</A>]</TD>
518
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_toc.html#SEC_Contents">Contents</A>]</TD>
519
<TD VALIGN="MIDDLE" ALIGN="LEFT">[Index]</TD>
520
<TD VALIGN="MIDDLE" ALIGN="LEFT">[<A HREF="manual_abt.html#SEC_About"> ? </A>]</TD>
524
This document was generated
525
by <I>Julian Seward</I> on <I>January, 5 2002</I>
526
using <A HREF="http://www.mathematik.uni-kl.de/~obachman/Texi2html
527
"><I>texi2html</I></A>