* Title : Parallel BZIP2 (pbzip2)
* Author: Jeff Gilchrist (http://gilchrist.ca/jeff/)
* - Modified producer/consumer threading code from
* Andrae Muys <andrae@humbug.org.au.au>
* - uses libbzip2 by Julian Seward (http://sources.redhat.com/bzip2/)
* - Major contributions by Yavor Nikolov <nikolov.javor+pbzip2@gmail.com>
* - direct decompress: (bzerr == BZ_DATA_ERROR_MAGIC) - on rewrite mode
* is handled as cat which is counter-intuitive (though similar to bzip2 handling).
* - some functions are too long -> harder to maintain (e.g. main)
* Bryan Stillwell <bryan@bokeoa.com> - code cleanup, RPM spec, prep work
* for inclusion in Fedora Extras
* Dru Lemley [http://lemley.net/smp.html] - help with large file support
* Kir Kolyshkin <kir@sacred.ru> - autodetection for # of CPUs
* Joergen Ramskov <joergen@ramskov.org> - initial version of man page
* Peter Cordes <peter@cordes.ca> - code cleanup
* Kurt Fitzner <kfitzner@excelcia.org> - port to Windows compilers and
* decompression throttling
* Oliver Falk <oliver@linux-kernel.at> - RPM spec update
* Jindrich Novy <jnovy@redhat.com> - code cleanup and bug fixes
* Benjamin Reed <ranger@befunk.com> - autodetection for # of CPUs in OSX
* Chris Dearman <chris@mips.com> - fixed pthreads race condition
* Richard Russon <ntfs@flatcap.org> - help fix decompression bug
* Paul Pluzhnikov <paul@parasoft.com> - fixed minor memory leak
* Aníbal Monsalve Salazar <anibal@debian.org> - creates and maintains Debian packages
* Steve Christensen - creates and maintains Solaris packages (sunfreeware.com)
* Alessio Cervellin - creates and maintains Solaris packages (blastwave.org)
* Ying-Chieh Liao - created the FreeBSD port
* Andrew Pantyukhin <sat@FreeBSD.org> - maintains the FreeBSD ports and willing to
* resolve any FreeBSD-related problems
* Roland Illig <rillig@NetBSD.org> - creates and maintains NetBSD packages
* Matt Turner <mattst88@gmail.com> - code cleanup
* Álvaro Reguly <alvaro@reguly.com> - RPM spec update to support SUSE Linux
* Ivan Voras <ivoras@freebsd.org> - support for stdin and pipes during compression and
* John Dalton <john@johndalton.info> - code cleanup and bug fixes for stdin support
* Rene Georgi <rene.georgi@online.de> - code and Makefile cleanup, support for direct
* decompress and bzcat
* René Rhéaume & Jeroen Roovers <jer@xs4all.nl> - patch to support uclibc's lack of
* a getloadavg function
* Reinhard Schiedermeier <rs@cs.hm.edu> - support for tar --use-compress-prog=pbzip2
* Elbert Pol - creates and maintains OS/2 packages
* Nico Vrouwe <nico@gojelly.com> - support for CPU detection on Win32
* Eduardo Terol <EduardoTerol@gmx.net> - creates and maintains Windows 32bit package
* Nikita Zhuk <nikita@zhuk.fi> - creates and maintains Mac OS X Automator action and
* Jari Aalto <jari.aalto@cante.net> - Add long options to -h output.
* Add --loadavg, --read long options.
* Scott Emery <emery@sgi.com> - ignore fwrite return and pass chown errors in
* writeFileMetaData if effective uid root
* Steven Chamberlain <steven@pyro.eu.org> - code to support throttling compression to
* prevent memory exhaustion with slow output
* Yavor Nikolov <nikolov.javor+pbzip2@gmail.com> - code to support throttling compression to
* prevent memory exhaustion with slow output, cleanup of debug output
* - fixed infinite loop when fileWriter fails to create output file
* - allDone renamed to producerDone and added mutex synchronized-access
* - Changed fileWriter loop exit condition: now protected from
* - Mutex initialization/disposal refactored
* - Throttling loops using thread condition wait
* - Fatal error handling refactored
* - Removed allDone checks used to signal error (now handled by
* handle_error function)
* - Prevented dangling threads on switch from Multi to Single threaded
* - Inline hint added on a few functions
* - Some additional error_handlers placed instead of returns (kill any
* - Cleanup and termination changed in attempt to prevent
* signal-handling issues in multi-threaded environment (still some
* problems are observed on signalling e.g. with Ctrl+C)
* - Signal-handling in child threads disabled. The goal is to have
* single thread only which accepts signals
* - Using abort instead of exit on error termination
* - Fixed command-line parsing problem (e.g. -m100 -p12 -> 120 CPUs)
* (Problem was unterminated strings after strncpy).
* - Signal handlers setup refactored to separate function and
* switched from signal to sigaction as per POSIX recommendations
* - Added mutexes unlocking before error-termination.
* - Termination flag introduced (terminateFlag) to indicate abrupt
* termination and facilitate thread finishing in error condition.
* - fileWriter: error_handler instead of exit on write error.
* - percentComplete progress printed only if changed.
* - signal handling redesigned: using sigwait in separate thread.
* - Makefile: -D_POSIX_PTHREAD_SEMANTICS (used in Solaris).
* - CHAR_BIT instead of 8 used in a warning message.
* - SIGUSR1 signal handling added and used to terminate signal handling
* thread. (Resolved issue with pthread_cancel on Windows-Cygwin)
* - Fixed wrongly issued exit code 1 instead of 0.
* - Corrected some error messages and added a few new ones at signal and
* terminator threads join.
* - Added support for thread stack size customization (-S# option)
* Needs USE_STACKSIZE_CUSTOMIZATION to be defined to enable that option
* - Added define of PTHREAD_STACK_MIN if such is not available in
* - OutputBuffer usage redesigned as fixed-size circular buffer. Adding
* new elements to it refactored as separate function.
* - OutputBuffer resizing removed from producer_decompress since now
* buffer should be with fixed size.
* - Fixed debug print of OutputBuffer now referencing OutputBuffer in
* old-style absolute index (in fileWriter and others).
* - memstr function implementation simplified (delegated to standard
* library function which is doing the same more efficiently).
* - Changed some variables from int to size_t to get rid of compiler
* warnings (signed + unsigned expressions).
* - Sequential processing of input file/pipe/redirect implemented (encapsulated
* as separate class: BZ2StreamScanner)
* - Parallel decompression enabled (now possible with the sequential in)
* - Refactored declarations moved to separate header file (pbzip2.h) to
* make global definitions available to other source modules
* - Progress reporting modified since we don't have number of
* blocks up-front with sequential input read (now based on bytes). fileSize
* moved as InFileSize global variable for that purpose
* - Progress computation in fileWriter moved to QuietMode != 0
* (not needed to do it if we won't print it)
* - disposeMemory helper function implemented to ease memory disposal
* - Processing functions of threads declared as extern "C" since pthread_create
* requires plain "C" calling convention instead of the default "C++"
* - pthread_mutex_{lock|unlock} replaced with safe_mutex_{lock|unlock}
* where appropriate (to prevent issues like running out of sys mutexes)
* - Makefile modified to include the new source files for BZ2StreamScanner
* - Makefile refined (library flags specified in LDFLAGS variable)
* - Makefile.solaris.sunstudio included as example makefile for Solaris
* and SunStudio 12 C++ compiler
* - bz2HeaderZero in main initialized to value 0x90 > 127 which is in general
* out of char type range. Changed to unsigned along with tmpBuff to avoid
* some compiler (e.g. c++0x)/runtime warnings/errors.
* - Some thread conditions signalling added on termination requested to ease
* termination of threads blocked on conditions
* - Other pthread_* calls (signal, wait) migrated to safe_* wrappers to
* handle error return codes (and simplify code where already handled)
* - Timed pthread cond waits refactored to separate function and moved to
* debug sections only; non-timed wait used in non-debug mode. Signalling
* conditions to wake threads waiting on these conditions guaranteed.
* - memstr function templatized to allow working with other data types,
* not only char * (e.g. unsigned char *)
* - safe_cond_broadcast implemented and additional signalling added at
* fileWriter end to prevent consumers blocking at end.
* - Signal error when the input file doesn't contain any bzip2 headers.
* - Fixed problems with not-handling zero-file length special header on compression
* - Signalling error on stdin decompression when file doesn't start with
* correct bzip2 magic header.
* - Implemented outputBufferInit(size_t size) utility function for output
* buffer initialization/resetting.
* - Plain C headers moved to extern "C" section.
* - Modified file-names handling to avoid issues with file-sizes > 2040
* - Fixed out of array pointer for OutFilename in strncasecmp calls
* - A few other minor modifications
* - consumer_decompress using low-level API now to improve performance of
* - Fixed issue in safe_cond_timed_wait which caused segmentation fault
* when compiled in DEBUG mode
* - Handle decompression of very long bz2 streams incrementally instead of
* loading whole streams in memory at once
* - Progress calculation changed: fixed issue when large file support is
* disabled and enabled monitoring of segmented long bzip2 streams
* - Fixed issue with Sun Studio compiler - required explicit declaration
* of static const members in .cpp.
* - consumer_decompress throttling loosened a bit to prevent potential
* deadlock/infinite loop in certain situations. (Addition to all-empty-block
* tails in OutputBuffer is non-blocking now).
* - fixed error message for block size range (max size was wrong)
* - consumer_decompress: fixed bug which caused hang while decompressing
* prematurely truncated bzip2 stream.
* - modified fileWriter to prevent throttling when output buffers are full
* (condition signalling added when block is ready to wake up sleeping writer early)
* - Fixed deadlock bug possible with stuck consumers waiting for other one
* on long multi-segment sequence.
* - Resolved performance issue: all have been waiting for any consumer
* working on long-sequence until it's finished even when there were enough
* free slots in the input queue.
* - Debug print bug fixed in queue::remove.
* - Debugging and error handling improvements and refactoring.
* - Fixed hang on decompress of some truncated archives (bug #590225).
* - Implemented --ignore-trailing-garbage feature (bug #594868)
* - Fixed hang on decompress of some truncated archives (bug #590225)
* - Fixed hang on decompress with --ignore-trailing-garbage=1 and higher
* numCPU (e.g. > 2) (bug #740502)
* - Default extension on decompress of .tbz2 changed to .tar for
* bzip2 compatibility (bug #743639)
* - Print trailing garbage errors even when in quiet mode (bug #743635)
* - Fixed hang on decompress with --ignore-trailing-garbage=1 when
* producer is interrupted on trailing garbage (bug #762464)
* - Fixed excessive output permissions while compress/decompress
* is in progress (bug #807536)
* - Prevent deletion of input files on error (bug #874543)
* - Add more detailed kernel error messages - inspired by
* Gordon's patch (bug #874605)
* - Error-handling improvements - mainly for multi-archive
* scenarios (bug #883782)
* - Fixed occasional failure on decompress with --ignore-trailing-garbage=1
* with multiple bad blocks in the archive (bug #886625)
* - Fixed refusal to write to stdout on -dc from stdin (bug #886628)
* - Fix of metadata unpreserved on empty files compress (bug #1011021)
* David James - provided patch to fix deadlock due to unsynchronized broadcast (bug #876686)
* Gordon - provided patch for improving I/O error messages (bug #874605)
* Special thanks for suggestions and testing: Phillippe Welsh,
* James Terhune, Dru Lemley, Bryan Stillwell, George Chalissery,
* Kir Kolyshkin, Madhu Kangara, Mike Furr, Joergen Ramskov, Kurt Fitzner,
* Peter Cordes, Oliver Falk, Jindrich Novy, Benjamin Reed, Chris Dearman,
* Richard Russon, Aníbal Monsalve Salazar, Jim Leonard, Paul Pluzhnikov,
* Coran Fisher, Ken Takusagawa, David Pyke, Matt Turner, Damien Ancelin,
* Álvaro Reguly, Ivan Voras, John Dalton, Sami Liedes, Rene Georgi,
* René Rhéaume, Jeroen Roovers, Reinhard Schiedermeier, Kari Pahula,
* Elbert Pol, Nico Vrouwe, Eduardo Terol, Samuel Thibault, Michael Fuereder,
* Jari Aalto, Scott Emery, Steven Chamberlain, Yavor Nikolov, Nikita Zhuk,
* Joao Seabra, Conn Clark, Mark A. Haun, Tim Bielawa, Michal Gorny,
* Mikolaj Habdank, Christian Kujau, Marc-Christian Petersen, Piero Ottuzzi,
* Ephraim Ofir, Laszlo Ersek, Dima Tisnek, Tanguy Fautre.
* This program, "pbzip2" is copyright (C) 2003-2011 Jeff Gilchrist.