3
A) Hard limits of the Zip archive format:
3
A1) Hard limits of the Zip archive format (without Zip64 extensions):
5
Number of entries in Zip archive: 64 k (2^16 - 1 entries)
6
Compressed size of archive entry: 4 GByte (2^32 - 1 Bytes)
7
Uncompressed size of entry: 4 GByte (2^32 - 1 Bytes)
8
Size of single-volume Zip archive: 4 GByte (2^32 - 1 Bytes)
9
Per-volume size of multi-volume archives: 4 GByte (2^32 - 1 Bytes)
10
Number of parts for multi-volume archives: 64 k (1^16 - 1 parts)
11
Total size of multi-volume archive: 256 TByte (4G * 64k)
5
Number of entries in Zip archive: 64 Ki (2^16 - 1 entries)
6
Compressed size of archive entry: 4 GiByte (2^32 - 1 Bytes)
7
Uncompressed size of entry: 4 GiByte (2^32 - 1 Bytes)
8
Size of single-volume Zip archive: 4 GiByte (2^32 - 1 Bytes)
9
Per-volume size of multi-volume archives: 4 GiByte (2^32 - 1 Bytes)
10
Number of parts for multi-volume archives: 64 Ki (2^16 - 1 parts)
11
Total size of multi-volume archive: 256 TiByte (4G * 64k)
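As a rough illustration of how these limits follow from the 16-bit and
32-bit field widths, the following C sketch derives the multi-volume total
and checks whether a planned archive stays within the classic limits. The
names zip_max_multivolume_bytes() and zip_fits_classic_limits() are
hypothetical; they are not part of the Info-ZIP sources.

```c
#include <stdint.h>

/* Classic (non-Zip64) limits taken from the table above. */
#define ZIP_MAX_COUNT  UINT16_MAX  /* 2^16 - 1 entries or volumes    */
#define ZIP_MAX_BYTES  UINT32_MAX  /* 2^32 - 1 bytes per size field  */

/* Upper bound on the total size of a multi-volume archive:
 * (2^32 - 1) bytes per volume times (2^16 - 1) volumes. */
uint64_t zip_max_multivolume_bytes(void)
{
    return (uint64_t)ZIP_MAX_BYTES * (uint64_t)ZIP_MAX_COUNT;
}

/* Returns 1 if the given counts and sizes fit the classic limits,
 * 0 if any of them exceeds its field width. */
int zip_fits_classic_limits(uint64_t entries,
                            uint64_t largest_entry_bytes,
                            uint64_t archive_bytes)
{
    return entries <= ZIP_MAX_COUNT
        && largest_entry_bytes <= ZIP_MAX_BYTES
        && archive_bytes <= ZIP_MAX_BYTES;
}
```

The exact product is slightly below 2^48 Bytes, which is why the table
rounds it to 256 TiByte.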
13
13
The number of archive entries and of multivolume parts are limited by
14
14
the structure of the "end-of-central-directory" record, where these
19
19
is not really necessary to retrieve the contents of a Zip archive;
20
20
it should rather be used for consistency checks.)
22
Length of an archive entry name: 64 kByte (2^16 - 1)
23
Length of archive member comment: 64 kByte (2^16 - 1)
24
Total length of "extra field": 64 kByte (2^16 - 1)
25
Length of a single e.f. block: 64 kByte (2^16 - 1)
26
Length of archive comment: 64 KByte (2^16 - 1)
22
Length of an archive entry name: 64 KiByte (2^16 - 1)
23
Length of archive member comment: 64 KiByte (2^16 - 1)
24
Total length of "extra field": 64 KiByte (2^16 - 1)
25
Length of a single e.f. block: 64 KiByte (2^16 - 1)
26
Length of archive comment: 64 KiByte (2^16 - 1)
28
28
Additional limitation claimed by PKWARE:
29
29
Size of local-header structure (fixed fields of 30 Bytes + filename
30
local extra field): < 64 kByte
30
local extra field): < 64 KiByte
31
31
Size of central-directory structure (46 Bytes + filename +
32
central extra field + member comment): < 64 kByte
32
central extra field + member comment): < 64 KiByte
34
A2) Hard limits of the Zip archive format with Zip64 extensions:
35
35
In 2001, PKWARE published version 4.5 of the Zip format specification
36
36
(together with the release of PKZIP for Windows 4.5). This specification
37
37
defines new extra field blocks that make it possible to break the size limits of the
38
standard zipfile structures. In this extended "Zip64" format, the limits
39
on the size of zip entries and the size of the complete zip archive are
40
extended to (2^64 - 1) Bytes; the maximum number of archive entries and
41
split volumes are enlarged to (2^64 - 1) and (2^32 - 1), respectively.
42
Currently, these extensions are not yet supported by the released Info-ZIP
43
software. However, new major releases (Zip 3.0 and UnZip 6.0) are under
44
development and will support Zip64 archives on selected environments.
45
(Beta releases are already available for Unix, VMS and Win32.)
38
standard zipfile structures. This extended "Zip64" format enlarges the
39
theoretical limits to the following values:
41
Number of entries in Zip archive: 16 Ei (2^64 - 1 entries)
42
Compressed size of archive entry: 16 EiByte (2^64 - 1 Bytes)
43
Uncompressed size of entry: 16 EiByte (2^64 - 1 Bytes)
44
Size of single-volume Zip archive: 16 EiByte (2^64 - 1 Bytes)
45
Per-volume size of multi-volume archives: 16 EiByte (2^64 - 1 Bytes)
46
Number of parts for multi-volume archives: 4 Gi (2^32 - 1 parts)
47
Total size of multi-volume archive: 2^96 Byte (16 Ei * 4Gi)
49
The Info-ZIP software releases (beginning with Zip 3.0 and UnZip 6.0)
50
support Zip64 archives on selected environments (where the underlying
51
operating system capabilities are sufficient, e.g. Unix, VMS and Win32).
47
53
B) Implementation limits of UnZip:
49
55
1. Size limits caused by file I/O and decompression handling:
50
Size of Zip archive: 2 GByte (2^31 - 1 Bytes)
51
Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes)
53
Note: On some systems, UnZip may support archive sizes up to 4 GByte.
54
To get this support, the target environment has to meet the following
56
a) Without "Zip64" and "LargeFile" extensions:
57
Size of Zip archive: 2 GiByte (2^31 - 1 Bytes)
58
Compressed size of archive entry: 2 GiByte (2^31 - 1 Bytes)
60
b) With "Zip64" enabled and "LargeFile" supported:
61
Size of Zip archive: 8 EiByte (2^63 - 1 Bytes)
62
Compressed size of archive entry: 8 EiByte (2^63 - 1 Bytes)
63
Uncompressed size of entry: 8 EiByte (2^63 - 1 Bytes)
65
Note: On some systems, even UnZip without "LargeFile" extensions enabled
66
may support archive sizes up to 4 GiByte. To get this support, the
67
target environment has to meet the following requirements:
56
68
a) The compiler's intrinsic "long" data types must be able to hold
57
69
integer numbers of 2^32. In other words - the standard intrinsic
58
70
integer types "long" and "unsigned long" have to be wider than
61
73
with the more-than-32-bit-wide "long int" type of condition a)
62
74
c) The standard file positioning functions fseek(), ftell() (and/or
63
75
the Unix style lseek() and tell() functions) have to be able
64
to move to absolute file offsets of up to 4 GByte from the file
76
to move to absolute file offsets of up to 4 GiByte from the file
66
78
On 32-bit CPU hardware, you generally cannot expect that a C compiler
67
79
provides a "long int" type that is wider than 32-bit. So, many of the
68
80
most popular systems (i386, PowerPC, 680x0, et al.) are out of luck.
69
81
You may find environments that meet all requirements on systems
70
with 64-bit CPU hardware. Examples might be Cray number crunchers
71
or Compaq (former DEC) Alpha AXP machines.
82
with 64-bit CPU hardware. Examples might be Cray number crunchers,
83
Compaq (former DEC) Alpha AXP machines, or Intel/AMD x64 computers.
73
85
The number of Zip archive entries is unlimited. The "number-of-entries"
74
86
field of the "end-of-central-dir" record is checked against the "number
75
of entries found in the central directory" modulus 64k (2^16).
87
of entries found in the central directory" modulo 64k (2^16) (without
88
Zip64 extension) or modulo 2^64 (with Zip64 extensions enabled for
77
Multi-volume archive extraction is not supported.
91
Multi-volume archive extraction is not (yet) supported.
79
93
Memory requirements are mostly independent of the archive size
80
94
and archive contents.
96
110
C) Implementation limits of the Zip executables:
98
112
1. Size limits caused by file I/O and compression handling:
99
Size of Zip archive: 2 GByte (2^31 - 1 Bytes)
100
Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes)
101
Uncompressed size of entry: 2 GByte (2^31 - 1 Bytes),
102
(could/should be 4 GBytes...)
103
Multi-volume archive creation is not supported.
113
a) Without "Zip64" and "LargeFile" extensions:
114
Size of Zip archive: 2 GiByte (2^31 - 1 Bytes)
115
Compressed size of archive entry: 2 GiByte (2^31 - 1 Bytes)
116
Uncompressed size of entry: 2 GiByte (2^31 - 1 Bytes),
117
(could/should be 4 GiBytes...)
119
b) With "Zip64" enabled and "LargeFile" supported:
120
Size of Zip archive: 8 EiByte (2^63 - 1 Bytes)
121
Compressed size of archive entry: 8 EiByte (2^63 - 1 Bytes)
122
Uncompressed size of entry: 8 EiByte (2^63 - 1 Bytes)
124
Multi-volume archive creation is now supported in the form of split
125
archives. Currently up to 99,999 splits are supported.
105
127
2. Limits caused by handling of archive contents lists
153
181
c) amount of free memory to hold the directory info list for new entries;
155
24 bytes (32-bit) resp. 22 bytes (16-bit)
183
32 bytes (Zip64), 24 bytes (32-bit) resp. 22 bytes (16-bit)
156
184
+ 3 * length of filename
186
NOTE: For larger systems, the actual usability limits may be set more
187
by performance (how long you are willing to wait) than by available
188
memory and other resources.
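The per-entry figures above can be turned into a rough memory estimate.
The sketch below assumes that each new entry costs the fixed part quoted
above (32, 24 or 22 bytes) plus three times the length of its filename;
the function names are hypothetical, and the real allocation pattern in
Zip may differ.

```c
#include <stddef.h>

/* Estimated memory for one directory-info entry.
 * base_bytes: 32 (Zip64), 24 (32-bit) or 22 (16-bit), as quoted above.
 * name_len:   length of the entry's filename in bytes. */
size_t dir_entry_bytes(size_t base_bytes, size_t name_len)
{
    return base_bytes + 3 * name_len;
}

/* Estimated memory for a whole list, assuming an average name length. */
size_t dir_list_bytes(size_t base_bytes, size_t entries, size_t avg_name_len)
{
    return entries * dir_entry_bytes(base_bytes, avg_name_len);
}
```

Under these assumptions, one million entries with 20-character names in
a Zip64 build would need on the order of 92 million bytes.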
158
190
D) Some technical remarks:
160
1. The 2GByte size limit on archive files is a consequence of the portable
161
C implementation of the Info-ZIP programs.
192
1. For executables without support for "Zip64" archives and "LargeFile"
193
I/O extensions, the 2GiByte size limit on archive files is a consequence
194
of the portable C implementation used for the Info-ZIP programs.
162
195
Zip archive processing requires random access to the archive file for
163
196
jumping between different parts of the archive's structure.
164
197
In C, this is done via the stdio functions fseek()/ftell() or the
165
unix-io functions lseek()/tell(). In many (most?) C implementations,
198
unix-io functions lseek()/tell(). In many (most?) C implementations,
166
199
these functions use "signed long" variables to hold offset pointers
167
into sequential files. In most cases, this is a signed 32-bit number,
168
which is limited to ca. 2E+09. There may be specific C runtime library
200
into sequential files. In most cases, this is a signed 32-bit number,
201
which is limited to ca. 2E+09. There may be specific C runtime library
169
202
implementations that interpret the offset numbers as unsigned, but for
170
203
us, this is not reliable in the context of portable programming.
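Whether a given build is affected can be checked at run time with a few
lines of C. The sketch below only inspects the range of "long"; it does
not probe the C runtime's fseek() or lseek() behaviour. The function
names are hypothetical.

```c
#include <limits.h>

/* Returns 1 if "long" is wider than 32 bits, so that offsets handed to
 * fseek()/ftell() can exceed 4 GiByte; returns 0 for a 32-bit "long". */
int long_is_wider_than_32_bits(void)
{
    return (unsigned long)LONG_MAX > 0xFFFFFFFFUL;
}

/* Largest file offset a signed "long" can represent
 * (2^31 - 1 when "long" is 32 bits wide). */
long max_long_offset(void)
{
    return LONG_MAX;
}
```

On a platform with a 32-bit "long", max_long_offset() yields 2^31 - 1,
which is the "ca. 2E+09" limit described above.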
172
2. The 2GByte limit on the size of a single compressed archive member
205
2. Similarly, for executables without "Zip64" and "LargeFile" support,
206
the 2GiByte limit on the size of a single compressed archive member
173
207
is again a consequence of the implementation in C.
174
208
The variables used internally to count the size of the compressed
175
209
data stream are of type "long", which is guaranteed to be at least
178
212
But, why do we use "signed" long and not "unsigned long"?
180
Throughout the I/O handling of the compressed data stream, the
181
sign bit of the "long" numbers is (mis-)used as a kind of overflow
182
detection. In the end, this is caused by the fact that standard C
183
lacks any overflow checking on integer arithmetics and does not
184
support access to the underlying hardware's overflow detection
185
(the status bits, especially "carry" and "overflow" of the CPU's
186
flags-register) in a system-independent manner.
188
So, we "misuse" the most-significant bit of the compressed data
189
size counters as carry bit for efficient overflow/underflow detection.
190
We could change the code to a different method of overflow detection,
191
by using a bunch of "sanity" comparisons (kind of "is the calculated
192
result plausible when compared with the operands"). But, this would
193
"blow up" the code of the "inner loop", with remarkable loss of
194
processing speed. Or, we could reduce the amount of consistency checks
195
of the compressed data (e.g. detection of premature end of stream) to
196
an absolute minimum, at the cost of the programs' stability when
197
processing corrupted data.
199
Summary: Changing the compression/decompression core routines to
200
be "unsigned safe" would require excessive recoding, with little
201
gain on maximum processable uncompressed size (a gain can only be
202
expected for hardly compressable data), but at severe costs on
203
performance, stability and maintainability. Therefore, it is
204
quite unlikely that this will ever happen for Zip/UnZip.
206
The argumentation above is somewhat out-dated. The new releases
207
Zip 3 and UnZip 6 will support archive sizes larger than 4GB on
208
systems where the required underlying support for 64-bit file offsets
209
and file sizes is available from the OS (and the C runtime environment).
210
However, this new support will partially break compatibility with
211
older "legacy" systems. And it should be expected that the portability
212
and readability of the UnZip and Zip code may be reduced due to the
213
extensive use of non-standard language extension needed for 64-bit
214
support on the major target systems.
214
Throughout the I/O handling of the compressed data stream, the sign bit
215
of the "long" numbers is (mis-)used as a kind of overflow detection.
216
In the end, this is caused by the fact that standard C lacks any
217
overflow checking on integer arithmetic and does not support access
218
to the underlying hardware's overflow detection (the status bits,
219
especially "carry" and "overflow" of the CPU's flags-register) in a
220
system-independent manner.
222
So, we "misuse" the most-significant bit of the compressed data size
223
counters as carry bit for efficient overflow/underflow detection. We
224
could change the code to a different method of overflow detection, by
225
using a bunch of "sanity" comparisons (kind of "is the calculated result
226
plausible when compared with the operands"). But, this would "blow up"
227
the code of the "inner loop", with a noticeable loss of processing speed.
228
Or, we could reduce the amount of consistency checks of the compressed
229
data (e.g. detection of premature end of stream) to an absolute minimum,
230
at the cost of the programs' stability when processing corrupted data.
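The trade-off can be sketched in C as follows. This is a simplified
illustration with hypothetical names and structures, not the actual
Info-ZIP code: the signed variant subtracts first and lets the sign
report exhaustion, while the unsigned variant needs an extra "sanity"
comparison before every subtraction.

```c
/* Hypothetical stream state; not the actual Info-ZIP structure. */
typedef struct {
    long remaining;           /* compressed bytes left, signed on purpose */
} stream_state;

typedef struct {
    unsigned long remaining;  /* same counter, unsigned */
} ustream_state;

/* Signed variant: subtract, then test the sign.
 * Assumes n is small and non-negative, so the subtraction itself
 * cannot overflow. Returns 0 on success, -1 on premature end. */
int take_signed(stream_state *s, long n)
{
    s->remaining -= n;
    return (s->remaining < 0) ? -1 : 0;
}

/* Unsigned variant: compare against the operand before subtracting,
 * because an unsigned counter would silently wrap around.
 * Returns 0 on success, -1 on premature end (counter unchanged). */
int take_unsigned(ustream_state *s, unsigned long n)
{
    if (n > s->remaining)
        return -1;
    s->remaining -= n;
    return 0;
}
```

The signed form needs one fewer comparison per step in the inner loop,
at the price of halving the usable counter range.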
232
3. The argument above is somewhat outdated. Beginning with the
233
releases of Zip 3 and UnZip 6, Info-ZIP programs support archive
234
sizes larger than 4GiB on systems where the required underlying
235
support for 64-bit file offsets and file sizes is available from
236
the OS (and the C runtime environment).
238
For executables with support for "Zip64" archive format and "LargeFile"
239
extension, the I/O limits are lifted by applying extended 64-bit off_t
240
file offsets. All limits discussed above are then based on integer
241
sizes of 64 bits instead of 32; this should make it possible to handle
242
file and archive sizes up to the limits of manufacturable hardware for
243
the foreseeable future. The reduction of the theoretical limits from
244
(2^64 - 1) to (2^63 - 1) caused by the consistent use of signed
245
numbers is negligible with currently imaginable hardware.
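A minimal sketch of how a program might check for 64-bit file offsets
is shown below. It assumes a POSIX-like environment where off_t comes
from <sys/types.h> and where defining _FILE_OFFSET_BITS as 64 selects
large-file support (a glibc-style convention; other systems use
different switches). It further assumes that off_t is a signed type
without padding bits. The function names are hypothetical.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64  /* assumption: glibc-style large-file switch */
#endif
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Returns 1 if off_t is at least 64 bits wide, 0 otherwise. */
int have_64bit_file_offsets(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}

/* Largest positive offset a signed off_t can hold:
 * 2^63 - 1 for a 64-bit off_t, 2^31 - 1 for a 32-bit one. */
intmax_t max_file_offset(void)
{
    int bits = (int)(sizeof(off_t) * CHAR_BIT);
    return (intmax_t)((((uintmax_t)1) << (bits - 1)) - 1);
}
```

With a 64-bit off_t, max_file_offset() yields 2^63 - 1, matching the
8 EiByte implementation limits listed in sections B and C.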
247
However, this new support partially breaks compatibility with older
248
"legacy" systems. And it should be noted that the portability and
249
readability of the UnZip and Zip code have suffered somewhat from
250
the extensive use of non-standard language extensions needed for
251
64-bit support on the major target systems.
216
253
Please report any problems to: Zip-Bugs at www.info-zip.org
218
Last updated: 22 February 2005, Christian Spieler
255
Last updated: 25 May 2008, Ed Gordon
256
02 January 2009, Christian Spieler