A) Hard limits of the Zip archive format:

   Number of entries in Zip archive:            64 k (2^16 - 1 entries)
   Compressed size of archive entry:            4 GByte (2^32 - 1 Bytes)
   Uncompressed size of entry:                  4 GByte (2^32 - 1 Bytes)
   Size of single-volume Zip archive:           4 GByte (2^32 - 1 Bytes)
   Per-volume size of multi-volume archives:    4 GByte (2^32 - 1 Bytes)
   Number of parts for multi-volume archives:   64 k (2^16 - 1 parts)
   Total size of multi-volume archive:          256 TByte (4G * 64k)

   The number of archive entries and the number of multi-volume parts are
   limited by the structure of the "end-of-central-directory" record, where
   these numbers are stored in 2-Byte fields.
   Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
   handling of archives with more than 64k entries. (The information
   from the "number of entries" field in the "end-of-central-directory"
   record is not really necessary to retrieve the contents of a Zip archive;
   it should rather be used for consistency checks.)
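
The 2-Byte fields mentioned above can be made concrete with a minimal
sketch of a decoder for the fixed part of the "end-of-central-directory"
record. The field order and the signature follow the published Zip format
description; the type and function names (eocd_record, parse_eocd, ...)
are made up for this illustration and are not taken from the Info-ZIP
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIGNATURE  0x06054b50UL  /* bytes "PK\005\006" on disk */
#define EOCD_FIXED_SIZE 22            /* size of the fixed part in bytes */

/* Hypothetical in-memory view of the end-of-central-directory record. */
typedef struct {
    uint16_t disk_number;        /* number of this volume */
    uint16_t cd_start_disk;      /* volume where the central dir starts */
    uint16_t entries_this_disk;  /* entries on this volume (2 Bytes) */
    uint16_t entries_total;      /* total number of entries (2 Bytes) */
    uint32_t cd_size;            /* size of the central directory */
    uint32_t cd_offset;          /* offset of the central directory */
    uint16_t comment_length;     /* length of the archive comment */
} eocd_record;

/* All multi-byte values in a Zip archive are stored little-endian. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the record from buf. Returns 0 on success, -1 if buf is too
   short or does not start with the record signature. */
int parse_eocd(const unsigned char *buf, size_t len, eocd_record *out)
{
    assert(buf != NULL && out != NULL);
    if (len < EOCD_FIXED_SIZE || read_le32(buf) != EOCD_SIGNATURE)
        return -1;
    out->disk_number       = read_le16(buf + 4);
    out->cd_start_disk     = read_le16(buf + 6);
    out->entries_this_disk = read_le16(buf + 8);
    out->entries_total     = read_le16(buf + 10);
    out->cd_size           = read_le32(buf + 12);
    out->cd_offset         = read_le32(buf + 16);
    out->comment_length    = read_le16(buf + 20);
    return 0;
}
```

Because entries_total and disk_number are 16 bits wide, neither can
represent a value above 2^16 - 1, which is where the 64 k limits come from.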

   Length of an archive entry name:             64 kByte (2^16 - 1)
   Length of archive member comment:            64 kByte (2^16 - 1)
   Total length of "extra field":               64 kByte (2^16 - 1)
   Length of a single e.f. block:               64 kByte (2^16 - 1)
   Length of archive comment:                   64 kByte (2^16 - 1)

   Additional limitation claimed by PKWARE:
     Size of local-header structure (fixed fields of 30 Bytes + filename +
      local extra field):                       < 64 kByte
     Size of central-directory structure (46 Bytes + filename +
      central extra field + member comment):    < 64 kByte
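
As a small illustration of the two PKWARE limits above, the following
sketch adds up the fixed and variable parts of each structure. Reading
"< 64 kByte" as "less than 2^16 Bytes" is an assumption, and the function
names are hypothetical.

```c
#include <assert.h>

#define LOCAL_HEADER_FIXED    30UL     /* fixed fields of the local header */
#define CENTRAL_HEADER_FIXED  46UL     /* fixed fields of a central entry */
#define STRUCTURE_LIMIT       65536UL  /* "< 64 kByte", assumed to be 2^16 */

/* Returns 1 if a local header with these field lengths stays below the
   limit, 0 otherwise. Each length is itself a 2-Byte quantity. */
int local_header_fits(unsigned long name_len, unsigned long extra_len)
{
    assert(name_len <= 65535UL && extra_len <= 65535UL);
    return LOCAL_HEADER_FIXED + name_len + extra_len < STRUCTURE_LIMIT;
}

/* Same check for a central-directory entry, which also carries the
   member comment. */
int central_header_fits(unsigned long name_len, unsigned long extra_len,
                        unsigned long comment_len)
{
    assert(name_len <= 65535UL && extra_len <= 65535UL);
    assert(comment_len <= 65535UL);
    return CENTRAL_HEADER_FIXED + name_len + extra_len + comment_len
           < STRUCTURE_LIMIT;
}
```

Note that under this reading a name of the maximum length 2^16 - 1 can
never satisfy the local-header limit, because the 30 fixed Bytes already
push the total past 64 kByte.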


   In 2001, PKWARE published version 4.5 of the Zip format specification
   (together with the release of PKZIP for Windows 4.5). This specification
   defines new extra field blocks that make it possible to exceed the size
   limits of the standard zipfile structures. In this extended Zip format,
   the size limits of zip entries (and of the complete zip archive) have
   been raised to (2^64 - 1) Bytes and the maximum number of archive
   entries to (2^32 - 1).
   Currently, these extensions are not supported by Info-ZIP software,
   but it is planned to provide an implementation for selected environments
   with the next major release. (This may take a while, though.)

B) Implementation limits of UnZip:

 1. Size limits caused by file I/O and decompression handling:
   Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
   Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)

   Note: On some systems, UnZip may support archive sizes up to 4 GByte.
         To get this support, the target environment has to meet the
         following conditions:
         a) The compiler's intrinsic "long" data types must be able to hold
            integer numbers of 2^32. In other words, the standard intrinsic
            integer types "long" and "unsigned long" have to be wider than
            32 bits.
         b) The system has to supply a C runtime library that is compatible
            with the more-than-32-bit-wide "long int" type of condition a).
         c) The standard file positioning functions fseek(), ftell() (and/or
            the Unix style lseek() and tell() functions) have to be capable
            of moving to absolute file offsets of up to 4 GByte from the
            file start.

         On 32-bit CPU hardware, you generally cannot expect that a C
         compiler provides a "long int" type that is wider than 32 bits.
         So, many of the most popular systems (i386, PowerPC, 680x0, et al.)
         are out of luck.
         You may find environments that meet all requirements on systems
         with 64-bit CPU hardware. Examples might be Cray number crunchers
         or Compaq (formerly DEC) Alpha AXP machines.
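
Whether a given compiler meets condition a) can be probed with a few
lines of C. The sketch below counts the value bits of "unsigned long" at
run time; the function names are invented for this example, and the check
says nothing about conditions b) and c), which depend on the runtime
library.

```c
#include <assert.h>
#include <limits.h>

/* Number of value bits in "unsigned long", counted by shifting. */
int ulong_value_bits(void)
{
    unsigned long v = ULONG_MAX;
    int bits = 0;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    assert(bits >= 32);   /* the C standard guarantees at least 32 bits */
    return bits;
}

/* Condition a): "long" and "unsigned long" are wider than 32 bits. */
int long_is_wider_than_32_bits(void)
{
    return ulong_value_bits() > 32;
}
```

A build-time variant of the same test is the preprocessor comparison
"#if ULONG_MAX > 0xFFFFFFFFUL".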

   The number of Zip archive entries is unlimited. The "number-of-entries"
   field of the "end-of-central-dir" record is checked against the "number
   of entries found in the central directory" modulo 64k (2^16).
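
The modulo comparison described above can be sketched in a few lines.
This is an illustration of the idea only, with a hypothetical function
name, not the code used in UnZip.

```c
#include <assert.h>

/* Compare the number of entries actually found in the central directory
   with the 2-Byte "number-of-entries" field. Only the low 16 bits of the
   count can be stored in the field, so the comparison is done modulo 2^16.
   Returns 1 if the two values are consistent, 0 otherwise. */
int entry_count_consistent(unsigned long entries_found,
                           unsigned int eocd_field)
{
    assert(eocd_field <= 0xFFFFu);
    return (entries_found & 0xFFFFUL) == (unsigned long)eocd_field;
}
```

For example, an archive with 70000 entries is consistent with a field
value of 4464, because 70000 - 65536 = 4464.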

   Multi-volume archive extraction is not supported.

   Memory requirements are mostly independent of the archive size

   In general, UnZip needs a fixed amount of internal buffer space
   plus the size to hold the complete information of the currently
   processed entry's local header. Here, a large extra field
   (could be up to 64 kByte) may exceed the available memory
   for MSDOS 16-bit executables (when they were compiled in small
   or medium memory model, with a fixed 64 kByte limit on data space).

   The other exception where memory requirements scale with "larger"
   archives is the "restore directory attributes" feature. Here, the
   directory attributes info for each restored directory has to be held
   in memory until the whole archive has been processed. So, the amount
   of memory needed to keep this info scales with the number of restored
   directories and may cause memory problems when a lot of directories
   are restored in a single run.

C) Implementation limits of the Zip executables:

 1. Size limits caused by file I/O and compression handling:
   Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
   Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)
   Uncompressed size of entry:          2 GByte (2^31 - 1 Bytes),
                                        (could/should be 4 GBytes...)
   Multi-volume archive creation is not supported.

 2. Limits caused by handling of archive contents lists

 2.1. Number of archive entries (freshen, update, delete)
     a) 16-bit executable:              64k (2^16 - 1) or 32k (2^15 - 1),
                                        (unsigned vs. signed type of size_t)
     a1) 16-bit executable:             <16k ((2^16)/4)
         (The smaller limit a1) results from the array size limit of
         the "qsort()" function.)
         32-bit executables             <1G ((2^32)/4)
         (usual system limit of the "qsort()" function on 32-bit systems)

     b) stack space needed by qsort to sort list of archive entries

     NOTE: In the current executables, overflows of limits a) and b) are NOT

     c) amount of free memory to hold "central directory information" of
        all archive entries; one entry needs:
          96 bytes (32-bit) resp. 80 bytes (16-bit)
          + 3 * length of entry name
          + length of zip entry comment (when present)
          + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
          + some bytes for book-keeping of memory allocation
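
The per-entry memory sum of condition c) can be written down as a small
estimate function. This is only a sketch: the function names are
hypothetical, and the size of the allocation book-keeping is not given in
the text, so it is passed in as a parameter and has to be guessed by the
caller.

```c
#include <assert.h>

/* Estimated memory for the central directory information of one entry.
   base is 96 (32-bit) or 80 (16-bit); alloc_overhead stands for the
   "some bytes for book-keeping" term and is an assumption of the caller. */
unsigned long entry_memory(unsigned long base, unsigned long name_len,
                           unsigned long comment_len,
                           unsigned long extra_len,
                           unsigned long alloc_overhead)
{
    return base + 3UL * name_len + comment_len + extra_len + alloc_overhead;
}

/* Number of entries that fit into free_bytes of memory when every entry
   needs per_entry bytes. */
unsigned long max_entries(unsigned long free_bytes, unsigned long per_entry)
{
    assert(per_entry > 0);
    return free_bytes / per_entry;
}
```

For a 16-bit executable with 12-character names, no comments, a 9-byte UT
extra field and an assumed overhead of 8 bytes, entry_memory() gives 133
bytes per entry, so 100 kBytes (102400 bytes) of free memory hold about
769 entries.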


     For systems with limited memory space (MSDOS, small AMIGAs, other
     environments without virtual memory), the number of archive entries
     is most often limited by condition c).
     For example, with approx. 100 kBytes of free memory after loading and
     initializing the program, a 16-bit DOS Zip cannot process more than 600
     to 1000 (+) archive entries. (For the 16-bit Windows DLL or the 16-bit
     OS/2 port, limit c) is less important because Windows or OS/2 executables
     are not restricted to the 1024k area of real mode memory. These 16-bit
     ports are limited by conditions a1) and b), say: at maximum approx.

 2.2. Number of "new" entries (add operation)
     In addition to the restrictions above (2.1.), the following limits
     caused by the handling of the "new files" list apply:

     a) 16-bit executable:              <16k ((2^16)/4)

     b) stack size required for "qsort" operation on "new entries" list.

     NOTE: In the current executables, the overflow checks for these limits

     c) amount of free memory to hold the directory info list for new entries;
        one entry needs:
          24 bytes (32-bit) resp. 22 bytes (16-bit)
          + 3 * length of filename
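
The "<16k" and "<1G" figures quoted for the qsort() limits in 2.1 and 2.2
follow from simple arithmetic, sketched below. The interpretation that
the divisor 4 is the size in bytes of one array element (a pointer) is an
assumption; the text only gives the expressions (2^16)/4 and (2^32)/4.
The function name is hypothetical.

```c
#include <assert.h>

/* Upper bound on the number of elements of elem_size bytes in an array
   whose total size has to stay below 2^address_bits bytes. Uses the C99
   type "unsigned long long" so that 2^32 is representable everywhere. */
unsigned long long array_element_bound(unsigned int address_bits,
                                       unsigned long elem_size)
{
    assert(address_bits >= 1 && address_bits <= 32);
    assert(elem_size > 0);
    return (1ULL << address_bits) / elem_size;
}
```

With 4-byte elements this yields 16384 (16k) for a 16-bit size limit and
1073741824 (1G) for a 32-bit one; the usable entry counts lie below these
bounds.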

D) Some technical remarks:

 1. The 2GByte size limit on archive files is a consequence of the portable
    C implementation of the Info-ZIP programs.
    Zip archive processing requires random access to the archive file for
    jumping between different parts of the archive's structure.
    In standard C, this is done via stdio functions fseek()/ftell() resp.
    unix-io functions lseek()/tell(). In many (most?) C implementations,
    these functions use "signed long" variables to hold offset pointers
    into sequential files. In most cases, this is a signed 32-bit number,
    which is limited to ca. 2E+09. There may be specific C runtime library
    implementations that interpret the offset numbers as unsigned, but for
    us, this is not reliable in the context of portable programming.
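
The consequence of the "signed long" offset type can be shown with a
minimal wrapper around fseek(). The wrapper and its name are made up for
this illustration; it simply refuses any absolute offset that the offset
parameter of fseek() cannot represent.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Seek to an absolute file offset. Returns -1 without seeking when the
   offset does not fit into the "long" parameter of fseek(); otherwise
   returns the result of fseek(). */
int seek_absolute(FILE *fp, unsigned long long offset)
{
    assert(fp != NULL);
    if (offset > (unsigned long long)LONG_MAX)
        return -1;   /* beyond 2^31 - 1 when "long" is 32 bits wide */
    return fseek(fp, (long)offset, SEEK_SET);
}
```

On a system with a 32-bit "long", every offset of 2 GByte or more is
rejected by this check, which mirrors the archive size limit described
above.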

 2. The 2GByte limit on the size of a single compressed archive member
    is again a consequence of the implementation in C.
    The variables used internally to count the size of the compressed
    data stream are of type "long", which is guaranteed to be at least
    32-bit wide on all supported environments.

    But, why do we use "signed" long and not "unsigned long"?

    Throughout the I/O handling of the compressed data stream, the
    sign bit of the "long" numbers is (mis-)used as a kind of overflow
    detection. In the end, this is caused by the fact that standard C
    lacks any overflow checking on integer arithmetic and does not
    support access to the underlying hardware's overflow detection
    (the status bits, especially "carry" and "overflow" of the CPU's
    flags-register) in a system-independent manner.

    So, we "misuse" the most-significant bit of the compressed data
    size counters as carry bit for efficient overflow/underflow detection.
    We could change the code to a different method of overflow detection,
    by using a bunch of "sanity" comparisons (kind of "is the calculated
    result plausible when compared with the operands"). But, this would
    "blow up" the code of the "inner loop", with a noticeable loss of
    processing speed. Or, we could reduce the amount of consistency checks
    of the compressed data (e.g. detection of premature end of stream) to
    an absolute minimum, at the cost of the programs' stability when
    processing corrupted data.

    Summary: Changing the compression/decompression core routines to
    be "unsigned safe" would require excessive recoding, with little
    gain on maximum processable uncompressed size (a gain can only be
    expected for hardly compressible data), but at severe costs on
    performance, stability and maintainability. Therefore, it is
    quite unlikely that this will ever happen for Zip/UnZip.

    Anyway, the Zip archive format is more and more showing its age...
    The effort to lift the 2GByte limits would be better invested in
    creating a successor for the Zip archive format and tools.

Please report any problems to: Zip-Bugs@lists.wku.edu

Last updated: 26 January 2002, Christian Spieler