Part 1 of the \href{http://www.jpeg.org/jpeg2000/}{JPEG2000} standard describes
a core compression system that is based on the dyadic
\href{http://en.wikipedia.org/wiki/Discrete_wavelet_transform}{DWT (Discrete Wavelet Transform)}
and EBCOT (Embedded Block Coding with Optimal Truncation). Some features of this
compression system are high compression ratios, error resilience, lossless
and lossy compression, random access to the compressed stream, resolution and
quality scalability, and support for multiple components.
These characteristics make it ideal for the coding and retrieving of

\subsection{Data partitions}

The JPEG2000 standard defines a wide variety of partitions
for the image data, with the aim of exploiting the offered
scalability to the fullest. All of these partitions
allow efficient manipulation of the image, or of a part of it. Fig.
\ref{fig:partitions} shows a graphical example of the main partitions.

\begin{figure}
\centering
\resizebox{0.95\textwidth}{!}{\input{../partition}}
\caption{Data partitions defined by the JPEG2000 standard.}
\label{fig:partitions}
\end{figure}

In order to understand each partition defined
in the JPEG2000 standard, it is necessary to clarify the concept
of canvas. The canvas is a two-dimensional drawing area onto which
all the partitions are mapped to form the related image.
Hereinafter, all coordinates are relative to
a canvas whose size, width ($I_{2}$) and height ($I_{1}$), corresponds to
the total size of the associated image. Each partition is
located and mapped over the canvas in a specific way.
An image is composed of one or more components. In most
cases images have only three components: red, green
and blue (RGB), each with a size equal to that of the canvas.

The JPEG2000 standard allows an image to be divided into smaller
rectangular regions called tiles. Each tile is compressed
independently of the rest, hence the compression
parameters can differ among tiles. By default there is
always at least one tile, which covers the whole image.

One possible application of tile partitioning is
with images that contain different, visually separated elements,
like text, graphics or photographic material. When this is not
the case, and the images are continuous and homogeneous, tiling
is not recommended because it produces artifacts at the borders of
the tiles, causing a mosaic effect. Moreover, the size of a
compressed image is larger when tiles are used.

The DWT and all the quantization/coding stages
are applied independently to each tile-component. A tile-component,
of a tile $t$ and a component $c$, is defined by the two-dimensional
area delimited by $t$, taking into account the area occupied by
$c$. This means that, if an
image has only one tile, with three color components, there are
three tile-components, which are compressed independently.

For each tile-component, identified by the tile $t$ and the component
$c$, there are a total of $D_{t,c} + 1$
resolutions, where $D_{t,c}$ is the number
of DWT stages applied. The $r$-th resolution level of a compressed
tile-component is obtained after applying the inverse DWT $r$ times.
The value of $r$ is in the range $0 \leq r \leq D_{t,c}$.

Each tile-component, after the DWT has been applied, is
divided into code-blocks, which are coded independently.
In each resolution $r$ of each tile-component $(t,c)$, the
code-blocks are grouped into precincts. This partition is
defined by the height, $P_{1}^{t,c,r}$,
and the width, $P_{2}^{t,c,r}$,
of each precinct. The number of precincts in the vertical direction,
$N_{1}^{P,t,c,r}$,
as well as in the horizontal direction, $N_{2}^{P,t,c,r}$, is given by
the following expression:

\begin{equation}
N_{i}^{P,t,c,r} = \left\lceil \frac{I_{i}}{2^{D_{t,c}-r}P_{i}^{t,c,r}} \right\rceil
\end{equation}
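
As a worked instance of this expression, assume (purely for illustration)
a canvas of height $I_{1} = 768$ and width $I_{2} = 1024$, a tile-component
with $D_{t,c} = 3$ DWT stages, and precincts of size
$P_{1}^{t,c,r} = P_{2}^{t,c,r} = 64$ at resolution $r = 1$. These values are
not taken from the standard; they are only chosen to keep the arithmetic simple:

\[
N_{1}^{P,t,c,1} = \left\lceil \frac{768}{2^{3-1} \cdot 64} \right\rceil
= \left\lceil \frac{768}{256} \right\rceil = 3
\]
\[
N_{2}^{P,t,c,1} = \left\lceil \frac{1024}{2^{3-1} \cdot 64} \right\rceil
= \left\lceil \frac{1024}{256} \right\rceil = 4
\]

Under these assumptions the resolution would hold $3 \times 4 = 12$ precincts.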

Code-blocks refer to the wavelet coefficients generated by the DWT,
and are thus rectangular regions within the wavelet
domain. Precincts, however, refer to rectangular regions within the
image domain. The spatial scalability offered by the standard is
achieved by means of the precincts.

The packet is the fundamental unit for the organization
of the compressed bit-stream of an image. Each precinct contributes
to the bit-stream as many packets as there are quality layers. The
compressed data of each code-block is divided into segments
called quality layers. All the code-blocks of all the precincts
of the same tile are divided into the same number of quality layers,
although the length of a quality layer can differ between code-blocks
(it can even be zero). For a certain layer $l$,
the set of the layer-$l$ segments of all the code-blocks related to a
precinct forms a packet.

In order to decode a certain region of an image it is necessary
to decode all the packets related to that region. In the server code,
the class \hyperlink{classjpip_1_1WOIComposer}{jpip::WOIComposer} determines,
for a given region of interest, hereinafter called WOI (Window
Of Interest), all the packets required to decode it.

A packet $\zeta_{t,c,r,p,l}$
is identified by the tile $t$, the
component $c$, the resolution $r$, the precinct $p$ (in precinct
and the quality layer $l$. In the server code, the class
\hyperlink{classjpeg2000_1_1Packet}{jpeg2000::Packet} is used to

\subsection{Code-stream organization}

Part 1 of the JPEG2000 standard defines a basic structure for
organizing the compressed image data into code-streams. A code-stream
includes all the packets generated by the compression of an image
plus a set of markers, which are used for signaling certain parts, as
well as for including information necessary for the decompression.

The code-stream is itself a simple file format for JPEG2000 images.
Any standard decompressor must be able to understand a code-stream
stored within a file. This basic format is also called raw, and
its most common extension is ``.J2C''.

The markers have a unique identifier, which consists of a $16$-bit
unsigned integer. These markers can be found alone, that is,
only the identifier, or accompanied by additional information,
in which case they are called marker segments.

A marker segment has, after the identifier, another $16$-bit
unsigned integer with the length of the included data. This length
counts the two bytes of the length field itself, but not the
two bytes of the identifier.
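
The layout just described can be sketched in a few lines of Python. This is
only an illustrative reader, not the server's implementation: the function name
\texttt{read\_marker} is hypothetical, multi-byte fields are read as big-endian
(as the standard specifies), and only the three delimiting markers listed in
\texttt{STANDALONE} are treated as having no length field.

```python
import struct

# Marker codes defined in Part 1; these three carry no length field.
SOC, SOD, EOC = 0xFF4F, 0xFF93, 0xFFD9
STANDALONE = {SOC, SOD, EOC}

def read_marker(buf, pos):
    """Return (marker, payload, next_pos) for the marker starting at buf[pos].

    Hypothetical helper: 'payload' is the data that follows the length field.
    """
    (marker,) = struct.unpack_from(">H", buf, pos)
    if marker in STANDALONE:
        return marker, b"", pos + 2
    (length,) = struct.unpack_from(">H", buf, pos + 2)
    # 'length' counts its own two bytes but not the two identifier bytes.
    end = pos + 2 + length
    return marker, buf[pos + 4:end], end
```

For a segment whose length field holds $5$, the payload is therefore
$5 - 2 = 3$ bytes long and the next marker starts $2 + 5 = 7$ bytes after the
identifier.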

The code-stream always begins with the SOC (Start Of Code-stream)
marker, which does not include any additional information.
After this marker a set of markers called ``main header'' begins.
The SOC marker is always followed by a SIZ marker, with global
information necessary for decompressing the data, e.g. the image
size, the tile size, the anchor point of the tiles, the number
of components, the sub-sampling factors, etc.

There are two other markers that are mandatory in the main header:
COD, with information related to the coding of the image, like the
number of layers, number of DWT stages, the size of the code-blocks,
the progression, etc.; and QCD, which contains the quantization
parameters. These two markers can be stored in any position within the
main header.

The rest of the code-stream, up to the EOC (End Of Code-stream) marker
located at its very end, is organized as shown
in Fig. \ref{fig:code-stream}. For each image tile, there is
a set of data. This data is divided into one or more tile-parts.
Each tile-part is composed of a header and a set of packets.
The header of the first tile-part is the main header of the tile.
The header of each tile-part begins with the SOT (Start Of Tile-part)
marker and ends with the SOD (Start Of Data) marker, after which
the related sequence of packets begins, ordered according to the last COD or POC
marker. The main header ends when the first SOT is found.
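
A minimal sketch of how a reader could collect the main header following these
rules is given below. The function name \texttt{main\_header\_segments} is
hypothetical, and the code assumes that every marker between SOC and the first
SOT is a marker segment with a big-endian length field; it does not interpret
the payloads.

```python
import struct

SOC, SIZ, SOT = 0xFF4F, 0xFF51, 0xFF90  # marker codes from Part 1

def main_header_segments(buf):
    """List the (marker, payload) pairs found between SOC and the first SOT."""
    (first,) = struct.unpack_from(">H", buf, 0)
    if first != SOC:
        raise ValueError("code-stream does not start with SOC")
    pos, segments = 2, []
    while pos + 4 <= len(buf):
        marker, length = struct.unpack_from(">HH", buf, pos)
        if marker == SOT:
            break  # the first tile-part header starts here
        # The length field counts itself (2 bytes) but not the identifier.
        segments.append((marker, buf[pos + 4:pos + 2 + length]))
        pos += 2 + length
    if not segments or segments[0][0] != SIZ:
        raise ValueError("SIZ must be the first marker after SOC")
    return segments
```

The returned list can then be searched for COD (\texttt{0xFF52}) and QCD
(\texttt{0xFF5C}), whose position within the main header is not fixed.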

\begin{figure}
\centering
\resizebox{0.65\textwidth}{!}{\input{../codestream}}
\caption{Code-stream organization.}
\label{fig:code-stream}
\end{figure}

In order to permit random access to the data of a code-stream,
which is not feasible by default, JPEG2000 offers the possibility
of including the TLM, PLM and/or PLT markers. The TLM and PLM markers
are included within the main header, whilst the PLT marker goes
in the header of a tile or tile-part. The goal of the TLM marker
is to store the length of each tile-part that appears within the
code-stream. This length includes the header as well
as the set of packets, so to find where the data begins
it is first necessary to parse the header. The PLM marker
stores the length of each packet of each tile-part of the code-stream.
Each packet of the code-stream has a certain length, which cannot
be known a priori. Therefore including this marker facilitates
random access to the packets. The PLT marker has the same function
as the PLM marker, but at the tile-part level, so it stores
the lengths of all the packets of the tile-part it belongs to. This
marker is more commonly used than PLM.
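
To illustrate why stored packet lengths enable random access, the following
sketch turns a list of lengths (as a PLT marker would provide for one
tile-part) into the byte offset of each packet. The function name
\texttt{packet\_offsets} and the parameter \texttt{data\_start} (the position
of the first byte after SOD) are assumptions made for this example.

```python
def packet_offsets(data_start, packet_lengths):
    """Return the byte offset of each packet of a tile-part.

    data_start: offset of the first packet (the byte right after SOD).
    packet_lengths: packet lengths in code-stream order; zero is allowed.
    """
    offsets, pos = [], data_start
    for length in packet_lengths:
        offsets.append(pos)  # this packet starts where the previous one ended
        pos += length
    return offsets
```

With these offsets a reader can seek directly to any packet of the tile-part
without decoding the preceding ones.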

The PLM and PLT markers increase the code-stream
length, although the way the packet lengths are coded helps to
avoid an excessive overhead: a length $L$ of a certain packet,
that can be represented with $B_{L}$ bits, is stored
using $\left \lceil \frac{B_{L}}{7} \right \rceil$ bytes. For a
length $L$, a sequence of bytes is generated in which only the
$7$ least significant bits of each byte carry data. The most significant
bit of each byte indicates whether more bytes follow ($1$) or
the byte is the last one of the sequence ($0$). This
numeric encoding is widely used in Part 9 of the standard, especially in
the JPIP protocol. In this protocol, each variable-length sequence
of bytes that represents a number encoded in this way is called a
VBAS (Variable-length Byte-Aligned Segment). The class
\hyperlink{classjpip_1_1DataBinWriter}{jpip::DataBinWriter}, within the server code,
contains methods to generate VBAS coded values.
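
The encoding can be sketched as follows. This is an illustrative Python
version, not the code of \texttt{jpip::DataBinWriter}; the function names
\texttt{vbas\_encode} and \texttt{vbas\_decode} are hypothetical. It follows
the convention of the standard in which the $7$-bit groups are emitted most
significant first and the top bit is set on every byte except the last one.

```python
def vbas_encode(value):
    """Encode a non-negative integer as a sequence of 7-bit groups."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # most significant group first
    # Set the top bit on every byte except the final one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])

def vbas_decode(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # top bit clear: this was the last byte
            return value, pos
```

For example, $L = 300$ needs $B_{L} = 9$ bits, so it is stored in
$\left\lceil 9/7 \right\rceil = 2$ bytes, \texttt{0x82 0x2C}.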

\subsection{Progressions}
\label{sec:progresiones}

The packets generated by the JPEG2000 compression process
are neither independent nor self-contained. Given a certain packet,
it is not possible to figure out to which part of the related image
it belongs without additional information. The length of the packet
cannot be determined before it is decoded, and many packets cannot
be decoded without first decoding other packets. This is why it is
necessary to include markers like TLM, PLT or PLM, discussed
above, in order to allow random access without decoding.

The packets of each tile-part appear according to the progression
specified by the last COD or POC marker read before the
SOD marker. Part 1 of the JPEG2000 standard defines five possible kinds of
progressions for ordering the packets within a tile or tile-part.
Each progression is identified by means of a combination
of four letters: ``L'' for quality layer, ``R'' for resolution
level, ``C'' for component and ``P'' for precinct. Each letter identifies
one partition, and the order of the letters gives the nesting order.
Hence for the LRCP progression,
for example, the packets would be included as follows:\\
\\for each layer $l$\\
\hspace*{1cm} for each resolution $r$\\
\hspace*{2cm} for each component $c$\\
\hspace*{3cm} for each precinct $p$\\
\hspace*{4cm} include the packet $\zeta_{t,c,r,p,l}$\\
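
The same nesting can be written as a small Python generator. This is only a
sketch: the name \texttt{lrcp\_order} is hypothetical, and it assumes, for
simplicity, the same number of precincts at every resolution and component,
which does not hold in general.

```python
def lrcp_order(tile, layers, resolutions, components, precincts):
    """Yield packet indices (t, c, r, p, l) of one tile in LRCP order.

    Simplifying assumption: every (component, resolution) pair has the
    same number of precincts.
    """
    for l in range(layers):
        for r in range(resolutions):
            for c in range(components):
                for p in range(precincts):
                    yield (tile, c, r, p, l)
```

The other progressions would be obtained by reordering the four loops to match
the order of the letters.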

The different progressions allowed by the standard are: LRCP, RLCP,
RPCL, PCRL and CPRL. To choose a progression or another depends on
develop, and how the packets must be decoded. For example,
if the packets are going to be accessed randomly, but
with as few disk accesses as possible, RPCL would
be the ideal progression.
In the case of image transmission, the packets must also follow a
specific order or progression when they are transmitted.
When an image is transmitted from a server to a client, the most
desirable goal is to allow the client to show reconstructions
of the image whose quality increases as the data is received.
The quality of the reconstruction must always be the maximum possible
for the received data. Under this criterion, the LRCP progression
is the most suitable one, and it is the progression used by
the class \hyperlink{classjpip_1_1WOIComposer}{jpip::WOIComposer}.

\subsection{File formats}

Although the code-stream is completely functional as a basic
file format, it does not allow the inclusion of additional information
that could be necessary in certain applications, e.g. meta-data,
copyright information, or color palettes. By means of the COM
marker auxiliary information can be included within a
code-stream, but it is not classified nor organized in

Part 1 of the standard also defines a file format based on ``boxes'' that
allows, for example, several code-streams and diverse, properly identified
information to be included in the same file. These files usually have
the extension ``.JP2'', extension also used for calling this kind

The JP2 files are easily extensible. A basic box structure is defined,
which can contain any kind of information. Each box is unequivocally
classified by means of a $4$-byte identifier. A file can contain
several boxes with the same identifier. The standard proposes an
initial set of boxes, which may be extended according to specific
requirements. In fact, the JP2 format is the basis of the other
formats and extensions defined in the remaining parts of the standard.

Each box has a header of $8$ bytes. The first $4$ bytes, $L$, form
an unsigned integer with the length in bytes of the content of the
box. The next $4$ bytes, $T$, contain the identifier of the kind of box. This
identifier is commonly treated as a string of $4$ ASCII characters.
The value of $L$ includes the header, hence the real length of the
content of the box is $L - 8$. $L$ can have any value greater than or
equal to $8$, but also $1$ or $0$. If $L = 1$ the length of the
box is coded as an unsigned integer of $8$ bytes, $X$,
located after $T$. In this case the header occupies $16$ bytes and
the length of the content is then $X - 16$. If $L = 0$ the
length of the box content is undefined, which is only allowed for the
last box of the image file.
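
The three cases for $L$ can be sketched in Python as follows. The function
name \texttt{read\_box\_header} is hypothetical and this is not the code of the
server; all fields are read as big-endian integers, and an undefined content
length ($L = 0$) is reported as \texttt{None}.

```python
import struct

def read_box_header(buf, pos=0):
    """Return (box_type, header_len, content_len) for the box at buf[pos].

    content_len is None when L == 0 (the box runs to the end of the file).
    """
    length, box_type = struct.unpack_from(">I4s", buf, pos)
    name = box_type.decode("ascii")
    header_len = 8
    if length == 0:
        return name, header_len, None
    if length == 1:
        # Extended length X: an 8-byte unsigned integer located after T.
        (length,) = struct.unpack_from(">Q", buf, pos + 8)
        header_len = 16
    if length < header_len:
        raise ValueError("box length smaller than its header")
    return name, header_len, length - header_len
```

For instance, the $12$-byte signature box that opens a JP2 file has $L = 12$
and $T$ equal to \texttt{jP} followed by two spaces, so its content is
$12 - 8 = 4$ bytes long.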

Boxes can contain other boxes inside. Whether or not a box
contains sub-boxes is determined by the value of
$T$. If a box contains sub-boxes, it can only contain sub-boxes,
so it cannot combine sub-boxes with other data.

Within the server code, the class
\hyperlink{classjpeg2000_1_1FileManager}{jpeg2000::FileManager}
contains all the necessary code to read and parse
JPEG2000 image files, from simple raw J2C files to complex JPX ones with
hyperlinks. When this class parses an image file, it extracts the associated
index information and stores it in an object of the class
\hyperlink{classjpeg2000_1_1ImageInfo}{jpeg2000::ImageInfo}.