<html>
<title>Design and implementation of the new denoiser</title>
<body bgcolor=ffffff text=000000 link=ff0000 vlink=ff00ff alink=33ff00>
<center><h1>Design and implementation of the new denoiser</h1></center>
<p>In theory, the design of the denoiser is pretty simple; getting it
to perform was the hard part.
<p>It maintains a list of the last several frames, called <i>reference
frames</i>.  Each reference frame is composed of <i>reference
pixels</i>, which accumulate the values of several pixels.
Every time a pixel in one frame is proven to be a moved instance of a
pixel in another frame, the reference-pixel incorporates its value,
and produces an average value for the pixel.  The oldest reference
frame, therefore, gets a pretty good idea of the real value of every
pixel, but of course output is delayed by the number of reference
frames.
<p>It compares every pixel in the current frame with all pixels
in the previous frame, within a given search-radius, and any pixels
that are equal within the given error tolerance are assumed to be the
same pixel.  It builds contiguous regions of matched pixels, with
the motion vector that's common to the region.
<p>If there are too many matches for a particular area of the image, or
if the largest contiguous match in the area is too large, it's applied
to the image right then, and then searching continues.  Applying a
region means flood-filling the region (to make it the largest size
possible, and to flesh out its borders to pixel accuracy), then
hooking up the corresponding reference-frame pixels to the new frame
at their new location, and incorporating the values of all the new
pixels into the corresponding reference pixels.  Doing this before
the end of searching the frame means the affected areas don't have
to be part of the search any more, helping to reduce the amount of
work needed to search the rest of the frame.
<p>At the end of the frame, matches are applied to the new frame, from
the largest to the smallest, discounting any areas that have
already been resolved.  Any new-frame pixels not resolved by now
are considered to be new information, and a new reference-pixel is
generated for each one.
<p>The search is not actually done one pixel at a time; it's done in
terms of pixel groups.  An entire pixel-group has to match for any
match to be found, but all possible pixel-groups are tested (i.e. all
possible overlaps are checked).  Using pixel-groups helps to establish
a minimum standard for what may be considered a match, in order to
avoid finding lots of really small (and really useless) matches.
The flood-fill still extends the matches out to pixel accuracy,
so the only details that can't be found by motion-detection are the
ones smaller than a pixel-group, which is not a bad sacrifice for
performance's sake.
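<p>To make the matching rule concrete, here is a small sketch of a
pixel-group comparison (illustrative only, not the actual
<tt>MotionSearcher</tt> code; the names and the 4x2 intensity group size
are merely examples):
<pre>
// Hypothetical sketch: does the 4x2 intensity pixel-group whose top-left
// corner is (newX,newY) in the new frame match the group at (refX,refY)
// in the previous frame, within the given error tolerance?
// Every pixel in the group must match for the group to match.
bool GroupMatches(const unsigned char *newFrame,
    const unsigned char *prevFrame, int width,
    int newX, int newY, int refX, int refY, int tolerance)
{
    for (int y = 0; y < 2; ++y)
        for (int x = 0; x < 4; ++x)
        {
            int newValue = newFrame[(newY + y) * width + (newX + x)];
            int refValue = prevFrame[(refY + y) * width + (refX + x)];
            int difference = newValue - refValue;
            if (difference < 0)
                difference = -difference;
            if (difference > tolerance)
                return false;   // one mismatched pixel rejects the group
        }
    return true;
}
</pre>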
<p><br>Table of contents:
<ul>
<li><a href="#Overview">Overview</a>
<li><a href="#Implementation">Implementation</a>
<ul>
<li><a href="#ImplementationSkipList">SkipList, Set</a>
<li><a href="#ImplementationRegion2D">Region2D, SetRegion2D,
BitmapRegion2D</a>
<li><a href="#ImplementationReferenceFrame">Pixel, ReferencePixel,
ReferenceFrame</a>
<li><a href="#ImplementationMotionSearcher">MotionSearcher,
SearchWindow, SearchBorder</a>
<ul>
<li><a href="#ImplementationMotionSearcherVersion1">Version 1</a>
</ul>
</ul>
<li><a href="#FutureExtensions">Future Extensions</a>
<!-- <li><a href="#XXX">XXX</a> -->
</ul>

<a name="Overview"><h1>Overview</h1></a>
<p><tt>main.c</tt> parses command-line options and the YUV stream header.
<tt>newdenoise.cc</tt> converts between YUV format and the denoiser's
internal format, and calls the denoiser.  <tt>MotionSearcher.hh</tt> is
the denoiser's top-level file.  <tt>SearchWindow.hh</tt> and
<tt>SearchBorder.hh</tt> are two high-level pieces of the denoiser that
were broken out into their own classes, for use in other contexts.
<tt>ReferenceFrame.hh</tt> contains the definitions for pixels,
reference pixels, and reference frames.
<tt>Region2D.hh</tt> is the base class for 2-dimensional region classes;
<tt>SetRegion2D.hh</tt> implements a region using a Set of horizontal
extents, and <tt>BitmapRegion2D.hh</tt> uses an array of integers to
implement a bitmap.  <tt>Set.hh</tt> is much like the STL "set" class,
except that it's based on <tt>SkipList.hh</tt>.  <tt>Allocator.hh</tt>,
<tt>DoublyLinkedList.hh</tt>, <tt>Limits.hh</tt>, <tt>Status_t.h</tt>, and
<tt>TemplateLib.hh</tt> contain other basic definitions, most of which
should be standardly available; I'm just not sure they're available on
all supported platforms.
<p>The denoiser classes are highly templated and highly reconfigurable;
<tt>newdenoise.cc</tt> uses them in a way suited to YUV420P video.
Intensity pixels are one 8-bit unsigned integer, color pixels are
two 8-bit unsigned integers, intensity pixel-groups are 4x2, color
pixel-groups are 2x2, intensity is denoised separately from color, and
the search-radius used for color is proportional to the relative size
of the intensity and color planes (and may, in effect, be
rectangular).
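<p>For reference, the configuration just described corresponds roughly to
the following sketch (illustrative only, not copied from
<tt>newdenoise.cc</tt>; the typedef and variable names are made up, but
the template arguments and group sizes follow the description above):
<pre>
// Intensity (Y): 1-dimensional 8-bit pixels, 32-bit tolerance math,
// searched in 4x2 pixel-groups.  (Pixel is defined in ReferenceFrame.hh.)
typedef Pixel<uint8_t,1,int32_t> YPixel;
const int Y_GROUP_WIDTH = 4, Y_GROUP_HEIGHT = 2;

// Color (CbCr): 2-dimensional 8-bit pixels, 32-bit tolerance math,
// searched in 2x2 pixel-groups.
typedef Pixel<uint8_t,2,int32_t> CbCrPixel;
const int CBCR_GROUP_WIDTH = 2, CBCR_GROUP_HEIGHT = 2;

// The color planes of YUV420P are half the size of the intensity plane
// in each direction, so a color search-radius derived from the intensity
// radius is scaled by the relative plane sizes, and may be rectangular.
void DeriveColorRadius(int yRadiusX, int yRadiusY,
    int yWidth, int yHeight, int colorWidth, int colorHeight,
    int &colorRadiusX, int &colorRadiusY)
{
    colorRadiusX = yRadiusX * colorWidth / yWidth;
    colorRadiusY = yRadiusY * colorHeight / yHeight;
}
</pre>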
<br>

<a name="Implementation"><h1>Implementation</h1></a>
<p><tt>newdenoise.cc</tt> gives a good top-level view of how to use the
denoiser for YUV420P video.  Although the top-level class of the
denoiser is <tt>MotionSearcher</tt>, a small army of classes is
responsible for implementing all the necessary pieces.

<a name="ImplementationSkipList"><h2>SkipList, Set</h2></a>
<p><tt>SkipList</tt> is a highly optimized version of the famous
probabilistically-balanced logarithmic-complexity sorted list
structure.  Skip lists are well-described in other documents.  Note
that this skip-list uses the "fix the dice" and "search finger"
extensions described in the literature, and its p value is
<sup>1</sup>/e, which maximizes speed and causes nodes to have an
average of 1.71 forward pointers.  (A tree node would have to have
2 pointers, one for left and one for right, so a skip-list is also
more space-efficient than a tree structure.)
<p>One big advantage of skip-lists over tree structures, given the way
the denoiser uses them, is that iterator forward/backward operations
are small-constant complexity; they're implemented by a single pointer
dereference.  The typical tree-structure iterator forward/backward is
logarithmic.  Iterator forward/backward is used constantly throughout
the denoiser.
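<p>In sketch form, the structure looks something like this (illustrative
only; the real <tt>SkipList</tt> is templated, uses its own allocator,
and implements the "fix the dice" and "search finger" extensions
mentioned above):
<pre>
#include <stdlib.h>   // rand(), RAND_MAX

// A skip-list node: a key, a previous pointer for constant-time backward
// iteration, and a variable-height array of forward pointers.
struct SkipListNode
{
    int key;
    SkipListNode *previous;     // level-0 back pointer
    int height;                 // number of forward pointers
    SkipListNode *forward[1];   // allocated with "height" entries
};

// Choose a new node's height: keep adding levels with probability
// p = 1/e.  (rand() stands in for the list's own random-number source.)
int ChooseHeight(int maxHeight)
{
    const double p = 0.367879441;   // 1/e
    int height = 1;
    while (height < maxHeight && rand() < p * RAND_MAX)
        ++height;
    return height;
}

// Iterator forward/backward is a single pointer dereference, which is
// why the denoiser can lean on it so heavily.
SkipListNode *Next(SkipListNode *node) { return node->forward[0]; }
SkipListNode *Prev(SkipListNode *node) { return node->previous; }
</pre>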
<p><tt>Set</tt> is much like STL's <tt>set</tt> class, except that it's
based on <tt>SkipList</tt>.

<a name="ImplementationRegion2D"><h2>Region2D, SetRegion2D,
BitmapRegion2D</h2></a>
<p><tt>SetRegion2D</tt> was the first region class written for the
denoiser; later, <tt>Region2D</tt> and <tt>SetRegion2D</tt> were split
into two classes, and <tt>BitmapRegion2D</tt> was made a subclass of
<tt>Region2D</tt>.  It was not a perfect separation, and <tt>Region2D</tt>
remains a sketch of what I'd like to see, rather than a completed
product.  I could solve its problem using virtual methods, but that
would prevent a lot of function-inlining from happening, and for
performance reasons I don't want to do that.
<p><tt>SetRegion2D</tt> uses <tt>Set</tt> to implement regions as a set of
horizontal extents, i.e. as y/x-start/x-end triplets.  Quite a bit of
work went into writing efficient union/subtraction methods.
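<p>A minimal sketch of that representation (the real class is templated
and carries much more machinery, such as allocators, iterators, and the
efficient union/subtraction logic):
<pre>
#include <set>

// A horizontal extent: the pixels on row y between columns xStart and
// xEnd.  Extents are ordered by row, then by starting column.
struct Extent
{
    int y, xStart, xEnd;
    bool operator<(const Extent &other) const
    {
        if (y != other.y)
            return y < other.y;
        return xStart < other.xStart;
    }
};

// A region is a sorted set of extents; union and subtraction walk two
// such sets in order, merging or splitting extents as they go.
typedef std::set<Extent> RegionSketch;
</pre>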
<p><tt>BitmapRegion2D</tt> uses an array of integers, treated like a bit
field, to implement regions.  It's faster in some cases, though it's
a lot less memory-efficient than <tt>SetRegion2D</tt>, and has to be
created with a maximum size.

<a name="ImplementationReferenceFrame"><h2>Pixel, ReferencePixel,
ReferenceFrame</h2></a>
<p>The <tt>Pixel</tt> class is templated with a numeric type for storing
pixel values, the number of dimensions in the pixel's value, and
a numeric type to use when doing tolerance calculations.  The rule of
thumb is, the tolerance type should be able to hold the value of two
pixel-value types multiplied together.
<p>For YUV420P video, a Y pixel is
<tt>Pixel<uint8_t,1,int32_t></tt> and a color (i.e. CbCr) pixel
is <tt>Pixel<uint8_t,2,int32_t></tt>.
<p>The <tt>Pixel</tt> class contains methods to get and set the value of
pixels, and to compare two pixels within a given tolerance.  It also
contains methods to generate tolerance values from integers, in case
the pixel type has special rules.  (For instance, the tolerance value
for a color pixel is the square of its integer counterpart, since CbCr
color is 2-dimensional.)
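<p>The tolerance rule is easiest to see in a small example.  The sketch
below is one plausible reading of the comparison just described, not the
class's actual code: a 1-dimensional (intensity) pixel compares the
absolute difference against the tolerance, while a 2-dimensional (CbCr)
pixel compares the squared 2-D distance against a squared tolerance.
<pre>
// Intensity: |a - b| <= tolerance.
bool YWithinTolerance(int a, int b, int tolerance)
{
    int difference = a - b;
    if (difference < 0)
        difference = -difference;
    return difference <= tolerance;
}

// Color: compare the squared 2-D distance against the squared tolerance,
// i.e. the generated "tolerance value" is MakeTolerance(t) = t * t.
bool CbCrWithinTolerance(int aCb, int aCr, int bCb, int bCr,
    int toleranceValue /* already squared */)
{
    int dCb = aCb - bCb, dCr = aCr - bCr;
    return dCb * dCb + dCr * dCr <= toleranceValue;
}
</pre>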
<p>The <tt>ReferencePixel</tt> is templated much like the <tt>Pixel</tt>
class.  It holds a sum of pixel values, and the number of pixel values
summed so far.  It also counts the number of reference-frames that
point to it.  It's intended to represent a single pixel that's found
to exist over several frames, and to produce an average value for the
pixel, so as to smooth away errors.
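<p>In sketch form (illustrative only; the real class is templated on the
same parameters as <tt>Pixel</tt>):
<pre>
// A reference pixel accumulates every new-frame pixel value that was
// determined to be "the same pixel", and hands back their average.
struct ReferencePixelSketch
{
    int valueSum;         // sum of all incorporated pixel values
    int valueCount;       // how many values have been incorporated
    int frameReferences;  // how many reference frames point to this pixel

    void AddValue(int value) { valueSum += value; ++valueCount; }

    // Rounded average; assumes at least one value has been incorporated.
    int GetAverage() const
        { return (valueSum + valueCount / 2) / valueCount; }
};
</pre>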
<p>The <tt>ReferenceFrame</tt> is a rectangular array of
<tt>ReferencePixel</tt>s, representing each frame and the parts of the
image that it has in common with other frames.

<a name="ImplementationMotionSearcher"><h2>MotionSearcher,
SearchWindow, SearchBorder</h2></a>
<p>OK, so much for the easy part.
<p><tt>MotionSearcher</tt> maintains a circular list of
<tt>ReferenceFrame</tt>s.  To make space for a new frame, the oldest
frame is returned to the client.
<p><tt>AddFrame()</tt> is responsible for processing new frames.  It does
so in several stages.  First, it looks for pixels that haven't moved,
i.e. new pixels whose corresponding reference-pixels are within the
error tolerance.  That resolves most of a typical frame.
<p>Next, it detects moved areas, i.e. parts of the new frame that match
parts of the previous frame except that they've moved.
<p>It could iterate through the frame in any order, but to keep the
implementation of <tt>SearchWindow</tt> and <tt>SearchBorder</tt> simple
and efficient, it iterates through the frame in a zigzag pattern,
i.e. starting at the upper-left corner, it moves right to the edge,
then down a line, then left to the edge, then down a line, and so on
to the end of the frame.
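<p>The traversal order is simple enough to show directly (a sketch;
<tt>ProcessPixelGroup()</tt> is a hypothetical stand-in for the real
per-group work):
<pre>
void ProcessPixelGroup(int x, int y);   // the per-group search (elsewhere)

// Visit every pixel-group position in zigzag order: left-to-right on the
// first row of groups, right-to-left on the next, and so on, so that
// consecutive positions are always adjacent.
void ZigzagSearch(int groupPositionsX, int groupPositionsY)
{
    for (int y = 0; y < groupPositionsY; ++y)
    {
        if (y % 2 == 0)
            for (int x = 0; x < groupPositionsX; ++x)
                ProcessPixelGroup(x, y);
        else
            for (int x = groupPositionsX - 1; x >= 0; --x)
                ProcessPixelGroup(x, y);
    }
}
</pre>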
<p>The search-window consists of the reference-frame's pixels,
partitioned into search-window cells.  Each cell contains a
pixel-group (i.e. a rectangular array of pixels, containing the
minimum pixel pattern that can be searched for).  The pixel-groups
in each cell overlap other cells; although the motion-search is done
in terms of pixel-groups, it still looks at all possible
combinations of pixels that could form a pixel-group.
<p>The pixel-sorter is a tree structure that partitions pixel-groups
(actually, search-window cells, which contain a pixel-group).
The total number of dimensions of a pixel-group is the number of
pixels in the group, times the dimension of a pixel.  For the
YUV420P implementation, 4x2 pixel-groups are used for intensity pixels,
which consist of 1 dimension, for a total of 8, and 2x2 pixel groups
are used for color pixels, which consist of 2 dimensions, again for a
total of 8.  Partitioning n dimensions requires 2<sup>n</sup>
branches per tree node; in this example, that's 256.  (So if the
pixel-sorter tree is well-balanced, then descending to a child branch
cuts out all but <sup>1</sup>/256 of the remaining pixel-groups, which
is supposed to make searching for matching pixel-groups very efficient.)
Search-window cells are inserted into the pixel-sorter tree, and
descend into child branches based on their value.  But if any of the
pixel-group's pixel dimension values are within the error tolerance of
the split-point for that dimension in the current pixel-sorter branch
node, then that pixel-group won't fit neatly into any of the child
nodes, and thus the search-window cell has to stay at that level of
the tree.  (Alternately, a copy of it could be placed in multiple
children, but this design opts not to do that.)  Each pixel-sorter
branch node maintains a doubly-linked list of search-window cells that
are attached to it.  As an optimization, once a search-window cell's
position in the pixel-sorter has been determined, that result is reused
for the rest of the frame, as the cell is added to and removed from the
pixel-sorter depending on whether it is within the search radius of the
current new-frame pixel-group.
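<p>A sketch of the descent rule may make this clearer (the names are
illustrative; the 8 dimensions are those of the 4x2 intensity or 2x2
color pixel-groups described above):
<pre>
const int DIMENSIONS = 8;              // pixels per group * dims per pixel
const int BRANCHES = 1 << DIMENSIONS;  // 2^8 = 256 children per tree node

// Decide where a pixel-group goes at one branch node.  Returns the child
// index (one bit per dimension), or -1 if the group must stay attached to
// this node because some dimension is within the error tolerance of that
// dimension's split point and so would not fit neatly into any one child.
int ChooseChild(const int groupValues[DIMENSIONS],
    const int splitPoints[DIMENSIONS], int tolerance)
{
    int child = 0;
    for (int dimension = 0; dimension < DIMENSIONS; ++dimension)
    {
        int difference = groupValues[dimension] - splitPoints[dimension];
        if (difference >= -tolerance && difference <= tolerance)
            return -1;                // straddles the split: stay here
        if (difference > 0)
            child |= 1 << dimension;  // above the split point
    }
    return child;
}
</pre>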
<p>To look for matches between the current pixel-group from the new
frame, and all pixel-groups from the previous frame within the search
radius, one just matches the current pixel-group to every
search-window cell attached to the pixel-sorter branch nodes, and
descends the tree according to the new pixel-group's values.  (One
optimization is possible here: If the search-window cell was forced
to stop at that level of the pixel-sorter because one of its pixel
values was within the tolerance of the split value of that
pixel-sorter branch node, and none of the current pixel-group's
pixel values are within twice the tolerance of the split value
of that pixel-sorter branch node, then we can save time and avoid the
comparison, for no search-window cell that had to stop at that level
could possibly intersect the new pixel-group.  This especially helps
in the presence of low error thresholds.)
<p>As matches are found, the search-border builds contiguous regions of
matches that all have the same motion-vector.  (The "border" is the
border between the searched area and the not-yet-searched area.)
It's designed to move through the regions of matches in a zigzag
pattern, and constantly maintain a list of all regions that would
be contiguous with the current new-frame pixel-group.  When a match
is found, all such regions with the same motion-vector are now
contiguous, once the current pixel-group's area is added.
<p>The search-border is implemented by sets of startpoints/endpoints.
Every horizontal extent (that could potentially intersect a
new-frame pixel-group) of every region under construction along the
border is represented in the set of startpoints/endpoints.  The
search-border also has two arrays of set iterators, one for
startpoints, one for endpoints.  As the search zig-zags across the
new frame, these two arrays of iterators keep track of all regions
that will be contiguous with the current new-frame pixel-group, and
all regions that are no longer contiguous with the current new-frame
pixel-group; by doing this, it's very efficient to maintain the set
of border regions that would be contiguous with the current new-frame
pixel-group.
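<p>A heavily simplified sketch of that bookkeeping (the real
<tt>SearchBorder</tt> also tracks each region's motion vector and size,
among other things):
<pre>
#include <set>
#include <vector>

// One horizontal extent of a region under construction along the border.
struct BorderExtent
{
    int y, xStart, xEnd;   // row, first column, last column
    int regionIndex;       // which border region this extent belongs to
};

// Startpoints are kept sorted by starting column, endpoints by ending
// column, so the extents that begin or end near the current pixel-group
// are easy to reach.
struct ByStartpoint
{
    bool operator()(const BorderExtent &a, const BorderExtent &b) const
        { return a.xStart != b.xStart ? a.xStart < b.xStart : a.y < b.y; }
};
struct ByEndpoint
{
    bool operator()(const BorderExtent &a, const BorderExtent &b) const
        { return a.xEnd != b.xEnd ? a.xEnd < b.xEnd : a.y < b.y; }
};

// As the zigzag search advances one pixel-group at a time, the saved
// iterators are nudged forward or backward instead of being re-searched,
// so "which regions touch the current pixel-group" stays a
// small-constant-time question.
struct SearchBorderSketch
{
    std::set<BorderExtent,ByStartpoint> startpoints;
    std::set<BorderExtent,ByEndpoint> endpoints;
    std::vector<std::set<BorderExtent,ByStartpoint>::iterator> startpointIterators;
    std::vector<std::set<BorderExtent,ByEndpoint>::iterator> endpointIterators;
};
</pre>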
<p>The general idea is to analyze the entire frame like this, then run
through the found regions from largest to smallest, and apply them to
the new frame.  This can be a lot of data, too much in fact.  To
limit the search to a reasonable complexity, two throttles exist:
one on the number of matches in the area of the current pixel-group,
and one on the size of the largest match in the area of the
current pixel-group.  If there are too many regions in the area,
or if the biggest region in the area is too large, then the best
region found so far is chosen, all other regions in the area are
thrown away, and that best region is applied to the new frame right
then.  This will eliminate pixel-groups from consideration in the
search-window and pixel-sorter, which will save time in the search.
This will also resolve new-frame pixels; only pixel-groups that
contain nothing but unresolved pixels can be searched for in the
pixel-sorter, which also saves time in the remainder of the search.
Only after the entire frame is analyzed are regions applied from
largest to smallest.
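<p>In outline, the two throttles act something like this (an illustrative
sketch; <tt>ApplyBestRegionNow()</tt> is a hypothetical stand-in for the
flood-fill-and-apply step described above):
<pre>
void ApplyBestRegionNow();   // flood-fill the best region, apply it,
                             // and discard the other regions in the area

// Called after searching the pixel-sorter at one pixel-group position.
// "matchCount" is the number of border regions in the area of the
// current pixel-group; "largestMatchSize" is the size, in pixels, of the
// largest of them.
void CheckThrottles(int matchCount, int largestMatchSize,
    int matchCountThrottle, int matchSizeThrottle)
{
    if (matchCount > matchCountThrottle
        || largestMatchSize > matchSizeThrottle)
    {
        // The pixels resolved by the applied region drop out of the
        // rest of the search.
        ApplyBestRegionNow();
    }
}
</pre>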
<p>Before a match is applied to the new frame, it's flood-filled in
order to resolve its entire extent.  Searching is done in terms of
pixel-groups, so it won't be able to find any detail that's smaller
than a pixel-group.  Also, the region may not have been completed, if
it was chosen because a throttle value was exceeded, so its full
extent is not known.  Plus, parts of that match may have already
been resolved.  The flood-fill takes care of all of these situations.
<p>Any pixels not found by the above searches are declared to be new
information, and new reference-pixels are allocated for all such
new-frame pixels.
<p>The whole point of this design is to use as many high-level-design
features as possible to reduce the amount of work necessary to perform
the job.  It attempts to accomplish this with a heavy reliance on
data structures over more mathematical algorithms, a drive to
locate sub-linear/sub-quadratic algorithms for common tasks (e.g.
the pixel-sorter tree, which reduced quadratic to logarithmic, and
the iterator arrays in the search-border, which reduced logarithmic
to small-constant), and a use of data-structure design to understand
the problem in ways that directly lead to efficient implementations.

<a name="ImplementationMotionSearcherVersion1"><h3>Version 1</h3></a>
<p>The above discussion describes the intent of the first version of
the denoiser (which I'm calling version 0).  However, the very last
high-level change I made to it, right before releasing it under the GPL,
was defective!  In effect, the match-count throttle was always 1, and
the best match was applied every pixel-group!  It was intended
to be a performance increase, and it was, but obviously it broke the
code...except that this bugged change also vastly increased the
quality!  There is every reason in the world to believe that such a
bug should have broken the denoiser's quality, except that it didn't.
What a stroke of luck!
<p>I decided to take the implications of this accidental discovery to
the next logical level.  Before, I wouldn't have believed that the
best match could be found without amassing a large list of
possibilities from several pixel-group searches.  Now, each match
found is flood-filled, and the first one to exceed the match-size
throttle is applied to the image right then and there, and all other
possibilities aren't considered.  So there is no big list of regions
to try at the end.
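<p>In other words, one plausible reading of the version-1 flow for a
single pixel-group is the following sketch (the names are made up):
<pre>
struct Match { bool valid; /* ... */ };
Match FindFirstMatch(int x, int y);
Match FindNextMatch(int x, int y);
int FloodFillMatch(const Match &match);          // returns size in pixels
void ApplyRegionToNewFrame(const Match &match);

// Version 1: no list of candidate regions is kept.  Each match found in
// the pixel-sorter is flood-filled at once, and the first one whose
// flood-filled size exceeds the match-size throttle is applied to the
// new frame immediately; the remaining candidates are never examined.
void SearchOnePixelGroup(int x, int y, int matchSizeThrottle)
{
    for (Match match = FindFirstMatch(x, y); match.valid;
        match = FindNextMatch(x, y))
    {
        int size = FloodFillMatch(match);
        if (size > matchSizeThrottle)
        {
            ApplyRegionToNewFrame(match);
            return;              // done with this pixel-group
        }
    }
}
</pre>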
<p>This version performed better than the bugged version, which is
surprising enough, but the quality was vastly improved too.
<p>Parallel processing was also added at this time.  Color can be
denoised in a thread separate from intensity denoising, and
reading/writing frames can be moved into separate threads too.
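<p>A minimal illustration of that division of labor (a sketch using
<tt>std::thread</tt>; the real implementation may use a different
threading API, and the function names here are hypothetical):
<pre>
#include <thread>

void DenoiseIntensityFrame();   // denoise the Y plane of the current frame
void DenoiseColorFrame();       // denoise the CbCr planes of the current frame

void DenoiseFrameInParallel()
{
    // Intensity and color are independent searches, so they can run in
    // parallel; reading the next frame and writing the oldest denoised
    // frame can likewise be moved onto their own threads.
    std::thread colorThread(DenoiseColorFrame);
    DenoiseIntensityFrame();
    colorThread.join();
}
</pre>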

<!--
<a name="ImplementationXXX"><h2>XXX</h2></a>
<p>
-->

<a name="FutureExtensions"><h1>Future Extensions</h1></a>
<p>The motion-detector is highly parallel, and a better multi-threaded
version should be written.  So far, color and intensity can be
analyzed separately.  But a thread could conceivably denoise one half of the
remaining area of an image plane.  The pixel-sorter would have to
become its own class for this to work, since there'd now be more than
one per search-window, and the search-window would have to keep
pixel-sorters from colliding with each other (i.e. the areas being
searched by each can't overlap).  Also, depending on how efficient
the thread-synchronization methods are, the pixel-sorter could feed
results to the flood-filling stage.  Perhaps both approaches can be
taken.  Anything that allows the denoiser to fill up processors is
probably worth trying.
<p>The motion-detector should probably be able to find matches in
more than one previous frame.  I'd probably want to avoid later-frame
matches that occur in earlier frames, for efficiency's sake.
<p>The search-border should allow the insertion of new regions.  It
would need a method, implemented somewhat like <tt>AddNewMatch()</tt>,
to generate a startpoint/endpoint for every horizontal extent in the
region that intersected the border area.  This could be used to make
a <tt>ChooseBestActiveRegion()</tt> that eliminated flood-filled areas
from the other regions on the border and then put those regions back
into the border.  I don't know if this would speed things up, though.
<p>I think the search-border can be used to implement removal of
transient phenomena, e.g. film lint/scratches, LaserDisc rot,
analog-broadcast static, etc.  After the new frame has had motion
detection run on it, look at the 2nd-to-oldest frame, and build
contiguous regions of pixels that have the same reference count.
(Film lint/scratches will be limited to 1 reference count; I think
LaserDisc rot can extend over several frames.)  If a region has
the right shape and the expected contents for a known sort of
artifact, it's declared to be an instance of that artifact, and a
blend involving the next/previous frame is used to generate the values
of pixels affected by the artifact.  We'll probably want to take the
result of motion-detection into account too.
<p>I think the search-window could replace the radius-search in
yuvmedianfilter.  It could even do motion-detection for mpeg2enc.

<p>Copyright (C) 2004 by
<a href="mailto:ulatec@users.sourceforge.net">Steven Boswell</a>.
All rights reserved, and all that stuff.
<br>Released to the public under the GNU General Public License.
See the file COPYING for more information.
</body>
</html>