quantization table
Highlevel bitstream structure:
=============================
 --------------------------------------------
 --------------------------------------------
|    ------------------------------------    |
|   |  .........         intra?          |   |
|   | : Block01 :    yes         no      |   |
|   | : Block02 :  .......   ..........  |   |
|   | : Block03 : :  y DC : : ref index: |   |
|   | : Block04 : : cb DC : : motion x : |   |
|   |  .........  : cr DC : : motion y : |   |
|   |              .......   ..........  |   |
|    ------------------------------------    |
|    ------------------------------------    |
 --------------------------------------------
| ------------   ------------   ------------ |
|| Y subbands | | Cb subbands| | Cr subbands||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LL0||HL0| | | |LL0||HL0| | | |LL0||HL0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LH0||HH0| | | |LH0||HH0| | | |LH0||HH0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HL1||LH1| | | |HL1||LH1| | | |HL1||LH1| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HH1||HL2| | | |HH1||HL2| | | |HH1||HL2| ||
||    ...     | |    ...     | |    ...     ||
| ------------   ------------   ------------ |
 --------------------------------------------
                  |            |    LL0 subband prediction
 -------------------             \             |
|  Reference frames |             \           IDWT
| -------   ------- |    Motion    \           |
||Frame 0| |Frame 1|| Compensation  .   OBMC   v      -------
| -------   ------- | --------------. \------> + --->|Frame n|-->output
| -------   ------- |                                 -------
||Frame 2| |Frame 3||<----------------------------------/
The implemented range coder is an adapted version based upon "Range encoding:
an algorithm for removing redundancy from a digitised message." by G. N. N.
The symbols encoded by the Snow range coder are bits (0|1). The
associated probabilities are not fixed but change depending on the symbol mix
---------+-----------------------------------------------
    0    | 256 - state_transition_table[256 - old_state];
    1    |       state_transition_table[      old_state];
state_transition_table = {
  0,   0,   0,   0,   0,   0,   0,   0,  20,  21,  22,  23,  24,  25,  26,  27,
 28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  37,  38,  39,  40,  41,  42,
 43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  56,  57,
 58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,
 74,  75,  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,
 89,  90,  91,  92,  93,  94,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 133,
134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
150, 151, 152, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 171, 172, 173, 174, 175, 176, 177, 178, 179,
180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 190, 191, 192, 194, 194,
195, 196, 197, 198, 199, 200, 201, 202, 202, 204, 205, 206, 207, 208, 209, 209,
210, 211, 212, 213, 215, 215, 216, 217, 218, 219, 220, 220, 222, 223, 224, 225,
226, 227, 227, 229, 229, 230, 231, 232, 234, 234, 235, 236, 237, 238, 239, 240,
241, 242, 243, 244, 245, 246, 247, 248, 248,   0,   0,   0,   0,   0,   0,   0};
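
The state update rule and table above can be sketched in C as follows. This is
only an illustration, not the reference implementation: the function name
next_state is hypothetical, and it assumes old_state lies in 1..255 so that
256 - old_state is a valid table index.

```c
#include <stdint.h>

/* Table copied from the text above. */
static const uint8_t state_transition_table[256] = {
  0,   0,   0,   0,   0,   0,   0,   0,  20,  21,  22,  23,  24,  25,  26,  27,
 28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  37,  38,  39,  40,  41,  42,
 43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  56,  57,
 58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,
 74,  75,  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,
 89,  90,  91,  92,  93,  94,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 133,
134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
150, 151, 152, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 171, 172, 173, 174, 175, 176, 177, 178, 179,
180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 190, 191, 192, 194, 194,
195, 196, 197, 198, 199, 200, 201, 202, 202, 204, 205, 206, 207, 208, 209, 209,
210, 211, 212, 213, 215, 215, 216, 217, 218, 219, 220, 220, 222, 223, 224, 225,
226, 227, 227, 229, 229, 230, 231, 232, 234, 234, 235, 236, 237, 238, 239, 240,
241, 242, 243, 244, 245, 246, 247, 248, 248,   0,   0,   0,   0,   0,   0,   0};

/* Return the new context state after a bit (0 or 1) has been coded.
 * Assumption: old_state is in 1..255. */
static int next_state(int old_state, int bit)
{
    if (bit)
        return state_transition_table[old_state];
    return 256 - state_transition_table[256 - old_state];
}
```

With this sketch, a context in state 128 moves to 134 after a 1 and to 122
after a 0, so the two directions mirror each other around 128.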
Range Coding of integers:
-------------------------
Neighboring Blocks:
===================
left and top are set to the respective blocks unless they are outside of
the image, in which case they are set to the Null block

top-left is set to the top left block unless it is outside of the image, in
which case it is set to the left block

if this block has no larger parent block or it is at the left side of its
parent block and the top right block is not outside of the image then the
top right block is used for top-right else the top-left block is used
the luma and chroma values of the left block are used as predictors

the luma and chroma values used are the sums of the predictors and y_diff,
cb_diff, cr_diff
to reverse this in the decoder apply the following:
block[y][x].dc[0] = block[y][x-1].dc[0] +  y_diff;
block[y][x].dc[1] = block[y][x-1].dc[1] + cb_diff;
block[y][x].dc[2] = block[y][x-1].dc[2] + cr_diff;
block[*][-1].dc[*]= 128;
Motion Compensation:
====================

Halfpel interpolation:
----------------------
halfpel interpolation is done by convolution with the halfpel filter stored

horizontal halfpel samples are found by
H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
            + hcoeff[1]*(F[y][x-1] + F[y][x+2])
            + hcoeff[2]*(F[y][x-2] + F[y][x+3])
h1[y][x] = (H1[y][x] + 32)>>6;

vertical halfpel samples are found by
H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
            + hcoeff[1]*(F[y-1][x] + F[y+2][x])
h2[y][x] = (H2[y][x] + 32)>>6;

vertical+horizontal halfpel samples are found by
H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
            + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
or
H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
            + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
h3[y][x] = (H3[y][x] + 2048)>>12;

 F-------F-------F-> H1<-F-------F-------F
 F-------F-------F-> H1<-F-------F-------F

unavailable fullpel samples (outside the picture for example) shall be equal
to the closest available fullpel sample
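
As an illustration, the horizontal halfpel formula together with the edge rule
above could be written as below. This is a sketch under assumptions: the names
clamp_index and halfpel_h are hypothetical, the number of taps is passed in by
the caller, and >> on a negative sum is assumed to be an arithmetic shift.

```c
/* Map an out-of-picture index to the closest available sample index. */
static int clamp_index(int i, int n)
{
    if (i < 0)
        return 0;
    if (i >= n)
        return n - 1;
    return i;
}

/* Horizontal halfpel sample h1 between row[x] and row[x+1].
 * hcoeff[k] weights the symmetric pair (row[x-k], row[x+1+k]).
 * No clipping to the sample range is done here. */
static int halfpel_h(const unsigned char *row, int width, int x,
                     const int *hcoeff, int ntaps)
{
    int sum = 0;
    for (int k = 0; k < ntaps; k++)
        sum += hcoeff[k] * (row[clamp_index(x - k, width)] +
                            row[clamp_index(x + 1 + k, width)]);
    return (sum + 32) >> 6;   /* round to nearest, then divide by 64 */
}
```

For example, with the illustrative coefficients {40, -10, 2} (chosen here only
because they sum to 32, so that the >>6 normalizes the result), a flat row of
100s yields 100 at every halfpel position.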

Smaller pel interpolation:
--------------------------
if diag_mc is set then points which lie on a line between 2 vertically,
horizontally or diagonally adjacent halfpel points shall be interpolated
linearly with rounding to nearest and halfway values rounded up.
points which lie on 2 diagonals at the same time should only use the one
diagonal not containing the fullpel point

    F-->O---q---O<--h1->O---q---O<--F
   h2-->O---q---O<--h3->O---q---O<--h2
    F-->O---q---O<--h1->O---q---O<--F

the remaining points shall be bilinearly interpolated from the
up to 4 surrounding halfpel and fullpel points, again rounding should be to
nearest and halfway values rounded up

compliant Snow decoders MUST support at least 1-1/8 pel luma and 1/2-1/16 pel
chroma interpolation
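
The rounding rule "to nearest and halfway values rounded up" can be shown for
the simplest case, the midpoint q between two neighboring points. The helper
name midpoint_round_up is hypothetical and the sketch assumes non-negative
sample values; other positions use different weights and are not covered here.

```c
/* Midpoint of two non-negative samples, rounded to nearest with
 * halfway values rounded up: (a + b) / 2 + 0.5 truncated. */
static int midpoint_round_up(int a, int b)
{
    return (a + b + 1) >> 1;
}
```

So the midpoint of 1 and 2 (exactly 1.5) becomes 2, while the midpoint of 2
and 2 stays 2.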

Overlapped block motion compensation:
-------------------------------------

LL band prediction:
===================
Each sample in the LL0 subband is predicted by the median of the left, top and
left+top-topleft samples, samples outside the subband shall be considered to
be 0. To reverse this prediction in the decoder apply the following.
for(y=0; y<height; y++){
    for(x=0; x<width; x++){
        sample[y][x] += median(sample[y-1][x],
                               sample[y][x-1],
                               sample[y-1][x]+sample[y][x-1]-sample[y-1][x-1]);
    }
}
sample[-1][*]=sample[*][-1]= 0;
width,height here are the width and height of the LL0 subband not of the final
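
A self-contained version of the loop above might look as follows. It is a
sketch, not the decoder's code: the names median3 and ll_unpredict are
hypothetical, the subband is assumed to be stored row by row in a flat int
array, and out-of-subband neighbors are replaced by 0 explicitly instead of
reading sample[-1].

```c
/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                         /* b = min(max(a0,b0), c) */
    return a > b ? a : b;
}

/* Reverse the LL0 median prediction in place.
 * s holds width*height residuals in raster order on entry
 * and the reconstructed samples on return. */
static void ll_unpredict(int *s, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int left    = x > 0          ? s[y * width + x - 1]       : 0;
            int top     = y > 0          ? s[(y - 1) * width + x]     : 0;
            int topleft = x > 0 && y > 0 ? s[(y - 1) * width + x - 1] : 0;
            s[y * width + x] += median3(left, top, left + top - topleft);
        }
    }
}
```

Because the loop runs in raster order, the left, top and top-left neighbors
are already reconstructed when a sample is processed.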

Wavelet Transform:
==================

Snow supports 2 wavelet transforms, the symmetric biorthogonal 5/3 integer
transform and an integer approximation of the symmetric biorthogonal 9/7

2D IDWT (inverse discrete wavelet transform)
--------------------------------------------
The 2D IDWT applies a 2D filter recursively, each time combining the
4 lowest frequency subbands into a single subband until only 1 subband
The 2D filter is done by first applying a 1D filter in the vertical direction
and then applying it in the horizontal one.
 ---------------    ---------------    ---------------    ---------------
|LL0|HL0|       |  |   |   |       |  |       |       |  |       |       |
|---+---|  HL1  |  | L0|H0 |  HL1  |  |  LL1  |  HL1  |  |       |       |
|LH0|HH0|       |  |   |   |       |  |       |       |  |       |       |
|-------+-------|->|-------+-------|->|-------+-------|->|   L1  |  H1   |->...
|       |       |  |       |       |  |       |       |  |       |       |
|  LH1  |  HH1  |  |  LH1  |  HH1  |  |  LH1  |  HH1  |  |       |       |
|       |       |  |       |       |  |       |       |  |       |       |
 ---------------    ---------------    ---------------    ---------------

1. interleave the samples of the low and high frequency subbands like
   s={L0, H0, L1, H1, L2, H2, L3, H3, ... }
   note, this can end with a L or a H, the number of elements shall be w
   s[-1] shall be considered equivalent to s[1  ]
   s[w ] shall be considered equivalent to s[w-2]

2. perform the lifting steps in order as described below

5/3 Integer filter:
1. s[i] -= (s[i-1] + s[i+1] + 2)>>2; for all even i < w
2. s[i] += (s[i-1] + s[i+1]    )>>1; for all odd  i < w

\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
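
The two lifting steps with subtract-then-add structure, combined with the
mirroring rule for s[-1] and s[w], can be sketched as below. The names mirror
and idwt53_1d are hypothetical; the sketch assumes w >= 2 and that >> on
negative values is an arithmetic shift.

```c
/* Apply the boundary rule: s[-1] == s[1] and s[w] == s[w-2]. */
static int mirror(int i, int w)
{
    if (i < 0)
        return -i;
    if (i >= w)
        return 2 * (w - 1) - i;
    return i;
}

/* In-place 1D inverse lifting on interleaved samples
 * s = {L0, H0, L1, H1, ...} of length w. */
static void idwt53_1d(int *s, int w)
{
    /* step 1: update the even (low-pass) positions */
    for (int i = 0; i < w; i += 2)
        s[i] -= (s[mirror(i - 1, w)] + s[mirror(i + 1, w)] + 2) >> 2;
    /* step 2: update the odd (high-pass) positions */
    for (int i = 1; i < w; i += 2)
        s[i] += (s[mirror(i - 1, w)] + s[mirror(i + 1, w)]) >> 1;
}
```

Each step must finish for all its positions before the next one starts, since
step 2 reads the even samples already modified by step 1.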

Snow's 9/7 Integer filter:
1. s[i] -= (3*(s[i-1] + s[i+1])         + 4)>>3; for all even i < w
2. s[i] -=     s[i-1] + s[i+1]                 ; for all odd  i < w
3. s[i] += (   s[i-1] + s[i+1] + 4*s[i] + 8)>>4; for all even i < w
4. s[i] += (3*(s[i-1] + s[i+1])            )>>1; for all odd  i < w

\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
 (|  + (|  + (|  + (|  +   -1
\ + /|\ + /|\ + /|\ + /|\  +1/4
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   +1/16
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
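
The four lifting steps of the 9/7 integer filter can be written out in the
same way. Again this is only a sketch: the names mirror97 and idwt97_1d are
hypothetical, w >= 2 is assumed, and >> on negative values is assumed to be an
arithmetic shift.

```c
/* Boundary rule: s[-1] == s[1] and s[w] == s[w-2]. */
static int mirror97(int i, int w)
{
    if (i < 0)
        return -i;
    if (i >= w)
        return 2 * (w - 1) - i;
    return i;
}

/* In-place 1D inverse lifting for the 9/7 integer filter on
 * interleaved samples s = {L0, H0, L1, H1, ...} of length w. */
static void idwt97_1d(int *s, int w)
{
    int i;
    for (i = 0; i < w; i += 2)   /* step 1, even positions */
        s[i] -= (3 * (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)]) + 4) >> 3;
    for (i = 1; i < w; i += 2)   /* step 2, odd positions */
        s[i] -= s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)];
    for (i = 0; i < w; i += 2)   /* step 3, even positions */
        s[i] += (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)]
                 + 4 * s[i] + 8) >> 4;
    for (i = 1; i < w; i += 2)   /* step 4, odd positions */
        s[i] += (3 * (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)])) >> 1;
}
```

Note that step 3 reads s[i] on its right-hand side, so the old value of the
even sample is used before it is updated.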

following are exactly identical
(3a)>>1 == a + (a>>1)
(a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
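
A small brute-force check can make these two identities easy to verify on a
given platform. The function name identities_hold is hypothetical; for
negative inputs the result depends on >> being an arithmetic shift, which the
C standard leaves implementation-defined.

```c
/* Return 1 if both identities hold for all a, b in [lo, hi], else 0. */
static int identities_hold(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        /* (3a)>>1 == a + (a>>1) */
        if (((3 * a) >> 1) != a + (a >> 1))
            return 0;
        for (int b = lo; b <= hi; b++) {
            /* (a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2 */
            if (((a + 4 * b + 8) >> 4) != (((a >> 2) + b + 2) >> 2))
                return 0;
        }
    }
    return 1;
}
```

The right-hand forms need fewer bits for their intermediate values, which is
what makes them useful for narrow integer implementations.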

16bit implementation note:
The IDWT can be implemented with 16bits, but this requires some care to
prevent overflows, the following list gives the minimum number of bits needed
A= s[i-1] + s[i+1]                              16bit
   s[i-1] + s[i+1]                              17bit
3*(s[i-1] + s[i+1])                             17bit

finetune initial contexts
spatial_decomposition_count per frame?
try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
try the MV length as context for coding the residual coefficients
use extradata for stuff which is in the keyframes now?
the MV median predictor is patented IIRC
implement per picture halfpel interpolation
try different range coder state transition tables for different contexts
compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)