[FFmpeg-devel] avcodec/aaccoder: Limit sf_idx difference for all cases

Submitted by Claudio Freire on Sept. 22, 2016, 9:51 a.m.

Details

Message ID CAGTBQpY2dJq_xGZuQ3wk0fZiEdb-Fj=pZL_v_23_GvV+Cf4MWA@mail.gmail.com
State New

Commit Message

Claudio Freire Sept. 22, 2016, 9:51 a.m.
On Sat, Sep 10, 2016 at 3:37 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Aug 25, 2016 at 8:57 AM, Rostislav Pehlivanov
> <atomnuker@gmail.com> wrote:
>>> 64ed96a710787ba5d0666746a8562e7d.dee
>>>
>>> Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
>>> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
>>> ---
>>>  libavcodec/aaccoder.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavcodec/aaccoder.c b/libavcodec/aaccoder.c
>>> index 284b401..995724b 100644
>>> --- a/libavcodec/aaccoder.c
>>> +++ b/libavcodec/aaccoder.c
>>> @@ -196,7 +196,7 @@ typedef struct TrellisPath {
>>>  static void set_special_band_scalefactors(AACEncContext *s, SingleChannelElement *sce)
>>>  {
>>>      int w, g;
>>> -    int prevscaler_n = -255, prevscaler_i = 0;
>>> +    int prevscaler_n = -255, prevscaler_i = 0, prevscaler_d = -255;
>>>      int bands = 0;
>>>
>>>      for (w = 0; w < sce->ics.num_windows; w += sce->ics.group_len[w]) {
>>> @@ -211,6 +211,10 @@ static void set_special_band_scalefactors(AACEncContext *s, SingleChannelElement
>>>                  if (prevscaler_n == -255)
>>>                      prevscaler_n = sce->sf_idx[w*16+g];
>>>                  bands++;
>>> +            } else {
>>> +                if (prevscaler_d == -255)
>>> +                    prevscaler_d = sce->sf_idx[w*16+g];
>>> +                bands++;
>>>              }
>>>          }
>>>      }
>>> @@ -227,6 +231,8 @@ static void set_special_band_scalefactors(AACEncContext *s, SingleChannelElement
>>>                  sce->sf_idx[w*16+g] = prevscaler_i = av_clip(sce->sf_idx[w*16+g], prevscaler_i - SCALE_MAX_DIFF, prevscaler_i + SCALE_MAX_DIFF);
>>>              } else if (sce->band_type[w*16+g] == NOISE_BT) {
>>>                  sce->sf_idx[w*16+g] = prevscaler_n = av_clip(sce->sf_idx[w*16+g], prevscaler_n - SCALE_MAX_DIFF, prevscaler_n + SCALE_MAX_DIFF);
>>> +            } else {
>>> +                sce->sf_idx[w*16+g] = prevscaler_d = av_clip(sce->sf_idx[w*16+g], prevscaler_d - SCALE_MAX_DIFF, prevscaler_d + SCALE_MAX_DIFF);
>>>              }
>>>          }
>>>      }
>>> --
>>> 2.9.3
>>>
>>> _______________________________________________
>>> ffmpeg-devel mailing list
>>> ffmpeg-devel@ffmpeg.org
>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>>
>>
>>
>> That fuzzed sample seems to be causing the algorithm which does SF
>> difference normalization between normal and PNS bands to fail. This commit
>> masks the problem downstream. IMO that's not the correct way to solve this,
>> as there's no guarantee that another sample won't trigger the same assert
>> even when limiting all scalefactors. Fixing a single fuzzed sample with a
>> hack which doesn't stop other fuzzed samples from triggering the same bug
>> isn't justified.
>> I have the time right now and I'll try to fix this properly, but it might
>> take me a day or two. I think the problem is that when the twoloop coder
>> does the normalization it doesn't take into account the fact that IS
>> and PNS have their scalefactors modified by set_special_band_scalefactors()
>> later on before encoding.
>
> It seems the root of the issue is that the two stages of PNS don't
> agree on when they can apply PNS or not.
>
> I have a WIP that eliminates the issue by just making the two agree,
> but I've got unrelated changes so I'll try to distill the patch to the
> minimum necessary to fix this during the weekend.

Sorry for the delay, it turned out to be more complex than that.

There were a few potential violations that I had already identified in
a WIP patch but they did not apply to the fuzzed sample. That sample
triggered an interaction with TNS and trellis band type coding that
resulted in zeroed bands reappearing and thus invalidating all delta
scalefactor validations.

The attached patch series fixes most of the delta scalefactor
violation risks I could find, including that one.
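
To illustrate the kind of check that gets invalidated, here is a minimal
sketch. It is not the encoder's actual code: the helper name and the flat
per-band arrays are assumptions made up for this example, and only
SCALE_MAX_DIFF (60) is taken from the encoder. Zeroed bands carry no
scalefactor, so they are skipped when walking the delta chain. If such a
band later becomes coded again while still holding a stale sf_idx, a chain
that was valid before can exceed the limit.

```c
#include <stdlib.h>

#define SCALE_MAX_DIFF 60 /* maximum allowed scalefactor delta between coded bands */

/* Hypothetical helper, for illustration only.
 * Returns the index of the first band whose scalefactor differs from the
 * previous coded (non-zeroed) band by more than SCALE_MAX_DIFF, or -1 if
 * the whole chain is valid. */
static int first_sfdelta_violation(const int *sf_idx, const int *zeroes, int nbands)
{
    int prev = -1; /* index of the last coded band seen, -1 if none yet */

    for (int g = 0; g < nbands; g++) {
        if (zeroes[g])
            continue; /* zeroed bands transmit no scalefactor */
        if (prev >= 0 && abs(sf_idx[g] - sf_idx[prev]) > SCALE_MAX_DIFF)
            return g;
        prev = g;
    }
    return -1;
}
```

With sf_idx = {100, 110, 200, 120} and band 2 zeroed, the chain
100 -> 110 -> 120 passes. Clearing the zero flag on band 2 makes the
chain 100 -> 110 -> 200, and the helper reports band 2.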

It hasn't been thoroughly tested for quality regressions/improvements.
It's possible that it does change quality since it changes key
decision points that lead to the violations but also to lots of
audible artifacts. So I believe it should improve quality, but one
never knows without proper ABX testing, which I'll be conducting, at
least in a limited way, in the following days.

In the meantime, I'm attaching the patch series for review.

Comments

Michael Niedermayer Nov. 16, 2016, 11:09 p.m.
On Thu, Sep 22, 2016 at 06:51:03AM -0300, Claudio Freire wrote:
> [...]

ping

[...]
Claudio Freire Nov. 16, 2016, 11:18 p.m.
On Wed, Nov 16, 2016 at 8:09 PM, Michael Niedermayer
<michael@niedermayer.cc> wrote:
>> Sorry for the delay, it turned out to be more complex than that.
>>
>> There were a few potential violations that I had already identified in
>> a WIP patch but they did not apply to the fuzzed sample. That sample
>> triggered an interaction with TNS and trellis band type coding that
>> resulted in zeroed bands reappearing and thus invalidating all delta
>> scalefactor validations.
>>
>> The attached patch series fixes most of the delta scalefactor
>> violation risks I could find, including that one.
>>
>> It hasn't been thoroughly tested for quality regressions/improvements.
>> It's possible that it does change quality since it changes key
>> decision points that lead to the violations but also to lots of
>> audible artifacts. So I believe it should improve quality, but one
>> never knows without proper ABX testing, which I'll be conducting, at
>> least in a limited way, in the following days.
>>
>> In the meantime, I'm attaching the patch series for review.
>
> ping

Is that ping for me?

It's a significant patch so I'd rather have a little review before
pushing, waiting for that.

Well, not really waiting. I did some tests too and found some not so
desirable side effects that I hadn't fully fixed when RL decided to
rob me of most of my free time, so I haven't pushed it yet to prevent
even more serious regressions, but still I'd rather not push without
some form of review.

So, ccing Rostislav in case he just missed the original patch (he's
the most likely to review anyway)

Patch

From 5a3fb7e7fcd156108cc05fb8221722544631c1b7 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfreire@gmail.com>
Date: Thu, 22 Sep 2016 06:37:54 -0300
Subject: [PATCH 4/4] AAC Encoder: prevent TNS from inducing sfdelta asserts

TNS filtering can change the required minimum coding book for bands
it applies to, so recompute band_type when TNS is being applied.

Similarly, zero flags must match for all windows in a window group
for code post-tns to work correctly, so make sure twoloop fills in
the zero flags appropriately.
---
 libavcodec/aaccoder.c         |  4 ++--
 libavcodec/aaccoder_twoloop.h | 14 +++++++++++++-
 libavcodec/aacenc_tns.c       | 21 ++++++++++++++++++++-
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/libavcodec/aaccoder.c b/libavcodec/aaccoder.c
index 1445ab4..a6edfa4 100644
--- a/libavcodec/aaccoder.c
+++ b/libavcodec/aaccoder.c
@@ -659,8 +659,8 @@  static void search_for_pns(AACEncContext *s, AVCodecContext *avctx, SingleChanne
                 dist1 += quantize_band_cost(s, &sce->coeffs[start_c],
                                             NOR34,
                                             sce->ics.swb_sizes[g],
-                                            sce->sf_idx[(w+w2)*16+g],
-                                            sce->band_alt[(w+w2)*16+g],
+                                            sce->sf_idx[w*16+g],
+                                            sce->band_alt[w*16+g],
                                             lambda/band->threshold, INFINITY, NULL, NULL, 0);
                 /* Estimate rd on average as 5 bits for SF, 4 for the CB, plus spread energy * lambda/thr */
                 dist2 += band->energy/(band->spread*band->spread)*lambda*dist_thresh/band->threshold;
diff --git a/libavcodec/aaccoder_twoloop.h b/libavcodec/aaccoder_twoloop.h
index 2098318..1bc1298 100644
--- a/libavcodec/aaccoder_twoloop.h
+++ b/libavcodec/aaccoder_twoloop.h
@@ -714,7 +714,7 @@  static void search_for_quantizers_twoloop(AVCodecContext *avctx,
                         prev = prevsf;
                     sce->sf_idx[w*16+g] = av_clip(sce->sf_idx[w*16+g], prev - SCALE_MAX_DIFF, prev + SCALE_MAX_DIFF);
                     sce->band_type[w*16+g] = find_min_book(maxvals[w*16+g], sce->sf_idx[w*16+g]);
-                    if (sce->band_type[w*16+g])
+                    if (sce->band_type[w*16+g] == 0)
                         sce->band_type[w*16+g] = 1;
                     prev = sce->sf_idx[w*16+g];
                     if (!fflag && prevsf != sce->sf_idx[w*16+g])
@@ -760,6 +760,18 @@  static void search_for_quantizers_twoloop(AVCodecContext *avctx,
             }
         }
     }
+
+    /** Broadcast zero flags on all windows in the group to keep things consistent */
+    for (w = 0; w < sce->ics.num_windows; w += sce->ics.group_len[w]) {
+        if (sce->ics.group_len[w] > 1) {
+            /** Make sure proper codebooks are set */
+            for (g = 0; g < sce->ics.num_swb; g++) {
+                int z = sce->zeroes[w*16+g];
+                for (w2 = 0; w2 < sce->ics.group_len[w]; w2++)
+                    sce->zeroes[(w+w2)*16+g] = z;
+            }
+        }
+    }
 }
 
 #endif /* AVCODEC_AACCODER_TWOLOOP_H */
diff --git a/libavcodec/aacenc_tns.c b/libavcodec/aacenc_tns.c
index 2ffe1f8..2ba1940 100644
--- a/libavcodec/aacenc_tns.c
+++ b/libavcodec/aacenc_tns.c
@@ -103,9 +103,10 @@  void ff_aac_apply_tns(AACEncContext *s, SingleChannelElement *sce)
 {
     TemporalNoiseShaping *tns = &sce->tns;
     IndividualChannelStream *ics = &sce->ics;
-    int w, filt, m, i, top, order, bottom, start, end, size, inc;
+    int w, filt, m, i, top, order, bottom, start, end, size, inc, g;
     const int mmm = FFMIN(ics->tns_max_bands, ics->max_sfb);
     float lpc[TNS_MAX_ORDER];
+    int has_filt = 0;
 
     for (w = 0; w < ics->num_windows; w++) {
         bottom = ics->num_swb;
@@ -137,6 +138,24 @@  void ff_aac_apply_tns(AACEncContext *s, SingleChannelElement *sce)
                     sce->coeffs[start] += lpc[i-1]*sce->pcoeffs[start - i*inc];
                 }
             }
+
+            has_filt = 1;
+        }
+    }
+
+    if (has_filt) {
+        abs_pow34_v(s->scoefs, sce->coeffs, 1024);
+        for (w = 0; w < sce->ics.num_windows; w += sce->ics.group_len[w]) {
+            start = w*128;
+            for (g = 0; g < sce->ics.num_swb; g++) {
+                if (!sce->zeroes[w*16+g]) {
+                    float maxval = find_max_val(sce->ics.group_len[w], sce->ics.swb_sizes[g], s->scoefs+start);
+                    int cb = find_min_book(maxval, sce->sf_idx[w*16+g]);
+                    if (sce->band_type[w*16+g] < cb)
+                        sce->band_type[w*16+g] = cb;
+                }
+                start += sce->ics.swb_sizes[g];
+            }
         }
     }
 }
-- 
1.8.4.5
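
For readers who want to try the zero-flag broadcast from the
aaccoder_twoloop.h hunk in isolation, here is a minimal standalone sketch.
ToyChannel is a made-up stand-in holding only the fields this step touches,
not the real SingleChannelElement layout, and broadcast_zero_flags is a
hypothetical name. The sketch keeps the w*16+g indexing used by the patch
and assumes group_len is at least 1 at the first window of every group.

```c
#define MAX_WINDOWS      8
#define BANDS_PER_WINDOW 16 /* stride behind the w*16+g indexing */

/* Toy stand-in for illustration; not the encoder's real structure. */
typedef struct ToyChannel {
    int num_windows;
    int num_swb;
    int group_len[MAX_WINDOWS]; /* meaningful at the first window of each group */
    int zeroes[MAX_WINDOWS * BANDS_PER_WINDOW];
} ToyChannel;

/* Copy the zero flag of each group's first window to every other window
 * in that group, so all windows in a group agree per band. */
static void broadcast_zero_flags(ToyChannel *ch)
{
    for (int w = 0; w < ch->num_windows; w += ch->group_len[w]) {
        for (int g = 0; g < ch->num_swb; g++) {
            int z = ch->zeroes[w * BANDS_PER_WINDOW + g];
            for (int w2 = 1; w2 < ch->group_len[w]; w2++)
                ch->zeroes[(w + w2) * BANDS_PER_WINDOW + g] = z;
        }
    }
}
```

The inner loop starts at w2 = 1 because the first window already holds the
value being copied. The patch starts at 0, which gives the same result.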