diff mbox series

[FFmpeg-devel] avfilter/mpegaudiodec_template: Fix wrong decision of illegal

Message ID CAHaG5eH2KsiHLyOpzWetrO4kLDh7J3_tbhnSu3dgjxufnbwQbw@mail.gmail.com
State New
Headers show
Series [FFmpeg-devel] avfilter/mpegaudiodec_template: Fix wrong decision of illegal
Related show

Checks

Context Check Description
andriy/ffmpeg-patchwork pending
andriy/ffmpeg-patchwork fail Failed to apply patch

Commit Message

Ted Lee Jan. 8, 2020, 7:32 a.m. UTC
Dear FFmpeg developers,

I'm glad to have a chance to contribute to FFmpeg.

Since this is the first time for me, please give feedback if I missed
something. I will reflect on that.

*Summary:*

   - Test signal: 790_pomnyun_taebaek2.mp3
      - MP3 file with the same left channel and right channel except for
      the first 7 seconds and the last 10 seconds.
      (download:
      <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0>
      https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0
      )
   - When decoding the test signal using FFmpeg as below, a difference
   occurs in the same part of L and R.
      - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le
      790_pomnyun_taebaek2.wav


Expected result (Adobe Audition or Core Audio):

Decoded PCM
[image: image.png]
(L-R)
[image: image.png]
(You can reproduce it using Adobe Audition or CoreAudio of MacOS)

Actual result (FFmpeg):

Decoded PCM
[image: image.png]

(L-R)
[image: image.png]

*Why it happens?:*

   - In MPEG-1 Layer 3 (MP3), the illegal position of intensity stereo is
   defined as the maximum value of scale factor. Since intensity stereo
   positions (scale factor of the right channel) are always allocated to 3
   bits, intensity stereo positions are limited from 0 to 6 and 7 is treated
   as the illegal position.
   - In MPEG-2 LSF, the illegal position is also defined as the maximum
   value of scale factor, but the number of bits of the scale factor is not
   fixed. So the number of bits needs to be calculated (slen[]) and the
   illegal stereo position should be changed as the maximum value. However, in
   the current implementation, the illegal position is defined as fixed value
   16.

*How to fix this?:*

   - The maximum value of scale factor (the illegal position) should be
   store to sf_max_table[] and an illegal position is determined when the
   scale factor is sf_max_table[](sf_max).


From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001
From: Ted <t@gaudiolab.com>
Date: Wed, 8 Jan 2020 11:53:56 +0900
Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of
illegal
 intensity stereo position for MPEG-2 LSF

Signed-off-by: Ted <t@gaudiolab.com>
---
 libavcodec/mpegaudiodec_template.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

         tab0 = g0->sb_hybrid + 576;
@@ -1077,6 +1077,8 @@ static void compute_stereo(MPADecodeContext *s,
GranuleDef *g0, GranuleDef *g1)
                         }
                     }
                     sf = g1->scale_factors[k + l];
+                    if (s->lsf)
+                        sf_max = sf_max_table[k + 1];
                     if (sf >= sf_max)
                         goto found1;

@@ -1122,6 +1124,8 @@ found1:
                 /* for last band, use previous scale factor */
                 k  = (i == 21) ? 20 : i;
                 sf = g1->scale_factors[k];
+                if (s->lsf)
+                    sf_max = sf_max_table[k + 1];
                 if (sf >= sf_max)
                     goto found2;
                 v1 = is_tab[0][sf];
@@ -1523,11 +1527,15 @@ static int mp_decode_layer3(MPADecodeContext *s)
                     n  = lsf_nsf_table[tindex2][tindex][k];
                     sl = slen[k];
                     if (sl) {
-                        for (i = 0; i < n; i++)
+                        for (i = 0; i < n; i++) {
+                            sf_max_table[j] = (uint8_t) exp2(sl) - 1;
                             g->scale_factors[j++] = get_bits(&s->gb, sl);
+                        }
                     } else {
-                        for (i = 0; i < n; i++)
+                        for (i = 0; i < n; i++) {
+                            sf_max_table[j] = 0;
                             g->scale_factors[j++] = 0;
+                        }
                     }
                 }
                 /* XXX: should compute exact size */

Comments

Moritz Barsnick Jan. 8, 2020, 3:10 p.m. UTC | #1
On Wed, Jan 08, 2020 at 16:32:39 +0900, Ted Lee wrote:
> Dear FFmpeg developers,
> I'm glad to have a chance to contribute to FFmpeg.

Welcome.

> *Summary:*
>
>    - Test signal: 790_pomnyun_taebaek2.mp3
>       - MP3 file with the same left channel and right channel except for
>       the first 7 seconds and the last 10 seconds.
>       (download:
>       <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0>
>       https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0
>       )
>    - When decoding the test signal using FFmpeg as below, a difference
>    occurs in the same part of L and R.
>       - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le
>       790_pomnyun_taebaek2.wav
>
>
> Expected result (Adobe Audition or Core Audio):
>
> Decoded PCM
> [image: image.png]
> (L-R)
> [image: image.png]
> (You can reproduce it using Adobe Audition or CoreAudio of MacOS)
>
> Actual result (FFmpeg):
>
> Decoded PCM
> [image: image.png]
>
> (L-R)
> [image: image.png]

To put it in other words:
You expect the monoaural section of this file to be decoded with
identical left and right channels. ffmpeg does not achieve this. In
fact, there seem to be ~50 dB too much "noise".

[See mean/max_volume results below, before and after subtraction.]

Original ffmpeg:
barsnick@sunshine:/usr/new/tools/video/ffmpeg/ffmpeg-snapshot-2020-01-05 > ../ffmpeg-build-2020-01-05/ffmpeg_g -ss 10 -t 10:00 -i ~/tmp/790_pomnyun_taebaek2.mp3 -af volumedetect,"pan=mono|c0=c0-c1",volumedetect -f null -
[...]
[Parsed_volumedetect_0 @ 0xa5bb380] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xa5bb380] mean_volume: -25.1 dB
[Parsed_volumedetect_0 @ 0xa5bb380] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_0db: 594
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_1db: 671
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_2db: 1033
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_3db: 1794
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_4db: 3067
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_5db: 4940
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_6db: 7871
[Parsed_volumedetect_0 @ 0xa5bb380] histogram_7db: 12062
[Parsed_volumedetect_2 @ 0xa5c3b40] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xa5c3b40] mean_volume: -41.2 dB
[Parsed_volumedetect_2 @ 0xa5c3b40] max_volume: -15.5 dB
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_15db: 29
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_16db: 154
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_17db: 159
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_18db: 308
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_19db: 627
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_20db: 1130
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_21db: 1932
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_22db: 3433
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_23db: 5099
[Parsed_volumedetect_2 @ 0xa5c3b40] histogram_24db: 8002

I can at least confirm that other decoders achieve a better equality.
Using mpg321 with libmad-0.15.1b (to convert to wav and analyze with
ffmpeg), I get:

[Parsed_volumedetect_0 @ 0xbe50d00] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xbe50d00] mean_volume: -26.9 dB
[Parsed_volumedetect_0 @ 0xbe50d00] max_volume: -0.1 dB
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_0db: 186
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_1db: 310
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_2db: 556
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_3db: 866
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_4db: 1354
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_5db: 2158
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_6db: 3680
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_7db: 6260
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_8db: 8990
[Parsed_volumedetect_0 @ 0xbe50d00] histogram_9db: 13236
[Parsed_volumedetect_2 @ 0xbe4c180] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xbe4c180] mean_volume: -85.5 dB
[Parsed_volumedetect_2 @ 0xbe4c180] max_volume: -62.7 dB
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_62db: 1
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_63db: 9
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_64db: 13
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_65db: 26
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_66db: 37
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_67db: 35
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_68db: 68
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_69db: 54
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_70db: 73
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_71db: 86
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_72db: 118
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_73db: 340
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_74db: 9490
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_75db: 0
[Parsed_volumedetect_2 @ 0xbe4c180] histogram_76db: 105618

With your patch, ffmpeg achieves this (very similar for both mp3 and
mp3float), which is even closer to identity:

[Parsed_volumedetect_0 @ 0xa6e3380] n_samples: 26460000
[Parsed_volumedetect_0 @ 0xa6e3380] mean_volume: -26.9 dB
[Parsed_volumedetect_0 @ 0xa6e3380] max_volume: -0.1 dB
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_0db: 186
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_1db: 310
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_2db: 556
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_3db: 866
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_4db: 1354
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_5db: 2158
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_6db: 3680
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_7db: 6258
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_8db: 8990
[Parsed_volumedetect_0 @ 0xa6e3380] histogram_9db: 13236
[Parsed_volumedetect_2 @ 0xa6ebb40] n_samples: 13230000
[Parsed_volumedetect_2 @ 0xa6ebb40] mean_volume: -91.0 dB
[Parsed_volumedetect_2 @ 0xa6ebb40] max_volume: -59.4 dB
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_59db: 5
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_60db: 3
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_61db: 1
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_62db: 19
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_63db: 29
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_64db: 34
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_65db: 65
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_66db: 91
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_67db: 53
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_68db: 136
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_69db: 74
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_70db: 107
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_71db: 120
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_72db: 139
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_73db: 148
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_74db: 230
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_75db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_76db: 283
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_77db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_78db: 336
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_79db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_80db: 491
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_81db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_82db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_83db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_84db: 988
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_85db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_86db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_87db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_88db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_89db: 0
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_90db: 2788
[Parsed_volumedetect_2 @ 0xa6ebb40] histogram_91db: 13223860

I can't judge whether what you are doing is correct, but I can confirm
the observations.

> >From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001
> From: Ted <t@gaudiolab.com>
> Date: Wed, 8 Jan 2020 11:53:56 +0900
> Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of
> illegal
>  intensity stereo position for MPEG-2 LSF

The patch is corrupted by newlines, inserted by your mailer. I managed
to fix that manually, but to ease review, please attach a patch created
by "git format-patch" to your e-mail (as an attachment), or use "git
send-email".

> Signed-off-by: Ted <t@gaudiolab.com>

You may want to provide a full real name, as this is what is registered
forever in the repositories.

> ---
>  libavcodec/mpegaudiodec_template.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

Since you are changing the decoder, you must also make sure that fate
testsuite does not detect changed results. Or if it does and such
changes are expected, the fate tests need to be adapted. (If you can't
do this, others may assist you, at least with checking. It's a bit more
of a hassle.)

It might also be worth adding such an MPEG-2 LSF sample to fate, if
those in fate don't cover this yet. (The MP§ decoder source file does
have a big TODO "test lsf / mpeg25 extensively".)

Cheers,
Moritz
Michael Niedermayer Jan. 8, 2020, 6:44 p.m. UTC | #2
On Wed, Jan 08, 2020 at 04:32:39PM +0900, Ted Lee wrote:
> Dear FFmpeg developers,
> 
> I'm glad to have a chance to contribute to FFmpeg.
> 
> Since this is the first time for me, please give feedback if I missed
> something. I will reflect on that.
> 
> *Summary:*
> 
>    - Test signal: 790_pomnyun_taebaek2.mp3
>       - MP3 file with the same left channel and right channel except for
>       the first 7 seconds and the last 10 seconds.
>       (download:
>       <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0>
>       https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0
>       )
>    - When decoding the test signal using FFmpeg as below, a difference
>    occurs in the same part of L and R.
>       - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le
>       790_pomnyun_taebaek2.wav
> 
> 
> Expected result (Adobe Audition or Core Audio):
> 
> Decoded PCM
> [image: image.png]
> (L-R)
> [image: image.png]
> (You can reproduce it using Adobe Audition or CoreAudio of MacOS)
> 
> Actual result (FFmpeg):
> 
> Decoded PCM
> [image: image.png]
> 
> (L-R)
> [image: image.png]
> 
> *Why it happens?:*
> 
>    - In MPEG-1 Layer 3 (MP3), the illegal position of intensity stereo is
>    defined as the maximum value of scale factor. Since intensity stereo
>    positions (scale factor of the right channel) are always allocated to 3
>    bits, intensity stereo positions are limited from 0 to 6 and 7 is treated
>    as the illegal position.
>    - In MPEG-2 LSF, the illegal position is also defined as the maximum
>    value of scale factor, but the number of bits of the scale factor is not
>    fixed. So the number of bits needs to be calculated (slen[]) and the
>    illegal stereo position should be changed as the maximum value. However, in
>    the current implementation, the illegal position is defined as fixed value
>    16.
> 
> *How to fix this?:*
> 
>    - The maximum value of scale factor (the illegal position) should be
>    store to sf_max_table[] and an illegal position is determined when the
>    scale factor is sf_max_table[](sf_max).
> 
> 
> From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001
> From: Ted <t@gaudiolab.com>
> Date: Wed, 8 Jan 2020 11:53:56 +0900
> Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of
> illegal
>  intensity stereo position for MPEG-2 LSF
> 
> Signed-off-by: Ted <t@gaudiolab.com>
> ---
>  libavcodec/mpegaudiodec_template.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/libavcodec/mpegaudiodec_template.c
> b/libavcodec/mpegaudiodec_template.c
> index 3f1674e827..d6d7cd5ad0 100644
> --- a/libavcodec/mpegaudiodec_template.c
> +++ b/libavcodec/mpegaudiodec_template.c
> @@ -115,6 +115,7 @@ static uint16_t band_index_long[9][23];
>  static INTFLOAT is_table[2][16];
>  static INTFLOAT is_table_lsf[2][2][16];
>  static INTFLOAT csa_table[8][4];
> +static uint8_t sf_max_table[40];

[...]


>                  v1 = is_tab[0][sf];
> @@ -1523,11 +1527,15 @@ static int mp_decode_layer3(MPADecodeContext *s)
>                      n  = lsf_nsf_table[tindex2][tindex][k];
>                      sl = slen[k];
>                      if (sl) {
> -                        for (i = 0; i < n; i++)
> +                        for (i = 0; i < n; i++) {

> +                            sf_max_table[j] = (uint8_t) exp2(sl) - 1;

non constant statics do not work because there can be more than one mp3
decoder instance
also the exp2() very likely should be a simple shift

Thanks

[...]
diff mbox series

Patch

diff --git a/libavcodec/mpegaudiodec_template.c
b/libavcodec/mpegaudiodec_template.c
index 3f1674e827..d6d7cd5ad0 100644
--- a/libavcodec/mpegaudiodec_template.c
+++ b/libavcodec/mpegaudiodec_template.c
@@ -115,6 +115,7 @@  static uint16_t band_index_long[9][23];
 static INTFLOAT is_table[2][16];
 static INTFLOAT is_table_lsf[2][2][16];
 static INTFLOAT csa_table[8][4];
+static uint8_t sf_max_table[40];

 static int16_t division_tab3[1<<6 ];
 static int16_t division_tab5[1<<8 ];
@@ -1050,7 +1051,6 @@  static void compute_stereo(MPADecodeContext *s,
GranuleDef *g0, GranuleDef *g1)
             sf_max = 7;
         } else {
             is_tab = is_table_lsf[g1->scalefac_compress & 1];
-            sf_max = 16;
         }