Message ID | CAHaG5eH2KsiHLyOpzWetrO4kLDh7J3_tbhnSu3dgjxufnbwQbw@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel] avfilter/mpegaudiodec_template: Fix wrong decision of illegal | expand |
Context | Check | Description |
---|---|---|
andriy/ffmpeg-patchwork | fail | Failed to apply patch |
On Wed, Jan 08, 2020 at 16:32:39 +0900, Ted Lee wrote: > Dear FFmpeg developers, > I'm glad to have a chance to contribute to FFmpeg. Welcome. > *Summary:* > > - Test signal: 790_pomnyun_taebaek2.mp3 > - MP3 file with the same left channel and right channel except for > the first 7 seconds and the last 10 seconds. > (download: > <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0> > https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0 > ) > - When decoding the test signal using FFmpeg as below, a difference > occurs in the same part of L and R. > - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le > 790_pomnyun_taebaek2.wav > > > Expected result (Adobe Audition or Core Audio): > > Decoded PCM > [image: image.png] > (L-R) > [image: image.png] > (You can reproduce it using Adobe Audition or CoreAudio of MacOS) > > Actual result (FFmpeg): > > Decoded PCM > [image: image.png] > > (L-R) > [image: image.png] To put it in other words: You expect the monoaural section of this file to be decoded with identical left and right channels. ffmpeg does not achieve this. In fact, there seem to be ~50 dB too much "noise". [See mean/max_volume results below, before and after subtraction.] Original ffmpeg: barsnick@sunshine:/usr/new/tools/video/ffmpeg/ffmpeg-snapshot-2020-01-05 > ../ffmpeg-build-2020-01-05/ffmpeg_g -ss 10 -t 10:00 -i ~/tmp/790_pomnyun_taebaek2.mp3 -af volumedetect,"pan=mono|c0=c0-c1",volumedetect -f null - [...] [Parsed_volumedetect_0 @ 0xa5bb380] n_samples: 26460000 [Parsed_volumedetect_0 @ 0xa5bb380] mean_volume: -25.1 dB [Parsed_volumedetect_0 @ 0xa5bb380] max_volume: 0.0 dB [Parsed_volumedetect_0 @ 0xa5bb380] histogram_0db: 594 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_1db: 671 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_2db: 1033 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_3db: 1794 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_4db: 3067 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_5db: 4940 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_6db: 7871 [Parsed_volumedetect_0 @ 0xa5bb380] histogram_7db: 12062 [Parsed_volumedetect_2 @ 0xa5c3b40] n_samples: 13230000 [Parsed_volumedetect_2 @ 0xa5c3b40] mean_volume: -41.2 dB [Parsed_volumedetect_2 @ 0xa5c3b40] max_volume: -15.5 dB [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_15db: 29 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_16db: 154 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_17db: 159 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_18db: 308 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_19db: 627 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_20db: 1130 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_21db: 1932 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_22db: 3433 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_23db: 5099 [Parsed_volumedetect_2 @ 0xa5c3b40] histogram_24db: 8002 I can at least confirm that other decoders achieve a better equality. Using mpg321 with libmad-0.15.1b (to convert to wav and analyze with ffmpeg), I get: [Parsed_volumedetect_0 @ 0xbe50d00] n_samples: 26460000 [Parsed_volumedetect_0 @ 0xbe50d00] mean_volume: -26.9 dB [Parsed_volumedetect_0 @ 0xbe50d00] max_volume: -0.1 dB [Parsed_volumedetect_0 @ 0xbe50d00] histogram_0db: 186 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_1db: 310 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_2db: 556 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_3db: 866 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_4db: 1354 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_5db: 2158 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_6db: 3680 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_7db: 6260 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_8db: 8990 [Parsed_volumedetect_0 @ 0xbe50d00] histogram_9db: 13236 [Parsed_volumedetect_2 @ 0xbe4c180] n_samples: 13230000 [Parsed_volumedetect_2 @ 0xbe4c180] mean_volume: -85.5 dB [Parsed_volumedetect_2 @ 0xbe4c180] max_volume: -62.7 dB [Parsed_volumedetect_2 @ 0xbe4c180] histogram_62db: 1 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_63db: 9 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_64db: 13 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_65db: 26 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_66db: 37 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_67db: 35 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_68db: 68 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_69db: 54 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_70db: 73 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_71db: 86 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_72db: 118 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_73db: 340 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_74db: 9490 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_75db: 0 [Parsed_volumedetect_2 @ 0xbe4c180] histogram_76db: 105618 With your patch, ffmpeg achieves this (very similar for both mp3 and mp3float), which is even closer to identity: [Parsed_volumedetect_0 @ 0xa6e3380] n_samples: 26460000 [Parsed_volumedetect_0 @ 0xa6e3380] mean_volume: -26.9 dB [Parsed_volumedetect_0 @ 0xa6e3380] max_volume: -0.1 dB [Parsed_volumedetect_0 @ 0xa6e3380] histogram_0db: 186 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_1db: 310 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_2db: 556 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_3db: 866 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_4db: 1354 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_5db: 2158 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_6db: 3680 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_7db: 6258 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_8db: 8990 [Parsed_volumedetect_0 @ 0xa6e3380] histogram_9db: 13236 [Parsed_volumedetect_2 @ 0xa6ebb40] n_samples: 13230000 [Parsed_volumedetect_2 @ 0xa6ebb40] mean_volume: -91.0 dB [Parsed_volumedetect_2 @ 0xa6ebb40] max_volume: -59.4 dB [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_59db: 5 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_60db: 3 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_61db: 1 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_62db: 19 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_63db: 29 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_64db: 34 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_65db: 65 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_66db: 91 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_67db: 53 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_68db: 136 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_69db: 74 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_70db: 107 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_71db: 120 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_72db: 139 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_73db: 148 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_74db: 230 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_75db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_76db: 283 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_77db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_78db: 336 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_79db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_80db: 491 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_81db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_82db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_83db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_84db: 988 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_85db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_86db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_87db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_88db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_89db: 0 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_90db: 2788 [Parsed_volumedetect_2 @ 0xa6ebb40] histogram_91db: 13223860 I can't judge whether what you are doing is correct, but I can confirm the observations. > >From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001 > From: Ted <t@gaudiolab.com> > Date: Wed, 8 Jan 2020 11:53:56 +0900 > Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of > illegal > intensity stereo position for MPEG-2 LSF The patch is corrupted by newlines, inserted by your mailer. I managed to fix that manually, but to ease review, please attach a patch created by "git format-patch" to your e-mail (as an attachment), or use "git send-email". > Signed-off-by: Ted <t@gaudiolab.com> You may want to provide a full real name, as this is what is registered forever in the repositories. > --- > libavcodec/mpegaudiodec_template.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) Since you are changing the decoder, you must also make sure that fate testsuite does not detect changed results. Or if it does and such changes are expected, the fate tests need to be adapted. (If you can't do this, others may assist you, at least with checking. It's a bit more of a hassle.) It might also be worth adding such an MPEG-2 LSF sample to fate, if those in fate don't cover this yet. (The MP§ decoder source file does have a big TODO "test lsf / mpeg25 extensively".) Cheers, Moritz
On Wed, Jan 08, 2020 at 04:32:39PM +0900, Ted Lee wrote: > Dear FFmpeg developers, > > I'm glad to have a chance to contribute to FFmpeg. > > Since this is the first time for me, please give feedback if I missed > something. I will reflect on that. > > *Summary:* > > - Test signal: 790_pomnyun_taebaek2.mp3 > - MP3 file with the same left channel and right channel except for > the first 7 seconds and the last 10 seconds. > (download: > <https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0> > https://www.dropbox.com/s/sivz18sixvhoo66/790_pomnyun_taebaek2.mp3?dl=0 > ) > - When decoding the test signal using FFmpeg as below, a difference > occurs in the same part of L and R. > - $ffmpeg -i 790_pomnyun_taebaek2.mp3 -acodec pcm_s16le > 790_pomnyun_taebaek2.wav > > > Expected result (Adobe Audition or Core Audio): > > Decoded PCM > [image: image.png] > (L-R) > [image: image.png] > (You can reproduce it using Adobe Audition or CoreAudio of MacOS) > > Actual result (FFmpeg): > > Decoded PCM > [image: image.png] > > (L-R) > [image: image.png] > > *Why it happens?:* > > - In MPEG-1 Layer 3 (MP3), the illegal position of intensity stereo is > defined as the maximum value of scale factor. Since intensity stereo > positions (scale factor of the right channel) are always allocated to 3 > bits, intensity stereo positions are limited from 0 to 6 and 7 is treated > as the illegal position. > - In MPEG-2 LSF, the illegal position is also defined as the maximum > value of scale factor, but the number of bits of the scale factor is not > fixed. So the number of bits needs to be calculated (slen[]) and the > illegal stereo position should be changed as the maximum value. However, in > the current implementation, the illegal position is defined as fixed value > 16. > > *How to fix this?:* > > - The maximum value of scale factor (the illegal position) should be > store to sf_max_table[] and an illegal position is determined when the > scale factor is sf_max_table[](sf_max). > > > From 754545f6c675aede07abd912992f9ed4a17d4ddc Mon Sep 17 00:00:00 2001 > From: Ted <t@gaudiolab.com> > Date: Wed, 8 Jan 2020 11:53:56 +0900 > Subject: [PATCH] avfilter/mpegaudiodec_template: Fix wrong decision of > illegal > intensity stereo position for MPEG-2 LSF > > Signed-off-by: Ted <t@gaudiolab.com> > --- > libavcodec/mpegaudiodec_template.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/libavcodec/mpegaudiodec_template.c > b/libavcodec/mpegaudiodec_template.c > index 3f1674e827..d6d7cd5ad0 100644 > --- a/libavcodec/mpegaudiodec_template.c > +++ b/libavcodec/mpegaudiodec_template.c > @@ -115,6 +115,7 @@ static uint16_t band_index_long[9][23]; > static INTFLOAT is_table[2][16]; > static INTFLOAT is_table_lsf[2][2][16]; > static INTFLOAT csa_table[8][4]; > +static uint8_t sf_max_table[40]; [...] > v1 = is_tab[0][sf]; > @@ -1523,11 +1527,15 @@ static int mp_decode_layer3(MPADecodeContext *s) > n = lsf_nsf_table[tindex2][tindex][k]; > sl = slen[k]; > if (sl) { > - for (i = 0; i < n; i++) > + for (i = 0; i < n; i++) { > + sf_max_table[j] = (uint8_t) exp2(sl) - 1; non constant statics do not work because there can be more than one mp3 decoder instance also the exp2() very likely should be a simple shift Thanks [...]
diff --git a/libavcodec/mpegaudiodec_template.c b/libavcodec/mpegaudiodec_template.c index 3f1674e827..d6d7cd5ad0 100644 --- a/libavcodec/mpegaudiodec_template.c +++ b/libavcodec/mpegaudiodec_template.c @@ -115,6 +115,7 @@ static uint16_t band_index_long[9][23]; static INTFLOAT is_table[2][16]; static INTFLOAT is_table_lsf[2][2][16]; static INTFLOAT csa_table[8][4]; +static uint8_t sf_max_table[40]; static int16_t division_tab3[1<<6 ]; static int16_t division_tab5[1<<8 ]; @@ -1050,7 +1051,6 @@ static void compute_stereo(MPADecodeContext *s, GranuleDef *g0, GranuleDef *g1) sf_max = 7; } else { is_tab = is_table_lsf[g1->scalefac_compress & 1]; - sf_max = 16; }