diff mbox

[FFmpeg-devel,v3,3/3] aadec: fix seeking in mp3 content

Message ID 20180621165826.1714-4-ottoka@posteo.de
State New
Headers show

Commit Message

Karsten Otto June 21, 2018, 4:58 p.m. UTC
MP3 frames may not be aligned to aa chunk boundaries. After seeking,
scan for the next valid frame header. Then truncate the packet, and
also adjust timestamp information accordingly.
---
 libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

Comments

Michael Niedermayer July 2, 2018, 8:59 a.m. UTC | #1
On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> scan for the next valid frame header. Then truncate the packet, and
> also adjust timestamp information accordingly.
> ---
>  libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)

Please see AVSTREAM_PARSE_TIMESTAMPS

This codec specific code in demuxers should not be needed

[...]
Karsten Otto July 2, 2018, 5:21 p.m. UTC | #2
> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>> scan for the next valid frame header. Then truncate the packet, and
>> also adjust timestamp information accordingly.
>> ---
>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>> 1 file changed, 28 insertions(+), 5 deletions(-)
> 
> Please see AVSTREAM_PARSE_TIMESTAMPS
> 
> This codec specific code in demuxers should not be needed
> 
I tried that before, and you are right that it takes care of timestamp adjustments.

However, after a seek the parsed packet still contains a partial frame before the
next full one. I had expected libavformat/mpegaudio_parser.c to detect this
situation and discard the fragment, but unfortunately it does not. Instead it passes
it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
painful while wearing headphones!

I assume this situation never arose with other (more sane) format demuxers…
So, should I rather try to patch mpegaudio_parser? Seems risky?

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> I have never wished to cater to the crowd; for what I know they do not
> approve, and what they approve I do not know. -- Epicurus
> 
>
Michael Niedermayer July 3, 2018, 12:32 a.m. UTC | #3
On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
> 
> > Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> > 
> > Signierter PGP-Teil
> > On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> >> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> >> scan for the next valid frame header. Then truncate the packet, and
> >> also adjust timestamp information accordingly.
> >> ---
> >> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
> >> 1 file changed, 28 insertions(+), 5 deletions(-)
> > 
> > Please see AVSTREAM_PARSE_TIMESTAMPS
> > 
> > This codec specific code in demuxers should not be needed
> > 
> I tried that before, and you are right that it takes care of timestamp adjustments.
> 
> However, after a seek the parsed packet still contains a partial frame before the
> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
> situation and discard the fragment, but unfortunately it does not. Instead it passes
> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
> painful while wearing headphones!

I think you mis-diagnose this at least somewhat
your code searches for a specific mp3 header, the parser and decoder would
accept a wider range of mp3 variants.
But both can choose points that are not mp3 frame starts. (if that is the
problem you are seeing, iam not completely sure it is)

Also is the more restricted header you search for always used or could
this be failing with some files ?

Either way, looking at the demuxer a bit deeper, theres a TOC list in the
main header which points to chunks. The one file i found has 12 such chunks
the first represents the whole file i suspect, the next largest the audio
data, another one the metadata.
I guess the remaining 2 large ones could be a cover image and an index.
I didnt really look at it, but theres a table in there with pairs of 32bit
values. the first in the file i have goes from 0 to 3 the second starts
multiple times from 0 and seems monotonly increasing and staying within
the filesize.
The sample i have does not store mp3 but it looks like this is a index
maybe offsets for packets in each of the 3 chapters.

Please look at the data, if it can be used. It would be much better than
scaning the file linearly and searching for some byte sequence to find
packet starts.


thanks

[...]
Karsten Otto July 3, 2018, 8:25 p.m. UTC | #4
TL;DR: I will drop patch 3/3, may rather spend some time investigating why
"ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
be used for frame or chapter detection, unfortunately.

More details inline below.
 
> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>> 
>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>> scan for the next valid frame header. Then truncate the packet, and
>>>> also adjust timestamp information accordingly.
>>>> ---
>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>> 
>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>> 
>>> This codec specific code in demuxers should not be needed
>>> 
>> I tried that before, and you are right that it takes care of timestamp adjustments.
>> 
>> However, after a seek the parsed packet still contains a partial frame before the
>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>> painful while wearing headphones!
> 
> I think you mis-diagnose this at least somewhat
> your code searches for a specific mp3 header, the parser and decoder would
> accept a wider range of mp3 variants.
> But both can choose points that are not mp3 frame starts. (if that is the
> problem you are seeing, iam not completely sure it is)
> 
It took a closer look at what happens when I hear a BLEEP: The packet begins
with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
By looking for the more restricted header, my patch finds the real next frame at
offset 78.

BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
version ID?

> Also is the more restricted header you search for always used or could
> this be failing with some files ?
> 
Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.

> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
> main header which points to chunks. The one file i found has 12 such chunks
> the first represents the whole file i suspect, the next largest the audio
> data, another one the metadata.
> I guess the remaining 2 large ones could be a cover image and an index.
Correct, seems like all aa files have the TOC, but its entries can be in a different
order in each file. I guess thats why the original aadec.c implementation just
looks for the largest chunk to play.

> I didnt really look at it, but theres a table in there with pairs of 32bit
> values. the first in the file i have goes from 0 to 3 the second starts
> multiple times from 0 and seems monotonly increasing and staying within
> the filesize.
> The sample i have does not store mp3 but it looks like this is a index
> maybe offsets for packets in each of the 3 chapters.
> 
> Please look at the data, if it can be used. It would be much better than
> scaning the file linearly and searching for some byte sequence to find
> packet starts.
> 
Short answer: Sorry, it is not possible to derive frame offsets nor chapter
offsets from the index.

Long answer:
All offsets in the index are the same, and matching the "codec_second_size"
= crypto chunk size, roughly one second of audio:
- 1045 for format 2 (SIPR 8kbps)
- 2000 for format 3 (SIPR 16kbps)
- 3982 for format 4 (MP3 32kbps)
This is different from the respective frame size, which is 19, 20, and 104/105
respectively. For MP3 the frame size and chunk size do not even align
(I assume this was a late retrofit to the format), causing the problems mentioned
above. So, not usable to derive codec frame offsets.

Furthermore, each index starts at offset 0, so they are relative to the start of
their chapter. But they do not indicate the chunk size, though the last chunk
of a chapter is usually shorter then the full size. So, not usable to derive
chapter offsets.

I assume this index could be used to shuffle the file contents as a further
DRM/obfuscation measure. But I have never seen this so far.

> 
> thanks
> 
Cheers, Karsten

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Old school: Use the lowest level language in which you can solve the problem
>            conveniently.
> New school: Use the highest level language in which the latest supercomputer
>            can solve the problem without the user falling asleep waiting.
> 
>
Carl Eugen Hoyos July 3, 2018, 8:29 p.m. UTC | #5
2018-07-03 22:25 GMT+02:00, Karsten Otto <ottoka@posteo.de>:

> It took a closer look at what happens when I hear a BLEEP: The packet begins
> with a partial frame, starting with the byte sequence "ff ee 47 9d“.
> Unfortunately,
> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
> By looking for the more restricted header, my patch finds the real next
> frame at
> offset 78.
>
> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
> version ID?

The parser is not supposed to drop data.

Carl Eugen
Karsten Otto July 3, 2018, 9:06 p.m. UTC | #6
> Am 03.07.2018 um 22:29 schrieb Carl Eugen Hoyos <ceffmpeg@gmail.com>:
> 
> 2018-07-03 22:25 GMT+02:00, Karsten Otto <ottoka@posteo.de>:
> 
>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>> with a partial frame, starting with the byte sequence "ff ee 47 9d“.
>> Unfortunately,
>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>> By looking for the more restricted header, my patch finds the real next
>> frame at
>> offset 78.
>> 
>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>> version ID?
> 
> The parser is not supposed to drop data.
> 
Does this mean the demuxer is responsible for finding the proper frame start? 
In that case I should keep my patch 3/3. But Michael Niedermayer indicated
this is bad because the demuxer should not be concerned with codec specific
things. I am confused :-/

> Carl Eugen
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Michael Niedermayer July 4, 2018, 1:26 a.m. UTC | #7
On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
> be used for frame or chapter detection, unfortunately.
> 
> More details inline below.
>  
> > Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> > 
> > Signierter PGP-Teil
> > On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
> >> 
> >>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>> 
> >>> Signierter PGP-Teil
> >>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> >>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> >>>> scan for the next valid frame header. Then truncate the packet, and
> >>>> also adjust timestamp information accordingly.
> >>>> ---
> >>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
> >>>> 1 file changed, 28 insertions(+), 5 deletions(-)
> >>> 
> >>> Please see AVSTREAM_PARSE_TIMESTAMPS
> >>> 
> >>> This codec specific code in demuxers should not be needed
> >>> 
> >> I tried that before, and you are right that it takes care of timestamp adjustments.
> >> 
> >> However, after a seek the parsed packet still contains a partial frame before the
> >> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
> >> situation and discard the fragment, but unfortunately it does not. Instead it passes
> >> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
> >> painful while wearing headphones!
> > 
> > I think you mis-diagnose this at least somewhat
> > your code searches for a specific mp3 header, the parser and decoder would
> > accept a wider range of mp3 variants.
> > But both can choose points that are not mp3 frame starts. (if that is the
> > problem you are seeing, iam not completely sure it is)
> > 
> It took a closer look at what happens when I hear a BLEEP: The packet begins
> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
> By looking for the more restricted header, my patch finds the real next frame at
> offset 78.
> 
> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
> version ID?
> 
> > Also is the more restricted header you search for always used or could
> > this be failing with some files ?
> > 
> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
> 
> > Either way, looking at the demuxer a bit deeper, theres a TOC list in the
> > main header which points to chunks. The one file i found has 12 such chunks
> > the first represents the whole file i suspect, the next largest the audio
> > data, another one the metadata.
> > I guess the remaining 2 large ones could be a cover image and an index.
> Correct, seems like all aa files have the TOC, but its entries can be in a different
> order in each file. I guess thats why the original aadec.c implementation just
> looks for the largest chunk to play.
> 
> > I didnt really look at it, but theres a table in there with pairs of 32bit
> > values. the first in the file i have goes from 0 to 3 the second starts
> > multiple times from 0 and seems monotonly increasing and staying within
> > the filesize.
> > The sample i have does not store mp3 but it looks like this is a index
> > maybe offsets for packets in each of the 3 chapters.
> > 
> > Please look at the data, if it can be used. It would be much better than
> > scaning the file linearly and searching for some byte sequence to find
> > packet starts.
> > 
> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
> offsets from the index.
> 

> Long answer:
> All offsets in the index are the same, and matching the "codec_second_size"
> = crypto chunk size, roughly one second of audio:
> - 1045 for format 2 (SIPR 8kbps)
> - 2000 for format 3 (SIPR 16kbps)
> - 3982 for format 4 (MP3 32kbps)
> This is different from the respective frame size, which is 19, 20, and 104/105

The first 2 are exact multiples of the frame size.

I dont have a sample for the mp3 case so i cannot check but are you sure
the actually writen numbers dont match up multiples of mp3 frames ?

If not the question that remains is how does the official code seek in this?
It seems a bit odd that they would get the design of the index wrong but then
implement demuxer side searching for a mp3 header not noticing that this is
not working fully reliable.
[This would imply that while they implemented the demuxer they would have
 been aware that their index doesnt work, why would they not fix it at this
 point ...]
Its hard to imagine they would not notice this bug with the index at all,
not that this is impossible or anything


[...]
Karsten Otto July 4, 2018, 7:32 a.m. UTC | #8
> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>> be used for frame or chapter detection, unfortunately.
>> 
>> More details inline below.
>> 
>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>> 
>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>> 
>>>>> Signierter PGP-Teil
>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>> also adjust timestamp information accordingly.
>>>>>> ---
>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>> 
>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>> 
>>>>> This codec specific code in demuxers should not be needed
>>>>> 
>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>> 
>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>> painful while wearing headphones!
>>> 
>>> I think you mis-diagnose this at least somewhat
>>> your code searches for a specific mp3 header, the parser and decoder would
>>> accept a wider range of mp3 variants.
>>> But both can choose points that are not mp3 frame starts. (if that is the
>>> problem you are seeing, iam not completely sure it is)
>>> 
>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>> By looking for the more restricted header, my patch finds the real next frame at
>> offset 78.
>> 
>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>> version ID?
>> 
>>> Also is the more restricted header you search for always used or could
>>> this be failing with some files ?
>>> 
>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>> 
>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>> main header which points to chunks. The one file i found has 12 such chunks
>>> the first represents the whole file i suspect, the next largest the audio
>>> data, another one the metadata.
>>> I guess the remaining 2 large ones could be a cover image and an index.
>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>> order in each file. I guess thats why the original aadec.c implementation just
>> looks for the largest chunk to play.
>> 
>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>> values. the first in the file i have goes from 0 to 3 the second starts
>>> multiple times from 0 and seems monotonly increasing and staying within
>>> the filesize.
>>> The sample i have does not store mp3 but it looks like this is a index
>>> maybe offsets for packets in each of the 3 chapters.
>>> 
>>> Please look at the data, if it can be used. It would be much better than
>>> scaning the file linearly and searching for some byte sequence to find
>>> packet starts.
>>> 
>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>> offsets from the index.
>> 
> 
>> Long answer:
>> All offsets in the index are the same, and matching the "codec_second_size"
>> = crypto chunk size, roughly one second of audio:
>> - 1045 for format 2 (SIPR 8kbps)
>> - 2000 for format 3 (SIPR 16kbps)
>> - 3982 for format 4 (MP3 32kbps)
>> This is different from the respective frame size, which is 19, 20, and 104/105
> 
> The first 2 are exact multiples of the frame size.
> 
> I dont have a sample for the mp3 case so i cannot check but are you sure
> the actually writen numbers dont match up multiples of mp3 frames ?
> 
Yes, I have tested both with some old aa books I still had on disk, and some I
bought more recently. Using -fdebug ts, I can clearly see a packet size of
104 alternating with 105, as I would expect from mp3. Furthermore, using my
specialized header detection on every chunk, I see that the actual mp3 frame
header indeed starts at offset 0 in chunk 0. But then it changes for every
subsequent chunk, I have seen a maximum of 103, as to be expected.

> If not the question that remains is how does the official code seek in this?
> It seems a bit odd that they would get the design of the index wrong but then
> implement demuxer side searching for a mp3 header not noticing that this is
> not working fully reliable.
> [This would imply that while they implemented the demuxer they would have
> been aware that their index doesnt work, why would they not fix it at this
> point ...]
> Its hard to imagine they would not notice this bug with the index at all,
> not that this is impossible or anything
> 

Agreed. Here is how I think this is actually supposed to work: Every encrypted
chunk contains roughly one second of audio. So, if you want to seek to a
specific second, you can just look up an index entry. This tells you the chapter
number and the offset in the chapter. Now, if you already know the chapter
offsets, you can jump right to the correct position in the audio section. (I assume
the chapter offsets are somewhere else in the aa format, possibly encrypted.)
Regarding the odd mp3 frame offsets, a more restricted header detection will
take care of that, as I do in patch 3/3.

Of course, given each chunk is the same size, you don't really need an index,
but you can calculate the offset instead. This is what I am doing in patch 2/3.
The index only makes sense if the offsets were not linear, but shuffled around.
Maybe they did that in the original aa format 1 to scramble the audio, and kept
the index for backward compatibility. This is pure speculation though, as I have
no information at all about format 1.

So, the whole thing boils down to knowing the chapter offsets. From the
current aadec.c implementation I know that this information exist inline in the
audio section, so that's what I am using in patch 2/3. As I said, the chapter
information might also exist somewhere else in the format, but finding it would
require more reverse engineering, which I am not comfortable with.

Yes, my current patch requires a couple dozen seek operations on the file,
which is not optimal. But I don't think this is too much of a performance hit, as
nowadays everybody has an OS with a sturdy disk block cache, or read directly
from RAM storage anyway. It works at least, so I can use my old aa books on
my Linux phone now :-)

Cheers, Karsten
Michael Niedermayer July 4, 2018, 9:54 p.m. UTC | #9
On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
> 
> > Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> > 
> > Signierter PGP-Teil
> > On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
> >> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
> >> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
> >> be used for frame or chapter detection, unfortunately.
> >> 
> >> More details inline below.
> >> 
> >>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>> 
> >>> Signierter PGP-Teil
> >>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
> >>>> 
> >>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>>>> 
> >>>>> Signierter PGP-Teil
> >>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> >>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> >>>>>> scan for the next valid frame header. Then truncate the packet, and
> >>>>>> also adjust timestamp information accordingly.
> >>>>>> ---
> >>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
> >>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
> >>>>> 
> >>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
> >>>>> 
> >>>>> This codec specific code in demuxers should not be needed
> >>>>> 
> >>>> I tried that before, and you are right that it takes care of timestamp adjustments.
> >>>> 
> >>>> However, after a seek the parsed packet still contains a partial frame before the
> >>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
> >>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
> >>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
> >>>> painful while wearing headphones!
> >>> 
> >>> I think you mis-diagnose this at least somewhat
> >>> your code searches for a specific mp3 header, the parser and decoder would
> >>> accept a wider range of mp3 variants.
> >>> But both can choose points that are not mp3 frame starts. (if that is the
> >>> problem you are seeing, iam not completely sure it is)
> >>> 
> >> It took a closer look at what happens when I hear a BLEEP: The packet begins
> >> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
> >> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
> >> By looking for the more restricted header, my patch finds the real next frame at
> >> offset 78.
> >> 
> >> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
> >> version ID?
> >> 
> >>> Also is the more restricted header you search for always used or could
> >>> this be failing with some files ?
> >>> 
> >> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
> >> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
> >> 
> >>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
> >>> main header which points to chunks. The one file i found has 12 such chunks
> >>> the first represents the whole file i suspect, the next largest the audio
> >>> data, another one the metadata.
> >>> I guess the remaining 2 large ones could be a cover image and an index.
> >> Correct, seems like all aa files have the TOC, but its entries can be in a different
> >> order in each file. I guess thats why the original aadec.c implementation just
> >> looks for the largest chunk to play.
> >> 
> >>> I didnt really look at it, but theres a table in there with pairs of 32bit
> >>> values. the first in the file i have goes from 0 to 3 the second starts
> >>> multiple times from 0 and seems monotonly increasing and staying within
> >>> the filesize.
> >>> The sample i have does not store mp3 but it looks like this is a index
> >>> maybe offsets for packets in each of the 3 chapters.
> >>> 
> >>> Please look at the data, if it can be used. It would be much better than
> >>> scaning the file linearly and searching for some byte sequence to find
> >>> packet starts.
> >>> 
> >> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
> >> offsets from the index.
> >> 
> > 
> >> Long answer:
> >> All offsets in the index are the same, and matching the "codec_second_size"
> >> = crypto chunk size, roughly one second of audio:
> >> - 1045 for format 2 (SIPR 8kbps)
> >> - 2000 for format 3 (SIPR 16kbps)
> >> - 3982 for format 4 (MP3 32kbps)
> >> This is different from the respective frame size, which is 19, 20, and 104/105
> > 
> > The first 2 are exact multiples of the frame size.
> > 
> > I dont have a sample for the mp3 case so i cannot check but are you sure
> > the actually writen numbers dont match up multiples of mp3 frames ?
> > 
> Yes, I have tested both with some old aa books I still had on disk, and some I
> bought more recently. Using -fdebug ts, I can clearly see a packet size of
> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
> specialized header detection on every chunk, I see that the actual mp3 frame
> header indeed starts at offset 0 in chunk 0. But then it changes for every
> subsequent chunk, I have seen a maximum of 103, as to be expected.

Is there a public "aa" file with mp3 ?
Id like to look at this myself
also we should add such a file to fatesamples and add some tests with it



> 
> > If not the question that remains is how does the official code seek in this?
> > It seems a bit odd that they would get the design of the index wrong but then
> > implement demuxer side searching for a mp3 header not noticing that this is
> > not working fully reliable.
> > [This would imply that while they implemented the demuxer they would have
> > been aware that their index doesnt work, why would they not fix it at this
> > point ...]
> > Its hard to imagine they would not notice this bug with the index at all,
> > not that this is impossible or anything
> > 
> 
> Agreed. Here is how I think this is actually supposed to work: Every encrypted
> chunk contains roughly one second of audio. So, if you want to seek to a
> specific second, you can just look up an index entry. This tells you the chapter
> number and the offset in the chapter. Now, if you already know the chapter
> offsets, you can jump right to the correct position in the audio section. (I assume
> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
> Regarding the odd mp3 frame offsets, a more restricted header detection will
> take care of that, as I do in patch 3/3.

The more restricted header detection will fail less often. But it would be
unexpected that it never fails. Theres not really anything that prevents
these specific sequence of bits to occur in places other than the header

All this makes it a bit hard to belive for me that this is how the format is
intended to work.


> 
> Of course, given each chunk is the same size, you don't really need an index,
> but you can calculate the offset instead. This is what I am doing in patch 2/3.
> The index only makes sense if the offsets were not linear, but shuffled around.
> Maybe they did that in the original aa format 1 to scramble the audio, and kept
> the index for backward compatibility. This is pure speculation though, as I have
> no information at all about format 1.

or variable bitrate


> 
> So, the whole thing boils down to knowing the chapter offsets. From the
> current aadec.c implementation I know that this information exist inline in the
> audio section, so that's what I am using in patch 2/3. As I said, the chapter
> information might also exist somewhere else in the format, but finding it would
> require more reverse engineering, which I am not comfortable with.
> 
> Yes, my current patch requires a couple dozen seek operations on the file,
> which is not optimal. But I don't think this is too much of a performance hit, as
> nowadays everybody has an OS with a sturdy disk block cache, or read directly
> from RAM storage anyway. It works at least, so I can use my old aa books on
> my Linux phone now :-)

people do also play stuff remotely and we had bug reports about things like
http doing pretty poorly with seeks.


Thanks


[...]
Karsten Otto July 6, 2018, 8:49 a.m. UTC | #10
> Am 04.07.2018 um 23:54 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
>> 
>>> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>>>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>>>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>>>> be used for frame or chapter detection, unfortunately.
>>>> 
>>>> More details inline below.
>>>> 
>>>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>> 
>>>>> Signierter PGP-Teil
>>>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>>>> 
>>>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>>>> 
>>>>>>> Signierter PGP-Teil
>>>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>>>> also adjust timestamp information accordingly.
>>>>>>>> ---
>>>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>>>> 
>>>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>>>> 
>>>>>>> This codec specific code in demuxers should not be needed
>>>>>>> 
>>>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>>>> 
>>>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>>>> painful while wearing headphones!
>>>>> 
>>>>> I think you mis-diagnose this at least somewhat
>>>>> your code searches for a specific mp3 header, the parser and decoder would
>>>>> accept a wider range of mp3 variants.
>>>>> But both can choose points that are not mp3 frame starts. (if that is the
>>>>> problem you are seeing, iam not completely sure it is)
>>>>> 
>>>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>>>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>>>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>>>> By looking for the more restricted header, my patch finds the real next frame at
>>>> offset 78.
>>>> 
>>>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>>>> version ID?
>>>> 
>>>>> Also is the more restricted header you search for always used or could
>>>>> this be failing with some files ?
>>>>> 
>>>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>>>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>>>> 
>>>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>>>> main header which points to chunks. The one file i found has 12 such chunks
>>>>> the first represents the whole file i suspect, the next largest the audio
>>>>> data, another one the metadata.
>>>>> I guess the remaining 2 large ones could be a cover image and an index.
>>>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>>>> order in each file. I guess thats why the original aadec.c implementation just
>>>> looks for the largest chunk to play.
>>>> 
>>>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>>>> values. the first in the file i have goes from 0 to 3 the second starts
>>>>> multiple times from 0 and seems monotonly increasing and staying within
>>>>> the filesize.
>>>>> The sample i have does not store mp3 but it looks like this is a index
>>>>> maybe offsets for packets in each of the 3 chapters.
>>>>> 
>>>>> Please look at the data, if it can be used. It would be much better than
>>>>> scaning the file linearly and searching for some byte sequence to find
>>>>> packet starts.
>>>>> 
>>>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>>>> offsets from the index.
>>>> 
>>> 
>>>> Long answer:
>>>> All offsets in the index are the same, and matching the "codec_second_size"
>>>> = crypto chunk size, roughly one second of audio:
>>>> - 1045 for format 2 (SIPR 8kbps)
>>>> - 2000 for format 3 (SIPR 16kbps)
>>>> - 3982 for format 4 (MP3 32kbps)
>>>> This is different from the respective frame size, which is 19, 20, and 104/105
>>> 
>>> The first 2 are exact multiples of the frame size.
>>> 
>>> I dont have a sample for the mp3 case so i cannot check but are you sure
>>> the actually writen numbers dont match up multiples of mp3 frames ?
>>> 
>> Yes, I have tested both with some old aa books I still had on disk, and some I
>> bought more recently. Using -fdebug ts, I can clearly see a packet size of
>> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
>> specialized header detection on every chunk, I see that the actual mp3 frame
>> header indeed starts at offset 0 in chunk 0. But then it changes for every
>> subsequent chunk, I have seen a maximum of 103, as to be expected.
> 
> Is there a public "aa" file with mp3 ?
> Id like to look at this myself
> also we should add such a file to fatesamples and add some tests with it
> 
Sadly no, at least not legally. aa files are always personalized to a user account.
Audible offers some free of charge titles, you can get them without a subscription,
if you sign up / use an Amazon account. To get the aa version, choose
"normal quality"; otherwise you get the successor format aax, which is mp4.

> 
> 
>> 
>>> If not the question that remains is how does the official code seek in this?
>>> It seems a bit odd that they would get the design of the index wrong but then
>>> implement demuxer side searching for a mp3 header not noticing that this is
>>> not working fully reliable.
>>> [This would imply that while they implemented the demuxer they would have
>>> been aware that their index doesnt work, why would they not fix it at this
>>> point ...]
>>> Its hard to imagine they would not notice this bug with the index at all,
>>> not that this is impossible or anything
>>> 
>> 
>> Agreed. Here is how I think this is actually supposed to work: Every encrypted
>> chunk contains roughly one second of audio. So, if you want to seek to a
>> specific second, you can just look up an index entry. This tells you the chapter
>> number and the offset in the chapter. Now, if you already know the chapter
>> offsets, you can jump right to the correct position in the audio section. (I assume
>> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
>> Regarding the odd mp3 frame offsets, a more restricted header detection will
>> take care of that, as I do in patch 3/3.
> 
> The more restricted header detection will fail less often. But it would be
> unexpected that it never fails. Theres not really anything that prevents
> these specific sequence of bits to occur in places other than the header
> 
> All this makes it a bit hard to belive for me that this is how the format is
> intended to work.
> 
As I said, I think mp3 in aa is more a retrofit than intentional design. It works if
they always use the exact same encoding (as they do from my observations).
Then they can used an *exact* header match to sync, and the chance of the
same 32 bit pattern occurring unintentionally in audio content is highly unlikely.

> 
>> 
>> Of course, given each chunk is the same size, you don't really need an index,
>> but you can calculate the offset instead. This is what I am doing in patch 2/3.
>> The index only makes sense if the offsets were not linear, but shuffled around.
>> Maybe they did that in the original aa format 1 to scramble the audio, and kept
>> the index for backward compatibility. This is pure speculation though, as I have
>> no information at all about format 1.
> 
> or variable bitrate
> 
Keep in mind this is an index of crypto blocks = seconds, not mp3 frames. This
would not work very well with VBR.

> 
>> 
>> So, the whole thing boils down to knowing the chapter offsets. From the
>> current aadec.c implementation I know that this information exist inline in the
>> audio section, so that's what I am using in patch 2/3. As I said, the chapter
>> information might also exist somewhere else in the format, but finding it would
>> require more reverse engineering, which I am not comfortable with.
>> 
>> Yes, my current patch requires a couple dozen seek operations on the file,
>> which is not optimal. But I don't think this is too much of a performance hit, as
>> nowadays everybody has an OS with a sturdy disk block cache, or read directly
>> from RAM storage anyway. It works at least, so I can use my old aa books on
>> my Linux phone now :-)
> 
> people do also play stuff remotely and we had bug reports about things like
> http doing pretty poorly with seeks.
> 
Ok, I had not thought about that. But the aa format was always intended for
download, never for streaming, as you can see in the format design. You can't
really expect it to work great for something it isn't supposed to to.

Anyways, without seek support, you cannot really *use* aa format as an audio
book: You loose the ability of jumping to chapters, skip back 30 secs, quit and
resume later where you left off, etc. My intention with this patch series was to
make aa more usable in that sense, even if it is not a perfect technical solution.
I don't know if that is good enough to warrant inclusion in the ffmpeg codebase.

> 
> Thanks
> 
Thank *you* for the great feedback!

Cheers, Karsten
Michael Niedermayer July 6, 2018, 9:32 p.m. UTC | #11
On Fri, Jul 06, 2018 at 10:49:46AM +0200, Karsten Otto wrote:
> 
> > Am 04.07.2018 um 23:54 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> > 
> > Signierter PGP-Teil
> > On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
> >> 
> >>> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>> 
> >>> Signierter PGP-Teil
> >>> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
> >>>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
> >>>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
> >>>> be used for frame or chapter detection, unfortunately.
> >>>> 
> >>>> More details inline below.
> >>>> 
> >>>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>>>> 
> >>>>> Signierter PGP-Teil
> >>>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
> >>>>>> 
> >>>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> >>>>>>> 
> >>>>>>> Signierter PGP-Teil
> >>>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
> >>>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
> >>>>>>>> scan for the next valid frame header. Then truncate the packet, and
> >>>>>>>> also adjust timestamp information accordingly.
> >>>>>>>> ---
> >>>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
> >>>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
> >>>>>>> 
> >>>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
> >>>>>>> 
> >>>>>>> This codec specific code in demuxers should not be needed
> >>>>>>> 
> >>>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
> >>>>>> 
> >>>>>> However, after a seek the parsed packet still contains a partial frame before the
> >>>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
> >>>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
> >>>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
> >>>>>> painful while wearing headphones!
> >>>>> 
> >>>>> I think you mis-diagnose this at least somewhat
> >>>>> your code searches for a specific mp3 header, the parser and decoder would
> >>>>> accept a wider range of mp3 variants.
> >>>>> But both can choose points that are not mp3 frame starts. (if that is the
> >>>>> problem you are seeing, iam not completely sure it is)
> >>>>> 
> >>>> It took a closer look at what happens when I hear a BLEEP: The packet begins
> >>>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
> >>>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
> >>>> By looking for the more restricted header, my patch finds the real next frame at
> >>>> offset 78.
> >>>> 
> >>>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
> >>>> version ID?
> >>>> 
> >>>>> Also is the more restricted header you search for always used or could
> >>>>> this be failing with some files ?
> >>>>> 
> >>>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
> >>>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
> >>>> 
> >>>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
> >>>>> main header which points to chunks. The one file i found has 12 such chunks
> >>>>> the first represents the whole file i suspect, the next largest the audio
> >>>>> data, another one the metadata.
> >>>>> I guess the remaining 2 large ones could be a cover image and an index.
> >>>> Correct, seems like all aa files have the TOC, but its entries can be in a different
> >>>> order in each file. I guess thats why the original aadec.c implementation just
> >>>> looks for the largest chunk to play.
> >>>> 
> >>>>> I didnt really look at it, but theres a table in there with pairs of 32bit
> >>>>> values. the first in the file i have goes from 0 to 3 the second starts
> >>>>> multiple times from 0 and seems monotonly increasing and staying within
> >>>>> the filesize.
> >>>>> The sample i have does not store mp3 but it looks like this is a index
> >>>>> maybe offsets for packets in each of the 3 chapters.
> >>>>> 
> >>>>> Please look at the data, if it can be used. It would be much better than
> >>>>> scaning the file linearly and searching for some byte sequence to find
> >>>>> packet starts.
> >>>>> 
> >>>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
> >>>> offsets from the index.
> >>>> 
> >>> 
> >>>> Long answer:
> >>>> All offsets in the index are the same, and matching the "codec_second_size"
> >>>> = crypto chunk size, roughly one second of audio:
> >>>> - 1045 for format 2 (SIPR 8kbps)
> >>>> - 2000 for format 3 (SIPR 16kbps)
> >>>> - 3982 for format 4 (MP3 32kbps)
> >>>> This is different from the respective frame size, which is 19, 20, and 104/105
> >>> 
> >>> The first 2 are exact multiples of the frame size.
> >>> 
> >>> I dont have a sample for the mp3 case so i cannot check but are you sure
> >>> the actually writen numbers dont match up multiples of mp3 frames ?
> >>> 
> >> Yes, I have tested both with some old aa books I still had on disk, and some I
> >> bought more recently. Using -fdebug ts, I can clearly see a packet size of
> >> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
> >> specialized header detection on every chunk, I see that the actual mp3 frame
> >> header indeed starts at offset 0 in chunk 0. But then it changes for every
> >> subsequent chunk, I have seen a maximum of 103, as to be expected.
> > 
> > Is there a public "aa" file with mp3 ?
> > Id like to look at this myself
> > also we should add such a file to fatesamples and add some tests with it
> > 
> Sadly no, at least not legally. aa files are always personalized to a user account.
> Audible offers some free of charge titles, you can get them without a subscription,
> if you sign up / use an Amazon account. To get the aa version, choose
> "normal quality"; otherwise you get the successor format aax, which is mp4.
> 
> > 
> > 
> >> 
> >>> If not the question that remains is how does the official code seek in this?
> >>> It seems a bit odd that they would get the design of the index wrong but then
> >>> implement demuxer side searching for a mp3 header not noticing that this is
> >>> not working fully reliable.
> >>> [This would imply that while they implemented the demuxer they would have
> >>> been aware that their index doesnt work, why would they not fix it at this
> >>> point ...]
> >>> Its hard to imagine they would not notice this bug with the index at all,
> >>> not that this is impossible or anything
> >>> 
> >> 
> >> Agreed. Here is how I think this is actually supposed to work: Every encrypted
> >> chunk contains roughly one second of audio. So, if you want to seek to a
> >> specific second, you can just look up an index entry. This tells you the chapter
> >> number and the offset in the chapter. Now, if you already know the chapter
> >> offsets, you can jump right to the correct position in the audio section. (I assume
> >> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
> >> Regarding the odd mp3 frame offsets, a more restricted header detection will
> >> take care of that, as I do in patch 3/3.
> > 
> > The more restricted header detection will fail less often. But it would be
> > unexpected that it never fails. Theres not really anything that prevents
> > these specific sequence of bits to occur in places other than the header
> > 
> > All this makes it a bit hard to belive for me that this is how the format is
> > intended to work.
> > 
> As I said, I think mp3 in aa is more a retrofit than intentional design. It works if
> they always use the exact same encoding (as they do from my observations).
> Then they can used an *exact* header match to sync, and the chance of the
> same 32 bit pattern occurring unintentionally in audio content is highly unlikely.

the code looks like it checks 21 bits not 32.
in random data this would be expected to occur once in 2mb. That does not seem
highly unlikely to me

also if you are sure it can never be vbr and its always the hardcoded format
in the check then you can do even better probably.
As you can calculate where each packet starts and allowing for +-1 rounding
in the packet position it should be possible to find this without checking more
then a few positions around the exact location based on CBR

[...]
Karsten Otto July 7, 2018, 9:46 a.m. UTC | #12
> Am 06.07.2018 um 23:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
> 
> Signierter PGP-Teil
> On Fri, Jul 06, 2018 at 10:49:46AM +0200, Karsten Otto wrote:
>> 
>>> Am 04.07.2018 um 23:54 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>> 
>>> Signierter PGP-Teil
>>> On Wed, Jul 04, 2018 at 09:32:32AM +0200, Karsten Otto wrote:
>>>> 
>>>>> Am 04.07.2018 um 03:26 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>> 
>>>>> Signierter PGP-Teil
>>>>> On Tue, Jul 03, 2018 at 10:25:36PM +0200, Karsten Otto wrote:
>>>>>> TL;DR: I will drop patch 3/3, may rather spend some time investigating why
>>>>>> "ff ee 47 9d“ passes the mp3 header parser. Also, the aa file "index" cannot
>>>>>> be used for frame or chapter detection, unfortunately.
>>>>>> 
>>>>>> More details inline below.
>>>>>> 
>>>>>>> Am 03.07.2018 um 02:32 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>>>> 
>>>>>>> Signierter PGP-Teil
>>>>>>> On Mon, Jul 02, 2018 at 07:21:43PM +0200, Karsten Otto wrote:
>>>>>>>> 
>>>>>>>>> Am 02.07.2018 um 10:59 schrieb Michael Niedermayer <michael@niedermayer.cc>:
>>>>>>>>> 
>>>>>>>>> Signierter PGP-Teil
>>>>>>>>> On Thu, Jun 21, 2018 at 06:58:26PM +0200, Karsten Otto wrote:
>>>>>>>>>> MP3 frames may not be aligned to aa chunk boundaries. After seeking,
>>>>>>>>>> scan for the next valid frame header. Then truncate the packet, and
>>>>>>>>>> also adjust timestamp information accordingly.
>>>>>>>>>> ---
>>>>>>>>>> libavformat/aadec.c | 33 ++++++++++++++++++++++++++++-----
>>>>>>>>>> 1 file changed, 28 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> Please see AVSTREAM_PARSE_TIMESTAMPS
>>>>>>>>> 
>>>>>>>>> This codec specific code in demuxers should not be needed
>>>>>>>>> 
>>>>>>>> I tried that before, and you are right that it takes care of timestamp adjustments.
>>>>>>>> 
>>>>>>>> However, after a seek the parsed packet still contains a partial frame before the
>>>>>>>> next full one. I had expected libavformat/mpegaudio_parser.c to detect this
>>>>>>>> situation and discard the fragment, but unfortunately it does not. Instead it passes
>>>>>>>> it unchanged to the codec, which plays it as a pop or even a very ugly BLEEEP -
>>>>>>>> painful while wearing headphones!
>>>>>>> 
>>>>>>> I think you mis-diagnose this at least somewhat
>>>>>>> your code searches for a specific mp3 header, the parser and decoder would
>>>>>>> accept a wider range of mp3 variants.
>>>>>>> But both can choose points that are not mp3 frame starts. (if that is the
>>>>>>> problem you are seeing, iam not completely sure it is)
>>>>>>> 
>>>>>> It took a closer look at what happens when I hear a BLEEP: The packet begins
>>>>>> with a partial frame, starting with the byte sequence "ff ee 47 9d“. Unfortunately,
>>>>>> the mp3 parser indeed accepts this as a valid mp3 header, causing the noise.
>>>>>> By looking for the more restricted header, my patch finds the real next frame at
>>>>>> offset 78.
>>>>>> 
>>>>>> BTW: Should this sequence actually pass? AFAIK 01 is not a valid MPEG audio
>>>>>> version ID?
>>>>>> 
>>>>>>> Also is the more restricted header you search for always used or could
>>>>>>> this be failing with some files ?
>>>>>>> 
>>>>>> Good question. So far, all mp3 aa files I tested with matched the format (MPEG 2
>>>>>> Layer III at 32 kbps and 22kHz). I doubt there are other variants, but can’t be sure.
>>>>>> 
>>>>>>> Either way, looking at the demuxer a bit deeper, theres a TOC list in the
>>>>>>> main header which points to chunks. The one file i found has 12 such chunks
>>>>>>> the first represents the whole file i suspect, the next largest the audio
>>>>>>> data, another one the metadata.
>>>>>>> I guess the remaining 2 large ones could be a cover image and an index.
>>>>>> Correct, seems like all aa files have the TOC, but its entries can be in a different
>>>>>> order in each file. I guess thats why the original aadec.c implementation just
>>>>>> looks for the largest chunk to play.
>>>>>> 
>>>>>>> I didnt really look at it, but theres a table in there with pairs of 32bit
>>>>>>> values. the first in the file i have goes from 0 to 3 the second starts
>>>>>>> multiple times from 0 and seems monotonly increasing and staying within
>>>>>>> the filesize.
>>>>>>> The sample i have does not store mp3 but it looks like this is a index
>>>>>>> maybe offsets for packets in each of the 3 chapters.
>>>>>>> 
>>>>>>> Please look at the data, if it can be used. It would be much better than
>>>>>>> scaning the file linearly and searching for some byte sequence to find
>>>>>>> packet starts.
>>>>>>> 
>>>>>> Short answer: Sorry, it is not possible to derive frame offsets nor chapter
>>>>>> offsets from the index.
>>>>>> 
>>>>> 
>>>>>> Long answer:
>>>>>> All offsets in the index are the same, and matching the "codec_second_size"
>>>>>> = crypto chunk size, roughly one second of audio:
>>>>>> - 1045 for format 2 (SIPR 8kbps)
>>>>>> - 2000 for format 3 (SIPR 16kbps)
>>>>>> - 3982 for format 4 (MP3 32kbps)
>>>>>> This is different from the respective frame size, which is 19, 20, and 104/105
>>>>> 
>>>>> The first 2 are exact multiples of the frame size.
>>>>> 
>>>>> I dont have a sample for the mp3 case so i cannot check but are you sure
>>>>> the actually writen numbers dont match up multiples of mp3 frames ?
>>>>> 
>>>> Yes, I have tested both with some old aa books I still had on disk, and some I
>>>> bought more recently. Using -fdebug ts, I can clearly see a packet size of
>>>> 104 alternating with 105, as I would expect from mp3. Furthermore, using my
>>>> specialized header detection on every chunk, I see that the actual mp3 frame
>>>> header indeed starts at offset 0 in chunk 0. But then it changes for every
>>>> subsequent chunk, I have seen a maximum of 103, as to be expected.
>>> 
>>> Is there a public "aa" file with mp3 ?
>>> Id like to look at this myself
>>> also we should add such a file to fatesamples and add some tests with it
>>> 
>> Sadly no, at least not legally. aa files are always personalized to a user account.
>> Audible offers some free of charge titles, you can get them without a subscription,
>> if you sign up / use an Amazon account. To get the aa version, choose
>> "normal quality"; otherwise you get the successor format aax, which is mp4.
>> 
>>> 
>>> 
>>>> 
>>>>> If not the question that remains is how does the official code seek in this?
>>>>> It seems a bit odd that they would get the design of the index wrong but then
>>>>> implement demuxer side searching for a mp3 header not noticing that this is
>>>>> not working fully reliable.
>>>>> [This would imply that while they implemented the demuxer they would have
>>>>> been aware that their index doesnt work, why would they not fix it at this
>>>>> point ...]
>>>>> Its hard to imagine they would not notice this bug with the index at all,
>>>>> not that this is impossible or anything
>>>>> 
>>>> 
>>>> Agreed. Here is how I think this is actually supposed to work: Every encrypted
>>>> chunk contains roughly one second of audio. So, if you want to seek to a
>>>> specific second, you can just look up an index entry. This tells you the chapter
>>>> number and the offset in the chapter. Now, if you already know the chapter
>>>> offsets, you can jump right to the correct position in the audio section. (I assume
>>>> the chapter offsets are somewhere else in the aa format, possibly encrypted.)
>>>> Regarding the odd mp3 frame offsets, a more restricted header detection will
>>>> take care of that, as I do in patch 3/3.
>>> 
>>> The more restricted header detection will fail less often. But it would be
>>> unexpected that it never fails. Theres not really anything that prevents
>>> these specific sequence of bits to occur in places other than the header
>>> 
>>> All this makes it a bit hard to belive for me that this is how the format is
>>> intended to work.
>>> 
>> As I said, I think mp3 in aa is more a retrofit than intentional design. It works if
>> they always use the exact same encoding (as they do from my observations).
>> Then they can used an *exact* header match to sync, and the chance of the
>> same 32 bit pattern occurring unintentionally in audio content is highly unlikely.
> 
> the code looks like it checks 21 bits not 32.
> in random data this would be expected to occur once in 2mb. That does not seem
> highly unlikely to me
> 
Sure, I did not implement a strict check at that time because I was not sure what
kinds of encoding I should expect. After further testing, I think it could be fixed to
everything but the padding bit, i.e. 31 bit check. Anyway, I will not be doing that
anymore, but leave it to the regular mpeg header parser.

(Actually I think there is a way to improve libavcodec/mpegaudiodecheader.c
by explicitly checking for the reserved pattern of the MPEG version. It will avoid
at least some of the false positives I observed. But that will be a separate patch.)

> also if you are sure it can never be vbr and its always the hardcoded format
> in the check then you can do even better probably.
> As you can calculate where each packet starts and allowing for +-1 rounding
> in the packet position it should be possible to find this without checking more
> then a few positions around the exact location based on CBR
> 
Great idea! By calculating the frame start offset, I don't need to scan for headers
anymore, but can predict where it is likely to be. The mp3 parser can do the rest.
Much cleaner solution from an architecture point of view.

I did some testing, most books don't use frame padding at all, so this frame offset
prediction is spot on! I did find one book that has padding, but it also has some
extra data at the beginning of of its main chapter (ID3 tag?). This throws off the
calculation by at most one frame, i.e. 104 bytes, which is no worse than without
offset prediction. I assume it is a rare case anyway, so I am inclined to go with
this solution. Will prepare a new patch set soon.

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> No snowflake in an avalanche ever feels responsible. -- Voltaire
> 
>
diff mbox

Patch

diff --git a/libavformat/aadec.c b/libavformat/aadec.c
index b009c9deca..3bb8cd0768 100644
--- a/libavformat/aadec.c
+++ b/libavformat/aadec.c
@@ -50,6 +50,7 @@  typedef struct AADemuxContext {
     int64_t current_chapter_size;
     int64_t content_start;
     int64_t content_end;
+    int did_seek;
 } AADemuxContext;
 
 static int get_second_size(char *codec_name)
@@ -205,6 +206,7 @@  static int aa_read_header(AVFormatContext *s)
     }
     start = TOC[largest_idx].offset;
     avio_seek(pb, start, SEEK_SET);
+    c->did_seek = 0;
 
     // extract chapter positions. since all formats have constant bit rate, use it
     // as time base in bytes/s, for easy stream position <-> timestamp conversion
@@ -242,7 +244,7 @@  static int aa_read_packet(AVFormatContext *s, AVPacket *pkt)
     int trailing_bytes;
     int blocks;
     uint8_t buf[MAX_CODEC_SECOND_SIZE * 2];
-    int written = 0;
+    int written = 0, offset = 0;
     int ret;
     AADemuxContext *c = s->priv_data;
     uint64_t pos = avio_tell(s->pb);
@@ -295,10 +297,32 @@  static int aa_read_packet(AVFormatContext *s, AVPacket *pkt)
     if (c->current_chapter_size <= 0)
         c->current_chapter_size = 0;
 
-    ret = av_new_packet(pkt, written);
+    // re-init timestamps after seeking
+    if (c->did_seek) {
+        c->did_seek = 0;
+
+        if (s->streams[0]->codecpar->codec_id == AV_CODEC_ID_MP3) {
+            for (; offset < written - 2; ++offset) {
+                // find mp3 header: sync, v2, layer3, 32kbps, non-reserved sample rate
+                if ((buf[offset + 0] & 0xff) == 0xff &&
+                    (buf[offset + 1] & 0xfe) == 0xf2 &&
+                    (buf[offset + 2] & 0xf0) == 0x40 &&
+                    (buf[offset + 2] & 0x0c) != 0x0c)
+                        break;
+            }
+            if (offset == written - 2)
+                offset = 0; // not found, e.g. chapter end chunk; just use as is
+        }
+        ff_update_cur_dts(s, s->streams[0],
+            (pos + offset - c->content_start - CHAPTER_HEADER_SIZE * (c->chapter_idx - 1))
+            * TIMEPREC);
+    }
+
+    // create packet
+    ret = av_new_packet(pkt, written - offset);
     if (ret < 0)
         return ret;
-    memcpy(pkt->data, buf, written);
+    memcpy(pkt->data, buf + offset, written - offset);
     pkt->pos = pos;
 
     return 0;
@@ -343,8 +367,7 @@  static int aa_read_seek(AVFormatContext *s,
     c->current_codec_second_size = c->codec_second_size;
     c->current_chapter_size = chapter_size - chapter_pos;
     c->chapter_idx = 1 + chapter_idx;
-
-    ff_update_cur_dts(s, s->streams[0], ch->start + chapter_pos * TIMEPREC);
+    c->did_seek = 1;
 
     return 1;
 }