diff mbox series

[FFmpeg-devel] lavf/matroskadec: stop mapping text/plain attachments to AV_CODEC_ID_TEXT

Message ID 20220607115839.18515-1-anton@khirnov.net
State Accepted
Commit 38df63f967870cad9d39a02754360db0066e7f80
Headers show
Series [FFmpeg-devel] lavf/matroskadec: stop mapping text/plain attachments to AV_CODEC_ID_TEXT | expand

Checks

Context Check Description
yinshiyou/make_loongarch64 success Make finished
yinshiyou/make_fate_loongarch64 success Make fate finished
andriy/make_armv7_RPi4 success Make finished
andriy/make_fate_armv7_RPi4 success Make fate finished
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Anton Khirnov June 7, 2022, 11:58 a.m. UTC
There is no reason to think that an attachment will contain text
subtitles. In addition attachments are exported in extradata, so the
AV_CODEC_ID_TEXT decoder would not do anything useful with it anyway.
---
 libavformat/matroskadec.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Anton Khirnov June 8, 2022, 6:17 a.m. UTC | #1
Quoting Soft Works (2022-06-08 05:45:38)
> You might allow me the question whether we can be sure that
> this is the only case which is subject to the regression?
> 
> Besides from what I reported above (and you might probably come up
> with a new codec ID for discriminating text subs vs. text
> attachments), this surely fixes the specific case I reported,
> but I wonder whether other cases could exist?
> (it's meant to be a normal question - I just don't know)

We can never be truly sure of anything except our own existence.

As far as I can tell, only two bits of code in lavf can export
AVMEDIA_TYPE_ATTACHMENT: apetag and matroskadec. apetag does not set
codec id at all, while matroskadec will now only export codec ids that
do not (and most likely will not) have decoders: fonts and generic
binary data.

So after this patch, to the best of my knowledge, there should never be
a case where an AVMEDIA_TYPE_ATTACHMENT stream has a decodable codec id.
Then again this does not exclude all possible cases of a mismatch
between a stream's codec type and id.

Overall I'd say this just strengthens the case for my original lavc
commit, since it is clearly helpful in exposing bugs in other code.

> Here's another thought that might be worth consideration:
> What turned this from a minor into a major issue (from my pov),
> is that it is causing ffprobe to fail and exit with error
> and incomplete output.
> What I'm wondering about in this context, is whether it
> even has to be like that?
> 
> I mean, an unknown codec doesn't cause ffprobe to error-exit,
> does it need to do so when avcodec_open2() returns error?
> 
> I would find that behavior ok and consistent when the same
> would happen when running ffmpeg (ffprobe fails <=> ffmpeg fails).
> But ffmpeg doesn't fail (unless you use the stream), so does
> ffprobe even need to fail in these cases?

I suppose it can make sense to log an error and continue when opening
the codec fails. This could be useful also for probing genuinely broken
streams where e.g. extradata parsing fails.

There could also be an option like ffmpeg's -xerror that would make
ffprobe exit on failure.
Anton Khirnov June 8, 2022, 7:09 a.m. UTC | #2
Quoting Soft Works (2022-06-08 08:39:21)
> > Overall I'd say this just strengthens the case for my original lavc
> > commit, since it is clearly helpful in exposing bugs in other code.
> 
> As said already, I never doubted the validity of your patch, it was
> about the effect and unacknowledged responsibilities.
> 
> What do you want to do with the text attachment "none" caption?
> Maybe a separate "cummy" codec id?

Do we need to do anything about it? I am not a fan of inventing fake
codec ids for every conceivable kind of data. lavf already exports the
MIME type, that should be enough. Maybe the way it is printed can be
improved, but that is not urgent as far as I'm concerned.

> > I suppose it can make sense to log an error and continue when opening
> > the codec fails. This could be useful also for probing genuinely broken
> > streams where e.g. extradata parsing fails.
> > 
> > There could also be an option like ffmpeg's -xerror that would make
> > ffprobe exit on failure.
> 
> Sounds good to me, but I'm not sure whether everybody would be ok
> doing it exactly like this, as somebody might argue they would rely
> on ffprobe failing in such cases. 
> I can submit a patch for that - unless no objections or better ideas
> would appear..

https://xkcd.com/1172/
Anton Khirnov June 8, 2022, 8:04 a.m. UTC | #3
Quoting Soft Works (2022-06-08 09:43:19)
> > Do we need to do anything about it? I am not a fan of inventing fake
> > codec ids for every conceivable kind of data. lavf already exports the
> > MIME type, that should be enough. Maybe the way it is printed can be
> > improved, but that is not urgent as far as I'm concerned.
> 
> ffprobe provides multiple output formats which are intended to be machine-
> readable (e.g. xml, json).
> If it was just about interactive reading by a user, then it might 
> in fact not matter that much. But for a machine, it was "text" and the patch
> changes it to "none". 

When we add a new codec ID, what was previously "none" changes to an
actual codec name. Users for whom any change at all is a problem should
simply not update their binaries.

> 
> > I am not a fan of inventing fake
> > codec ids for every conceivable kind of data.
> 
> This surely makes sense from a larger perspective, but when such fake ids are 
> already being used, then it doesn't matter how many. 5 or 6 - 33 or 34 - that 
> doesn't have relevant impact. At one point in the future, we might get rid of
> them for something better, but - again - the count won't be of much relevance. 

I disagree. The more of them we add, the harder it becomes to remove
them. My points are
- the relevant information is already available in mimetype, adding a
  fake codec id does not give you anything new
- there is an effectively unlimited number of such codec ids, it would
  be absurd to add them all
- codec ids are meant to identify different kinds of data that can be
  handled by libavcodec; the data in question here is outside the scope
  of libavcodec, therefore it should not have a codec id

> > > > I suppose it can make sense to log an error and continue when opening
> > > > the codec fails. This could be useful also for probing genuinely broken
> > > > streams where e.g. extradata parsing fails.
> > > >
> > > > There could also be an option like ffmpeg's -xerror that would make
> > > > ffprobe exit on failure.
> > >
> > > Sounds good to me, but I'm not sure whether everybody would be ok
> > > doing it exactly like this, as somebody might argue they would rely
> > > on ffprobe failing in such cases.
> > > I can submit a patch for that - unless no objections or better ideas
> > > would appear..
> > 
> > https://xkcd.com/1172/
> 
> I don't think it's absurd, neither hypothetical to ask whether somebody,
> would object a certain change --- BEFORE working spending time on it.

It does makes sense to consider it, but at some point you have to accept
that ANY change you make can possibly break somebody's use case. So
being overly considerate means no useful development gets done.
Anton Khirnov June 8, 2022, 12:16 p.m. UTC | #4
Quoting Soft Works (2022-06-08 10:34:05)
> > > I don't think it's absurd, neither hypothetical to ask whether somebody,
> > > would object a certain change --- BEFORE working spending time on it.
> > 
> > It does makes sense to consider it, but at some point you have to accept
> > that ANY change you make can possibly break somebody's use case. So
> > being overly considerate means no useful development gets done.
> 
> I very much agree to this. But I'm wondering how this can go together 
> with versioning the code by 24 different numbers (8 libs * 3 integers).
> Even when you make a change that is undoubtedly correct and justified and 
> right - when it changes behavior it breaks compatibility, even when the
> previous behavior was totally bad and wrong.
> 
> Once you leave that path and replace it by some "custom" judgement with regards
> to what's compatible and what's breaking and consider a change to be non-breaking 
> because it changes only "illegitimate" behavior, it's still a breaking and 
> incompatible change. 
> When it's not possible to rely on a precise and strict compatibility based 
> on these version numbers with major, minor and micro for every single lib,
> and you say - nevermind, every change will break something, what are all
> those numbers for? They would be pointless, wouldn't they?

Those numbers are not "code version", they are "API version". They do
not provide guarantees on specific code behavior, they provide
guarantees on what APIs are available and their semantics.

> > When we add a new codec ID, what was previously "none" changes to an
> > actual codec name. Users for whom any change at all is a problem should
> > simply not update their binaries.
> 
> How should those users know when the library version numbering doesn't 
> indicate that breaking change?

Tests. Lots of tests.
If any change in behaviour is a breaking change for you, then git commit
hash is the library version you should use.
Anton Khirnov June 8, 2022, 5:38 p.m. UTC | #5
Quoting Soft Works (2022-06-08 17:38:31)
> > Tests. Lots of tests.
> > If any change in behaviour is a breaking change for you, then git commit
> > hash is the library version you should use.
> 
> Not any change, but for a change in output, I'd say yes. I mean - isn't that
> what FATE is checking? And don't you say "breaks FATE" when the an output
> changes?

We generally do not make hard guarantees about the output produced by
the libraries, otherwise almost everything would be a compatibility
break (e.g. adding support for previously unimplemented codec or
container features).
We only promise to make best effort to produce "sensible" output, where
"sensible" has no strict definition and is subject to discussion and
consensus among developers. In this case, I argued that previous
behavior was a bug and nobody disagreed so far.

"breaks FATE" is a shortcut, it means only that the output has changed.
That may be because a change broke something, but it also happens that
there is a legitimate reason for the output changing, then the test
references are simply updated.

> 
> After all, I'm still having a hard time in assimilating the logic and mindsets 
> here. I'm obviously coming from a very different background and certain things 
> just don't make sense to me, so I have to ask many questions, when I can't 
> believe it, I ask once again, and again and re-assure that it's really that 
> way that I can hardly believe. When the versions do not guarantee anything
> besides API compatibility, then there's almost not practical value in having
> them, because things can always go wrong no matter whether those versions
> match or not as you always need to test thoroughly. You would be able
> to rule out some combinations up-front, but that's it.
> Also, given that fact that matching versions do not guarantee anything of
> practical value, who would then still want to mix ffmpeg libs from different 
> Git revisions based on those version numbers? I can't compute this ;-)

API compatibility is of great practical value to many people. To the
contrary, I'd say that absolute output stability is far less important
to most our users.
diff mbox series

Patch

diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
index de73f97aca..cd30b5f7a4 100644
--- a/libavformat/matroskadec.c
+++ b/libavformat/matroskadec.c
@@ -806,7 +806,6 @@  static const CodecMime mkv_image_mime_tags[] = {
 };
 
 static const CodecMime mkv_mime_tags[] = {
-    {"text/plain"                 , AV_CODEC_ID_TEXT},
     {"application/x-truetype-font", AV_CODEC_ID_TTF},
     {"application/x-font"         , AV_CODEC_ID_TTF},
     {"application/vnd.ms-opentype", AV_CODEC_ID_OTF},