diff mbox series

[FFmpeg-devel] avcodec/bsf: Avoid allocation for AVBSFInternal

Message ID 20200810135522.5972-1-andreas.rheinhardt@gmail.com
State New
Headers show
Series [FFmpeg-devel] avcodec/bsf: Avoid allocation for AVBSFInternal
Related show

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Andreas Rheinhardt Aug. 10, 2020, 1:55 p.m. UTC
by allocating it together with the AVBSFContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
Similar things can of course be done with other structures. I did only
this one to see whether everyone is ok with this.

 libavcodec/bsf.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

Comments

James Almer Aug. 10, 2020, 2:20 p.m. UTC | #1
On 8/10/2020 10:55 AM, Andreas Rheinhardt wrote:
> by allocating it together with the AVBSFContext.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
> Similar things can of course be done with other structures. I did only
> this one to see whether everyone is ok with this.

Personally, i don't like it. It's extra complexity to save a single 8 or
12 byte allocation that happens once during bsf alloc. It's kind of a
pointless micro-optimization.

> 
>  libavcodec/bsf.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/libavcodec/bsf.c b/libavcodec/bsf.c
> index d71bc32584..9781be78e7 100644
> --- a/libavcodec/bsf.c
> +++ b/libavcodec/bsf.c
> @@ -37,6 +37,11 @@ struct AVBSFInternal {
>      int eof;
>  };
>  
> +typedef struct AVBSFCombined {
> +    AVBSFContext bsf;
> +    AVBSFInternal internal;
> +} AVBSFCombined;
> +
>  void av_bsf_free(AVBSFContext **pctx)
>  {
>      AVBSFContext *ctx;
> @@ -50,9 +55,8 @@ void av_bsf_free(AVBSFContext **pctx)
>      if (ctx->filter->priv_class && ctx->priv_data)
>          av_opt_free(ctx->priv_data);
>  
> -    if (ctx->internal)
> -        av_packet_free(&ctx->internal->buffer_pkt);
> -    av_freep(&ctx->internal);
> +    av_assert2(ctx->internal);
> +    av_packet_free(&ctx->internal->buffer_pkt);
>      av_freep(&ctx->priv_data);
>  
>      avcodec_parameters_free(&ctx->par_in);
> @@ -93,14 +97,18 @@ const AVClass *av_bsf_get_class(void)
>  
>  int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
>  {
> +    AVBSFCombined *com;
>      AVBSFContext *ctx;
>      AVBSFInternal *bsfi;
>      int ret;
>  
> -    ctx = av_mallocz(sizeof(*ctx));
> -    if (!ctx)
> +    com = av_mallocz(sizeof(*com));
> +    if (!com)
>          return AVERROR(ENOMEM);
>  
> +    ctx  = &com->bsf;
> +    bsfi = ctx->internal = &com->internal;
> +
>      ctx->av_class = &bsf_class;
>      ctx->filter   = filter;
>  
> @@ -111,13 +119,6 @@ int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
>          goto fail;
>      }
>  
> -    bsfi = av_mallocz(sizeof(*bsfi));
> -    if (!bsfi) {
> -        ret = AVERROR(ENOMEM);
> -        goto fail;
> -    }
> -    ctx->internal = bsfi;
> -
>      bsfi->buffer_pkt = av_packet_alloc();
>      if (!bsfi->buffer_pkt) {
>          ret = AVERROR(ENOMEM);
>
Zhao Zhili Aug. 10, 2020, 3:08 p.m. UTC | #2
> On Aug 11, 2020, at 8:14 AM, Andreas Rheinhardt <andreas.rheinhardt@gmail.com> wrote:
> 
> James Almer:
>> On 8/10/2020 8:12 PM, Andreas Rheinhardt wrote:
>>> James Almer:
>>>> On 8/10/2020 7:11 PM, Nicolas George wrote:
>>>>> James Almer (12020-08-10):
>>>>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>>>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>>>>> pointless micro-optimization.
>>>>> 
>>>>> I do not agree at all.
>>>>> 
>>>>> First, it is not extra complexity, it actually makes the code simpler:
>>>>> less mutually dependant allocations that can lead to leaks if they are
>>>>> not handled properly, better guarantees, for no more code.
>>>> 
>>>> It adds an extra struct and makes the code harder to read. Might as well
>>>> just do
>>>> 
>>>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>>>> ctx->internal = &ctx[1];
>>> 
>>> This is not ok due to padding/alignment.
>> 
>> libdav1d would have broke by now if that was the case. Do you know a
>> system where this could fail?
>> 
> 
> Imagine the context to only contain elements that require a alignment of
> 4 and the internal structure to require an alignment of eight. Then it
> is perfectly possible for &ctx[1] to only have an alignment of four and
> not of eight as internal requires it.
> 
> Could you elaborate the point about libdav1d?
> 
>>> 
>>>> 
>>>> if removing one tiny allocation in an incredibly cold function is so
>>>> important. Less code, same result.
>>> 
>>> I don't see an obfuscation/complication (and also no problem with
>>> maintainability); I see a very simple method to save an allocation. And
>>> I actually think that it simplifies the code, because now one doesn't
>>> have to worry at all any more whether internal has been properly
>>> allocated or not in av_bsf_free().
>> 
>> I don't deny it simplifies the code in regards to freeing the context in
>> failure paths, but adding a third struct does not make things clearer at
>> all. So Mark's suggestion to add AVBSFContext into AVBSFInternal may be
>> a better approach.
>> 
> There are two reasons why I like my approach more than Mark's: It allows
> to hide the details of the context<->internal relationship to the place
> where allocations happen (and if one asserted as Nicolas suggested
> (av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
> would also need to be visible for the function doing the freeing, but
> even then it can still be confined to one file). After all, the internal
> of several important other structures are in private headers to be used
> by lots of code.
> The second point is another micro-optimization: If one puts the
> AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
> the fields currently existing in AVBSFInternal will increase and the
> encoding of the "read address from pointer + offset" is typically
> shorter if offset is smaller. This is not a problem here as the
> structures here are small, but other structures are bigger. This could
> of course be overcome by putting the AVBSFContext at the end of
> AVBSFInternal; but somehow this feels unnatural, although I could get
> used to it.

Add another structure is overkill in my opinion, unless we can make
it general enough (more macros?).

If we treat context<->internal as class base <-> class sub_class,
then it looks natural to put context as field of internal.

Is zero-size array allowed here? I’m not sure it worth the complexity.

struct AVBSFInternal {
    AVPacket *buffer_pkt;
    int eof;
    AVBSFContext ctx[0];
};

> 
>>> Do you remember 31aafdac2404c5e01b21e53255db3fb5ed53c7c9
>>> (https://ffmpeg.org/pipermail/ffmpeg-devel/2019-October/251816.html)
>>> where you fixed the bug that avformat_free_context() gets called upon
>>> failure to allocate the AVFormatInternal despite avformat_free_context()
>>> requiring the internal to be successfully allocated? That issue would
>>> have never even existed if one allocated the context and its internal in
>>> one piece.
>>> 
>>> - Andreas
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Nicolas George Aug. 10, 2020, 10:11 p.m. UTC | #3
James Almer (12020-08-10):
> Personally, i don't like it. It's extra complexity to save a single 8 or
> 12 byte allocation that happens once during bsf alloc. It's kind of a
> pointless micro-optimization.

I do not agree at all.

First, it is not extra complexity, it actually makes the code simpler:
less mutually dependant allocations that can lead to leaks if they are
not handled properly, better guarantees, for no more code.

Second, this is precisely the kind of "micro" optimization we should be
doing everywhere, because "micro" times "a lot" starts to be
significant, and FFmpeg has not reached its status by wasting "micro"
optimizations.

Regards,
Nicolas George Aug. 10, 2020, 10:13 p.m. UTC | #4
Andreas Rheinhardt (12020-08-10):
> by allocating it together with the AVBSFContext.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
> Similar things can of course be done with other structures. I did only
> this one to see whether everyone is ok with this.

I wholeheartedly approve.


>  libavcodec/bsf.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/libavcodec/bsf.c b/libavcodec/bsf.c
> index d71bc32584..9781be78e7 100644
> --- a/libavcodec/bsf.c
> +++ b/libavcodec/bsf.c
> @@ -37,6 +37,11 @@ struct AVBSFInternal {
>      int eof;
>  };
>  
> +typedef struct AVBSFCombined {
> +    AVBSFContext bsf;
> +    AVBSFInternal internal;
> +} AVBSFCombined;
> +
>  void av_bsf_free(AVBSFContext **pctx)
>  {
>      AVBSFContext *ctx;
> @@ -50,9 +55,8 @@ void av_bsf_free(AVBSFContext **pctx)
>      if (ctx->filter->priv_class && ctx->priv_data)
>          av_opt_free(ctx->priv_data);
>  
> -    if (ctx->internal)
> -        av_packet_free(&ctx->internal->buffer_pkt);
> -    av_freep(&ctx->internal);

> +    av_assert2(ctx->internal);

av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal);

would be even safer.

> +    av_packet_free(&ctx->internal->buffer_pkt);
>      av_freep(&ctx->priv_data);
>  
>      avcodec_parameters_free(&ctx->par_in);
> @@ -93,14 +97,18 @@ const AVClass *av_bsf_get_class(void)
>  
>  int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
>  {
> +    AVBSFCombined *com;
>      AVBSFContext *ctx;
>      AVBSFInternal *bsfi;
>      int ret;
>  
> -    ctx = av_mallocz(sizeof(*ctx));
> -    if (!ctx)
> +    com = av_mallocz(sizeof(*com));
> +    if (!com)
>          return AVERROR(ENOMEM);
>  
> +    ctx  = &com->bsf;
> +    bsfi = ctx->internal = &com->internal;
> +
>      ctx->av_class = &bsf_class;
>      ctx->filter   = filter;
>  
> @@ -111,13 +119,6 @@ int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
>          goto fail;
>      }
>  
> -    bsfi = av_mallocz(sizeof(*bsfi));
> -    if (!bsfi) {
> -        ret = AVERROR(ENOMEM);
> -        goto fail;
> -    }
> -    ctx->internal = bsfi;
> -
>      bsfi->buffer_pkt = av_packet_alloc();
>      if (!bsfi->buffer_pkt) {
>          ret = AVERROR(ENOMEM);

Regards,
James Almer Aug. 10, 2020, 10:27 p.m. UTC | #5
On 8/10/2020 7:11 PM, Nicolas George wrote:
> James Almer (12020-08-10):
>> Personally, i don't like it. It's extra complexity to save a single 8 or
>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>> pointless micro-optimization.
> 
> I do not agree at all.
> 
> First, it is not extra complexity, it actually makes the code simpler:
> less mutually dependant allocations that can lead to leaks if they are
> not handled properly, better guarantees, for no more code.

It adds an extra struct and makes the code harder to read. Might as well
just do

ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
ctx->internal = &ctx[1];

if removing one tiny allocation in an incredibly cold function is so
important. Less code, same result.
And i *really* want to emphasize just how cold this function is,
especially ever since av_bsf_flush() was introduced, and even more so if
av_bsf_close() is also committed.

> 
> Second, this is precisely the kind of "micro" optimization we should be
> doing everywhere, because "micro" times "a lot" starts to be
> significant, and FFmpeg has not reached its status by wasting "micro"
> optimizations.

Skipping a 12 byte allocation does nothing on any system with a C
library worth its salt, where it will be cached and ready to be returned
by malloc.

Can we not get overtly clever with code obfuscation? Lets optimize
things where things need to be optimized, while also ensuring code is
readable and maintainable not just for us, but for any developer that
stumbles upon current code in the future as well.

> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George Aug. 10, 2020, 10:34 p.m. UTC | #6
James Almer (12020-08-10):
> It adds an extra struct and makes the code harder to read. Might as well
> just do
> 
> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
> ctx->internal = &ctx[1];

This is exactly what the code does, except without the undefined
behaviors, because what you just wrote is completely invalid.

> And i *really* want to emphasize just how cold this function is,
> especially ever since av_bsf_flush() was introduced, and even more so if
> av_bsf_close() is also committed.

You missed the part where Andreas said this was a proof of concept, and
the same should be done for all similar cases. IIRC, you like
consistency.

> Skipping a 12 byte allocation does nothing on any system with a C
> library worth its salt, where it will be cached and ready to be returned
> by malloc.

It does save some memory.

> Can we not get overtly clever with code obfuscation?

There is no obfuscation, just a very standard an common practice.

Regards,
James Almer Aug. 10, 2020, 11:05 p.m. UTC | #7
On 8/10/2020 7:34 PM, Nicolas George wrote:
> James Almer (12020-08-10):
>> It adds an extra struct and makes the code harder to read. Might as well
>> just do
>>
>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>> ctx->internal = &ctx[1];
> 
> This is exactly what the code does

I know, that's why i suggested it. Two lines, no extra structs, same result.

> , except without the undefined
> behaviors, because what you just wrote is completely invalid.

What is undefined in that? Should probably cast &ctx[1] to
AVBSFInternal* to silence a warning, but it works.

> 
>> And i *really* want to emphasize just how cold this function is,
>> especially ever since av_bsf_flush() was introduced, and even more so if
>> av_bsf_close() is also committed.
> 
> You missed the part where Andreas said this was a proof of concept, and
> the same should be done for all similar cases. IIRC, you like
> consistency.

I also like readability, and ctx = malloc(sizeof(*ctx)); ctx->internal =
malloc(sizeof(*ctx->internal)); is the approach that most clearly
transmits intent on functions where removing the second allocation
doesn't bring any real benefit.

> 
>> Skipping a 12 byte allocation does nothing on any system with a C
>> library worth its salt, where it will be cached and ready to be returned
>> by malloc.
> 
> It does save some memory.
> 
>> Can we not get overtly clever with code obfuscation?
> 
> There is no obfuscation, just a very standard an common practice.

It may be a common practice, but it's still obfuscation to a casual reader.

For that matter, i was told about the approach used with
AVCodecHWConfigInternal in libavcodec/hwconfig.h, which is similar to
this but doesn't introduce a third struct. What do you think?

> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Andreas Rheinhardt Aug. 10, 2020, 11:12 p.m. UTC | #8
James Almer:
> On 8/10/2020 7:11 PM, Nicolas George wrote:
>> James Almer (12020-08-10):
>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>> pointless micro-optimization.
>>
>> I do not agree at all.
>>
>> First, it is not extra complexity, it actually makes the code simpler:
>> less mutually dependant allocations that can lead to leaks if they are
>> not handled properly, better guarantees, for no more code.
> 
> It adds an extra struct and makes the code harder to read. Might as well
> just do
> 
> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
> ctx->internal = &ctx[1];

This is not ok due to padding/alignment.

> 
> if removing one tiny allocation in an incredibly cold function is so
> important. Less code, same result.

I don't see an obfuscation/complication (and also no problem with
maintainability); I see a very simple method to save an allocation. And
I actually think that it simplifies the code, because now one doesn't
have to worry at all any more whether internal has been properly
allocated or not in av_bsf_free().
Do you remember 31aafdac2404c5e01b21e53255db3fb5ed53c7c9
(https://ffmpeg.org/pipermail/ffmpeg-devel/2019-October/251816.html)
where you fixed the bug that avformat_free_context() gets called upon
failure to allocate the AVFormatInternal despite avformat_free_context()
requiring the internal to be successfully allocated? That issue would
have never even existed if one allocated the context and its internal in
one piece.

- Andreas
Andreas Rheinhardt Aug. 10, 2020, 11:17 p.m. UTC | #9
Andreas Rheinhardt:
> James Almer:
>> On 8/10/2020 7:11 PM, Nicolas George wrote:
>>> James Almer (12020-08-10):
>>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>>> pointless micro-optimization.
>>>
>>> I do not agree at all.
>>>
>>> First, it is not extra complexity, it actually makes the code simpler:
>>> less mutually dependant allocations that can lead to leaks if they are
>>> not handled properly, better guarantees, for no more code.
>>
>> It adds an extra struct and makes the code harder to read. Might as well
>> just do
>>
>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>> ctx->internal = &ctx[1];
> 
> This is not ok due to padding/alignment.
> 
Forgot to mention that you are also circumventing the type system
whereas my version doesn't.

- Andreas
Zhao Zhili Aug. 10, 2020, 11:27 p.m. UTC | #10
> On Aug 11, 2020, at 5:48 PM, Nicolas George <george@nsup.org> wrote:
> 
> Andreas Rheinhardt (12020-08-11):
>> Imagine the context to only contain elements that require a alignment of
>> 4 and the internal structure to require an alignment of eight. Then it
>> is perfectly possible for &ctx[1] to only have an alignment of four and
>> not of eight as internal requires it.
> 
> With this, you answered to James' question to me "What is undefined in
> that?".
> 
>> There are two reasons why I like my approach more than Mark's: It allows
> 
> I cannot find what you refer to "Mark's" approach.

Andreas referr to struct AVCodecHWConfigInternal with commit
24cc0a53e99.

> 
>> to hide the details of the context<->internal relationship to the place
>> where allocations happen (and if one asserted as Nicolas suggested
>> (av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
>> would also need to be visible for the function doing the freeing, but
>> even then it can still be confined to one file). After all, the internal
>> of several important other structures are in private headers to be used
>> by lots of code.
> 
> ... and therefore, I cannot understand exactly the point you are making
> here.
> 
>> The second point is another micro-optimization: If one puts the
>> AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
>> the fields currently existing in AVBSFInternal will increase and the
>> encoding of the "read address from pointer + offset" is typically
>> shorter if offset is smaller. This is not a problem here as the
>> structures here are small, but other structures are bigger. This could
>> of course be overcome by putting the AVBSFContext at the end of
>> AVBSFInternal; but somehow this feels unnatural, although I could get
>> used to it.
> 
> In this, you are mistaken.
> 
> First, AFAIK, the increase in cost for accessing pointer+offset only
> happens at a few specific offsets, probably related to powers of two,
> which means the gain, minute as it is, only happens for the few fields
> that are pushed over the edge, if there are any.
> 
> But second, and most of all, you are neglecting a much more important
> optimization. If AVBSFContext is the first field of AVBSFInternal, then
> they both have the same address (it is explicitly specified in the spec,
> and it has been used for decades by Gtk+ for example), and therefore we
> can dispense with the ctx->internal pointer and just use it directly.
> 
> We can still have two pointers in the code, one for each type, to make
> it more compact and readable. But even if we do, the compiler will know
> they are the same.
> 
> Note that getting rid of the .internal field reduces the size of the
> structure, and is in itself another optimization.
> 
> And since we only have one pointer instead of two, it frees a register
> for other uses.
> 
> So this is an even better solution than your proposal, but it requires
> all uses of ->internal to be replaced. I I count correctly, that makes
> only four lines to change, and for a trivial change.
> 
> Note that all this can be done while keeping the specifics and the
> pointer magic hidden in a single file:
> 
>    static inline AVBSFInternal *bsf_internal(AVBSFContext *bsf) {
>        return (AVBSFInternal *)bsf;
>    }
> 
> Then the code only needs to write:
> 
>    AVBSFInternal *int = bsf_internal(bsf);
> 
> Regards,
> 
> -- 
>  Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
James Almer Aug. 10, 2020, 11:34 p.m. UTC | #11
On 8/10/2020 8:12 PM, Andreas Rheinhardt wrote:
> James Almer:
>> On 8/10/2020 7:11 PM, Nicolas George wrote:
>>> James Almer (12020-08-10):
>>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>>> pointless micro-optimization.
>>>
>>> I do not agree at all.
>>>
>>> First, it is not extra complexity, it actually makes the code simpler:
>>> less mutually dependant allocations that can lead to leaks if they are
>>> not handled properly, better guarantees, for no more code.
>>
>> It adds an extra struct and makes the code harder to read. Might as well
>> just do
>>
>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>> ctx->internal = &ctx[1];
> 
> This is not ok due to padding/alignment.

libdav1d would have broke by now if that was the case. Do you know a
system where this could fail?

> 
>>
>> if removing one tiny allocation in an incredibly cold function is so
>> important. Less code, same result.
> 
> I don't see an obfuscation/complication (and also no problem with
> maintainability); I see a very simple method to save an allocation. And
> I actually think that it simplifies the code, because now one doesn't
> have to worry at all any more whether internal has been properly
> allocated or not in av_bsf_free().

I don't deny it simplifies the code in regards to freeing the context in
failure paths, but adding a third struct does not make things clearer at
all. So Mark's suggestion to add AVBSFContext into AVBSFInternal may be
a better approach.

> Do you remember 31aafdac2404c5e01b21e53255db3fb5ed53c7c9
> (https://ffmpeg.org/pipermail/ffmpeg-devel/2019-October/251816.html)
> where you fixed the bug that avformat_free_context() gets called upon
> failure to allocate the AVFormatInternal despite avformat_free_context()
> requiring the internal to be successfully allocated? That issue would
> have never even existed if one allocated the context and its internal in
> one piece.
> 
> - Andreas
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Andreas Rheinhardt Aug. 11, 2020, 12:14 a.m. UTC | #12
James Almer:
> On 8/10/2020 8:12 PM, Andreas Rheinhardt wrote:
>> James Almer:
>>> On 8/10/2020 7:11 PM, Nicolas George wrote:
>>>> James Almer (12020-08-10):
>>>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>>>> pointless micro-optimization.
>>>>
>>>> I do not agree at all.
>>>>
>>>> First, it is not extra complexity, it actually makes the code simpler:
>>>> less mutually dependant allocations that can lead to leaks if they are
>>>> not handled properly, better guarantees, for no more code.
>>>
>>> It adds an extra struct and makes the code harder to read. Might as well
>>> just do
>>>
>>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>>> ctx->internal = &ctx[1];
>>
>> This is not ok due to padding/alignment.
> 
> libdav1d would have broke by now if that was the case. Do you know a
> system where this could fail?
> 

Imagine the context to only contain elements that require a alignment of
4 and the internal structure to require an alignment of eight. Then it
is perfectly possible for &ctx[1] to only have an alignment of four and
not of eight as internal requires it.

Could you elaborate the point about libdav1d?

>>
>>>
>>> if removing one tiny allocation in an incredibly cold function is so
>>> important. Less code, same result.
>>
>> I don't see an obfuscation/complication (and also no problem with
>> maintainability); I see a very simple method to save an allocation. And
>> I actually think that it simplifies the code, because now one doesn't
>> have to worry at all any more whether internal has been properly
>> allocated or not in av_bsf_free().
> 
> I don't deny it simplifies the code in regards to freeing the context in
> failure paths, but adding a third struct does not make things clearer at
> all. So Mark's suggestion to add AVBSFContext into AVBSFInternal may be
> a better approach.
> 
There are two reasons why I like my approach more than Mark's: It allows
to hide the details of the context<->internal relationship to the place
where allocations happen (and if one asserted as Nicolas suggested
(av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
would also need to be visible for the function doing the freeing, but
even then it can still be confined to one file). After all, the internal
of several important other structures are in private headers to be used
by lots of code.
The second point is another micro-optimization: If one puts the
AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
the fields currently existing in AVBSFInternal will increase and the
encoding of the "read address from pointer + offset" is typically
shorter if offset is smaller. This is not a problem here as the
structures here are small, but other structures are bigger. This could
of course be overcome by putting the AVBSFContext at the end of
AVBSFInternal; but somehow this feels unnatural, although I could get
used to it.

>> Do you remember 31aafdac2404c5e01b21e53255db3fb5ed53c7c9
>> (https://ffmpeg.org/pipermail/ffmpeg-devel/2019-October/251816.html)
>> where you fixed the bug that avformat_free_context() gets called upon
>> failure to allocate the AVFormatInternal despite avformat_free_context()
>> requiring the internal to be successfully allocated? That issue would
>> have never even existed if one allocated the context and its internal in
>> one piece.
>>
>> - Andreas
James Almer Aug. 11, 2020, 1:36 a.m. UTC | #13
On 8/10/2020 9:14 PM, Andreas Rheinhardt wrote:
> James Almer:
>> On 8/10/2020 8:12 PM, Andreas Rheinhardt wrote:
>>> James Almer:
>>>> On 8/10/2020 7:11 PM, Nicolas George wrote:
>>>>> James Almer (12020-08-10):
>>>>>> Personally, i don't like it. It's extra complexity to save a single 8 or
>>>>>> 12 byte allocation that happens once during bsf alloc. It's kind of a
>>>>>> pointless micro-optimization.
>>>>>
>>>>> I do not agree at all.
>>>>>
>>>>> First, it is not extra complexity, it actually makes the code simpler:
>>>>> less mutually dependant allocations that can lead to leaks if they are
>>>>> not handled properly, better guarantees, for no more code.
>>>>
>>>> It adds an extra struct and makes the code harder to read. Might as well
>>>> just do
>>>>
>>>> ctx = av_mallocz(sizeof(*ctx) + sizeof(AVBSFInternal));
>>>> ctx->internal = &ctx[1];
>>>
>>> This is not ok due to padding/alignment.
>>
>> libdav1d would have broke by now if that was the case. Do you know a
>> system where this could fail?
>>
> 
> Imagine the context to only contain elements that require a alignment of
> 4 and the internal structure to require an alignment of eight. Then it
> is perfectly possible for &ctx[1] to only have an alignment of four and
> not of eight as internal requires it.

But is that the case here? AVBSFContext is 64 bytes on x86_64 (six 8
byte pointers, four ints, no padding), and AVBSFInternal is 16 (8 byte
pointer properly aligned after AVBSFContext, 4 byte int, 4 byte
padding). On 32 bit systems with 4 byte alignment for pointers, it would
also be fine.

In any case, I'm not suggesting to do this. I'd rather just not make any
change at all.

> 
> Could you elaborate the point about libdav1d?

It uses this same approach to insert a byte array at the end of a
context rather than a struct, so probably not the same situation.

> 
>>>
>>>>
>>>> if removing one tiny allocation in an incredibly cold function is so
>>>> important. Less code, same result.
>>>
>>> I don't see an obfuscation/complication (and also no problem with
>>> maintainability); I see a very simple method to save an allocation. And
>>> I actually think that it simplifies the code, because now one doesn't
>>> have to worry at all any more whether internal has been properly
>>> allocated or not in av_bsf_free().
>>
>> I don't deny it simplifies the code in regards to freeing the context in
>> failure paths, but adding a third struct does not make things clearer at
>> all. So Mark's suggestion to add AVBSFContext into AVBSFInternal may be
>> a better approach.
>>
> There are two reasons why I like my approach more than Mark's: It allows
> to hide the details of the context<->internal relationship to the place
> where allocations happen (and if one asserted as Nicolas suggested
> (av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
> would also need to be visible for the function doing the freeing, but
> even then it can still be confined to one file). After all, the internal
> of several important other structures are in private headers to be used
> by lots of code.
> The second point is another micro-optimization: If one puts the
> AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
> the fields currently existing in AVBSFInternal will increase and the
> encoding of the "read address from pointer + offset" is typically
> shorter if offset is smaller. This is not a problem here as the
> structures here are small, but other structures are bigger. This could
> of course be overcome by putting the AVBSFContext at the end of
> AVBSFInternal; but somehow this feels unnatural, although I could get
> used to it.

Instruction encoding for bigger offsets (>=128 on x86) is not an
acceptable reason to choose one approach or another here. That is beyond
ricing. This is not a hot SIMD loop in a video DSP function where every
byte counts.

> 
>>> Do you remember 31aafdac2404c5e01b21e53255db3fb5ed53c7c9
>>> (https://ffmpeg.org/pipermail/ffmpeg-devel/2019-October/251816.html)
>>> where you fixed the bug that avformat_free_context() gets called upon
>>> failure to allocate the AVFormatInternal despite avformat_free_context()
>>> requiring the internal to be successfully allocated? That issue would
>>> have never even existed if one allocated the context and its internal in
>>> one piece.
>>>
>>> - Andreas
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George Aug. 11, 2020, 9:48 a.m. UTC | #14
Andreas Rheinhardt (12020-08-11):
> Imagine the context to only contain elements that require a alignment of
> 4 and the internal structure to require an alignment of eight. Then it
> is perfectly possible for &ctx[1] to only have an alignment of four and
> not of eight as internal requires it.

With this, you answered to James' question to me "What is undefined in
that?".

> There are two reasons why I like my approach more than Mark's: It allows

I cannot find what you refer to "Mark's" approach.

> to hide the details of the context<->internal relationship to the place
> where allocations happen (and if one asserted as Nicolas suggested
> (av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
> would also need to be visible for the function doing the freeing, but
> even then it can still be confined to one file). After all, the internal
> of several important other structures are in private headers to be used
> by lots of code.

... and therefore, I cannot understand exactly the point you are making
here.

> The second point is another micro-optimization: If one puts the
> AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
> the fields currently existing in AVBSFInternal will increase and the
> encoding of the "read address from pointer + offset" is typically
> shorter if offset is smaller. This is not a problem here as the
> structures here are small, but other structures are bigger. This could
> of course be overcome by putting the AVBSFContext at the end of
> AVBSFInternal; but somehow this feels unnatural, although I could get
> used to it.

In this, you are mistaken.

First, AFAIK, the increase in cost for accessing pointer+offset only
happens at a few specific offsets, probably related to powers of two,
which means the gain, minute as it is, only happens for the few fields
that are pushed over the edge, if there are any.

But second, and most of all, you are neglecting a much more important
optimization. If AVBSFContext is the first field of AVBSFInternal, then
they both have the same address (it is explicitly specified in the spec,
and it has been used for decades by Gtk+ for example), and therefore we
can dispense with the ctx->internal pointer and just use it directly.

We can still have two pointers in the code, one for each type, to make
it more compact and readable. But even if we do, the compiler will know
they are the same.

Note that getting rid of the .internal field reduces the size of the
structure, and is in itself another optimization.

And since we only have one pointer instead of two, it frees a register
for other uses.

So this is an even better solution than your proposal, but it requires
all uses of ->internal to be replaced. I I count correctly, that makes
only four lines to change, and for a trivial change.

Note that all this can be done while keeping the specifics and the
pointer magic hidden in a single file:

    static inline AVBSFInternal *bsf_internal(AVBSFContext *bsf) {
        return (AVBSFInternal *)bsf;
    }

Then the code only needs to write:

    AVBSFInternal *int = bsf_internal(bsf);

Regards,
James Almer Aug. 11, 2020, 1:30 p.m. UTC | #15
On 8/11/2020 6:48 AM, Nicolas George wrote:
> Andreas Rheinhardt (12020-08-11):
>> Imagine the context to only contain elements that require a alignment of
>> 4 and the internal structure to require an alignment of eight. Then it
>> is perfectly possible for &ctx[1] to only have an alignment of four and
>> not of eight as internal requires it.
> 
> With this, you answered to James' question to me "What is undefined in
> that?".
> 
>> There are two reasons why I like my approach more than Mark's: It allows
> 
> I cannot find what you refer to "Mark's" approach.
> 
>> to hide the details of the context<->internal relationship to the place
>> where allocations happen (and if one asserted as Nicolas suggested
>> (av_assert2(ctx->internal == &((AVBSFCombined *)ctx)->internal), it
>> would also need to be visible for the function doing the freeing, but
>> even then it can still be confined to one file). After all, the internal
>> of several important other structures are in private headers to be used
>> by lots of code.
> 
> ... and therefore, I cannot understand exactly the point you are making
> here.
> 
>> The second point is another micro-optimization: If one puts the
>> AVBSFContext at the beginning of AVBSFInternal, then the offsets of all
>> the fields currently existing in AVBSFInternal will increase and the
>> encoding of the "read address from pointer + offset" is typically
>> shorter if offset is smaller. This is not a problem here as the
>> structures here are small, but other structures are bigger. This could
>> of course be overcome by putting the AVBSFContext at the end of
>> AVBSFInternal; but somehow this feels unnatural, although I could get
>> used to it.
> 
> In this, you are mistaken.
> 
> First, AFAIK, the increase in cost for accessing pointer+offset only
> happens at a few specific offsets, probably related to powers of two,
> which means the gain, minute as it is, only happens for the few fields
> that are pushed over the edge, if there are any.
> 
> But second, and most of all, you are neglecting a much more important
> optimization. If AVBSFContext is the first field of AVBSFInternal, then
> they both have the same address (it is explicitly specified in the spec,
> and it has been used for decades by Gtk+ for example), and therefore we
> can dispense with the ctx->internal pointer and just use it directly.

Indeed, we take advantage of this all across the codebase for AVClass,
making AVOptions trivial to implement for any struct.

This approach both removes the need for a third struct and follows
existing practice, so i prefer it.

> 
> We can still have two pointers in the code, one for each type, to make
> it more compact and readable. But even if we do, the compiler will know
> they are the same.
> 
> Note that getting rid of the .internal field reduces the size of the
> structure, and is in itself another optimization.
> 
> And since we only have one pointer instead of two, it frees a register
> for other uses.
> 
> So this is an even better solution than your proposal, but it requires
> all uses of ->internal to be replaced. I I count correctly, that makes
> only four lines to change, and for a trivial change.
> 
> Note that all this can be done while keeping the specifics and the
> pointer magic hidden in a single file:
> 
>     static inline AVBSFInternal *bsf_internal(AVBSFContext *bsf) {
>         return (AVBSFInternal *)bsf;
>     }
> 
> Then the code only needs to write:
> 
>     AVBSFInternal *int = bsf_internal(bsf);
> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
diff mbox series

Patch

diff --git a/libavcodec/bsf.c b/libavcodec/bsf.c
index d71bc32584..9781be78e7 100644
--- a/libavcodec/bsf.c
+++ b/libavcodec/bsf.c
@@ -37,6 +37,11 @@  struct AVBSFInternal {
     int eof;
 };
 
+typedef struct AVBSFCombined {
+    AVBSFContext bsf;
+    AVBSFInternal internal;
+} AVBSFCombined;
+
 void av_bsf_free(AVBSFContext **pctx)
 {
     AVBSFContext *ctx;
@@ -50,9 +55,8 @@  void av_bsf_free(AVBSFContext **pctx)
     if (ctx->filter->priv_class && ctx->priv_data)
         av_opt_free(ctx->priv_data);
 
-    if (ctx->internal)
-        av_packet_free(&ctx->internal->buffer_pkt);
-    av_freep(&ctx->internal);
+    av_assert2(ctx->internal);
+    av_packet_free(&ctx->internal->buffer_pkt);
     av_freep(&ctx->priv_data);
 
     avcodec_parameters_free(&ctx->par_in);
@@ -93,14 +97,18 @@  const AVClass *av_bsf_get_class(void)
 
 int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
 {
+    AVBSFCombined *com;
     AVBSFContext *ctx;
     AVBSFInternal *bsfi;
     int ret;
 
-    ctx = av_mallocz(sizeof(*ctx));
-    if (!ctx)
+    com = av_mallocz(sizeof(*com));
+    if (!com)
         return AVERROR(ENOMEM);
 
+    ctx  = &com->bsf;
+    bsfi = ctx->internal = &com->internal;
+
     ctx->av_class = &bsf_class;
     ctx->filter   = filter;
 
@@ -111,13 +119,6 @@  int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
         goto fail;
     }
 
-    bsfi = av_mallocz(sizeof(*bsfi));
-    if (!bsfi) {
-        ret = AVERROR(ENOMEM);
-        goto fail;
-    }
-    ctx->internal = bsfi;
-
     bsfi->buffer_pkt = av_packet_alloc();
     if (!bsfi->buffer_pkt) {
         ret = AVERROR(ENOMEM);