diff mbox series

[FFmpeg-devel] avformat/utils: add helper functions to retrieve index entries from an AVStream

Message ID 20210323183622.2219-1-jamrial@gmail.com
State Accepted
Headers show
Series [FFmpeg-devel] avformat/utils: add helper functions to retrieve index entries from an AVStream | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

James Almer March 23, 2021, 6:36 p.m. UTC
Signed-off-by: James Almer <jamrial@gmail.com>
---
Following "lavf: move AVStream.*index_entries* to AVStreamInternal", it was
revealed that some library users apparently found these fields useful and were
accessing them despite not being public.
This patch introduces a few functions to recover this functionality in a proper
and supported way.

We can't simply return a const pointer to the corresponding AVIndexEntry in the
array because it may change at any time by for example calling
av_add_index_entry(), so it's either a copy of the AVIndexEntry, or adding half
a dozen output parameters to the function signature for each field in
AVIndexEntry.

 libavformat/avformat.h | 39 +++++++++++++++++++++++++++++++++++++++
 libavformat/utils.c    | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

Comments

Nicolas George March 23, 2021, 6:47 p.m. UTC | #1
James Almer (12021-03-23):
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> Following "lavf: move AVStream.*index_entries* to AVStreamInternal", it was
> revealed that some library users apparently found these fields useful and were
> accessing them despite not being public.
> This patch introduces a few functions to recover this functionality in a proper
> and supported way.
> 

> We can't simply return a const pointer to the corresponding AVIndexEntry in the
> array because it may change at any time by for example calling
> av_add_index_entry(), so it's either a copy of the AVIndexEntry, or adding half
> a dozen output parameters to the function signature for each field in
> AVIndexEntry.

We can return a const pointer to the AVIndexEntry if we document it:

	Warning: the returned pointer is only valid until the next
	operation that may modify the AVStream or the AVFormatContext it
	belongs to. Accessing it after av_read_frame() or other
	functions results in undefined behavior.

Users can still copy it immediately if they need it longer.

> 
>  libavformat/avformat.h | 39 +++++++++++++++++++++++++++++++++++++++
>  libavformat/utils.c    | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
> 
> diff --git a/libavformat/avformat.h b/libavformat/avformat.h
> index 765bc3b6f5..ac8b57149e 100644
> --- a/libavformat/avformat.h
> +++ b/libavformat/avformat.h
> @@ -2754,6 +2754,45 @@ int av_find_default_stream_index(AVFormatContext *s);
>   */
>  int av_index_search_timestamp(AVStream *st, int64_t timestamp, int flags);
>  
> +/**
> + * Get the index entry count for the given AVStream.
> + *
> + * @param st stream
> + * @return the number of index entries in the stream
> + */
> +int av_index_get_entries_count(AVStream *st);
> +
> +/**
> + * Get the AVIndexEntry corresponding to the given index.
> + *
> + * @param st          stream containing the requested AVIndexEntry
> + * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
> + *                    or if the index entry could not be found, this struct will remain
> + *                    untouched
> + * @param idx         the desired index
> + * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested index could
> +           be found. other negative AVERROR codes on failure.
> + */
> +int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx);
> +
> +/**
> + * Get the AVIndexEntry corresponding to the given timestamp.
> + *
> + * @param st          stream containing the requested AVIndexEntry
> + * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
> + *                    or if the index entry could not be found, this struct will remain
> + *                    untouched
> + * @param timestamp   timestamp to retrieve the index entry for
> + * @param flags       if AVSEEK_FLAG_BACKWARD then the returned entry will correspond
> + *                    to the timestamp which is <= the requested one, if backward
> + *                    is 0, then it will be >=
> + *                    if AVSEEK_FLAG_ANY seek to any frame, only keyframes otherwise
> + * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested timestamp could
> + *          be found. other negative AVERROR codes on failure.
> + */
> +int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
> +                                      int64_t wanted_timestamp, int flags);
> +
>  /**
>   * Add an index entry into a sorted list. Update the entry if the list
>   * already contains it.
> diff --git a/libavformat/utils.c b/libavformat/utils.c
> index f31826f2ea..4e7a8bf23e 100644
> --- a/libavformat/utils.c
> +++ b/libavformat/utils.c
> @@ -2129,6 +2129,38 @@ int av_index_search_timestamp(AVStream *st, int64_t wanted_timestamp, int flags)
>                                       wanted_timestamp, flags);
>  }
>  
> +int av_index_get_entries_count(AVStream *st)
> +{
> +    return st->internal->nb_index_entries;
> +}
> +
> +int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx)
> +{

> +    if (idx < 0)
> +        return AVERROR(EINVAL);

Make idx unsigned.

> +    if (idx >= st->internal->nb_index_entries)
> +        return AVERROR(ENOENT);
> +

> +    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));

Why not using an assignment?

But this makes sizeof(AVIndexEntry) part of the ABI. Probably not what
we want.

> +
> +    return 0;
> +}
> +
> +int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
> +                                      int64_t wanted_timestamp, int flags)
> +{
> +    int idx = ff_index_search_timestamp(st->internal->index_entries,
> +                                        st->internal->nb_index_entries,
> +                                        wanted_timestamp, flags);
> +
> +    if (idx < 0)
> +        return AVERROR(ENOENT);
> +

> +    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));

Same, of course.

> +
> +    return 0;
> +}
> +
>  static int64_t ff_read_timestamp(AVFormatContext *s, int stream_index, int64_t *ppos, int64_t pos_limit,
>                                   int64_t (*read_timestamp)(struct AVFormatContext *, int , int64_t *, int64_t ))
>  {

Regards,
James Almer March 23, 2021, 7:02 p.m. UTC | #2
On 3/23/2021 3:47 PM, Nicolas George wrote:
> James Almer (12021-03-23):
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> Following "lavf: move AVStream.*index_entries* to AVStreamInternal", it was
>> revealed that some library users apparently found these fields useful and were
>> accessing them despite not being public.
>> This patch introduces a few functions to recover this functionality in a proper
>> and supported way.
>>
> 
>> We can't simply return a const pointer to the corresponding AVIndexEntry in the
>> array because it may change at any time by for example calling
>> av_add_index_entry(), so it's either a copy of the AVIndexEntry, or adding half
>> a dozen output parameters to the function signature for each field in
>> AVIndexEntry.
> 
> We can return a const pointer to the AVIndexEntry if we document it:
> 
> 	Warning: the returned pointer is only valid until the next
> 	operation that may modify the AVStream or the AVFormatContext it
> 	belongs to. Accessing it after av_read_frame() or other
> 	functions results in undefined behavior.
> 
> Users can still copy it immediately if they need it longer.

I really don't like the idea of returning a pointer to some offset 
within an internal struct that may either start pointing at some other 
entry or even to freed memory, especially when the alternative of giving 
the user a copy of a single entry is trivial to do and actually safe.

> 
>>
>>   libavformat/avformat.h | 39 +++++++++++++++++++++++++++++++++++++++
>>   libavformat/utils.c    | 32 ++++++++++++++++++++++++++++++++
>>   2 files changed, 71 insertions(+)
>>
>> diff --git a/libavformat/avformat.h b/libavformat/avformat.h
>> index 765bc3b6f5..ac8b57149e 100644
>> --- a/libavformat/avformat.h
>> +++ b/libavformat/avformat.h
>> @@ -2754,6 +2754,45 @@ int av_find_default_stream_index(AVFormatContext *s);
>>    */
>>   int av_index_search_timestamp(AVStream *st, int64_t timestamp, int flags);
>>   
>> +/**
>> + * Get the index entry count for the given AVStream.
>> + *
>> + * @param st stream
>> + * @return the number of index entries in the stream
>> + */
>> +int av_index_get_entries_count(AVStream *st);
>> +
>> +/**
>> + * Get the AVIndexEntry corresponding to the given index.
>> + *
>> + * @param st          stream containing the requested AVIndexEntry
>> + * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
>> + *                    or if the index entry could not be found, this struct will remain
>> + *                    untouched
>> + * @param idx         the desired index
>> + * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested index could
>> +           be found. other negative AVERROR codes on failure.
>> + */
>> +int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx);
>> +
>> +/**
>> + * Get the AVIndexEntry corresponding to the given timestamp.
>> + *
>> + * @param st          stream containing the requested AVIndexEntry
>> + * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
>> + *                    or if the index entry could not be found, this struct will remain
>> + *                    untouched
>> + * @param timestamp   timestamp to retrieve the index entry for
>> + * @param flags       if AVSEEK_FLAG_BACKWARD then the returned entry will correspond
>> + *                    to the timestamp which is <= the requested one, if backward
>> + *                    is 0, then it will be >=
>> + *                    if AVSEEK_FLAG_ANY seek to any frame, only keyframes otherwise
>> + * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested timestamp could
>> + *          be found. other negative AVERROR codes on failure.
>> + */
>> +int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
>> +                                      int64_t wanted_timestamp, int flags);
>> +
>>   /**
>>    * Add an index entry into a sorted list. Update the entry if the list
>>    * already contains it.
>> diff --git a/libavformat/utils.c b/libavformat/utils.c
>> index f31826f2ea..4e7a8bf23e 100644
>> --- a/libavformat/utils.c
>> +++ b/libavformat/utils.c
>> @@ -2129,6 +2129,38 @@ int av_index_search_timestamp(AVStream *st, int64_t wanted_timestamp, int flags)
>>                                        wanted_timestamp, flags);
>>   }
>>   
>> +int av_index_get_entries_count(AVStream *st)
>> +{
>> +    return st->internal->nb_index_entries;
>> +}
>> +
>> +int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx)
>> +{
> 
>> +    if (idx < 0)
>> +        return AVERROR(EINVAL);
> 
> Make idx unsigned.

Ok.

> 
>> +    if (idx >= st->internal->nb_index_entries)
>> +        return AVERROR(ENOENT);
>> +
> 
>> +    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));
> 
> Why not using an assignment?
> 
> But this makes sizeof(AVIndexEntry) part of the ABI. Probably not what
> we want.

The struct itself says nothing about it not being part of the ABI, and 
we usually document when that's the case to let the user know.
That being said, no public function until now used that struct as 
parameter, so to not effectively tie it to the ABI if that's undesired, 
i could instead make index_entry a pointer to pointer, do an 
av_memdup(), and document that the user owns the returned entry and must 
free it after using it.

> 
>> +
>> +    return 0;
>> +}
>> +
>> +int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
>> +                                      int64_t wanted_timestamp, int flags)
>> +{
>> +    int idx = ff_index_search_timestamp(st->internal->index_entries,
>> +                                        st->internal->nb_index_entries,
>> +                                        wanted_timestamp, flags);
>> +
>> +    if (idx < 0)
>> +        return AVERROR(ENOENT);
>> +
> 
>> +    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));
> 
> Same, of course.
> 
>> +
>> +    return 0;
>> +}
>> +
>>   static int64_t ff_read_timestamp(AVFormatContext *s, int stream_index, int64_t *ppos, int64_t pos_limit,
>>                                    int64_t (*read_timestamp)(struct AVFormatContext *, int , int64_t *, int64_t ))
>>   {
> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George March 23, 2021, 7:07 p.m. UTC | #3
James Almer (12021-03-23):
> I really don't like the idea of returning a pointer to some offset within an
> internal struct that may either start pointing at some other entry or even
> to freed memory, especially when the alternative of giving the user a copy
> of a single entry is trivial to do and actually safe.

It is already what the users who accessed the field directly did.

And I am pretty sure I can find examples where we do similar things in
the code base.

And it is exactly what we are doing when we let users access fields
directly.

And the problem of sizeof(AVIndexEntry) part of the ABI just disappears
if we do that.

May I suggest that you recalibrate your dislike?

Regards,
Nicolas George March 23, 2021, 7:11 p.m. UTC | #4
Nicolas George (12021-03-23):
> And it is exactly what we are doing when we let users access fields
> directly.

I mean:

	AVStream **streams = ctx->streams;
	av_read_frame(ctx, &packet);
	AVStream *stream = streams[packet.stream_index];

That should work, right?

Regards,
James Almer March 23, 2021, 7:22 p.m. UTC | #5
On 3/23/2021 4:07 PM, Nicolas George wrote:
> James Almer (12021-03-23):
>> I really don't like the idea of returning a pointer to some offset within an
>> internal struct that may either start pointing at some other entry or even
>> to freed memory, especially when the alternative of giving the user a copy
>> of a single entry is trivial to do and actually safe.
> 
> It is already what the users who accessed the field directly did.

Users accessed st->index_entries, or fields like 
st->index_entries[idx].timestamp. If the array was reallocated by lavf, 
st->index_entries is made to point to the new array, and accessing it or 
fields at offsets within it still worked as expected and gave you the 
information related to the index you requested.

If a function returns a pointer to some offset within an internal array, 
and said array changes or disappears, the pointer returned is now 
pointing to who knows what, and that's unsafe. A copy is trivial and safe.

> 
> And I am pretty sure I can find examples where we do similar things in
> the code base.

I recall many people have said before that just because it was done 
before doesn't mean it should be done again. That's how bad practices 
spread.

> 
> And it is exactly what we are doing when we let users access fields
> directly.

See above. st->index_entries will always point to the current valid 
array, even if it was reallocated.

> 
> And the problem of sizeof(AVIndexEntry) part of the ABI just disappears
> if we do that.

It also disappears if we return a AVIndexEntry the user must free.

> 
> May I suggest that you recalibrate your dislike?

No, I'll not stop disliking this practice, for the reasons i stated 
above. And you'll not find me asking you to recalibrate you like, only 
explain why i think this new API should not use that approach.

> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
James Almer March 23, 2021, 7:22 p.m. UTC | #6
On 3/23/2021 4:11 PM, Nicolas George wrote:
> Nicolas George (12021-03-23):
>> And it is exactly what we are doing when we let users access fields
>> directly.
> 
> I mean:
> 
> 	AVStream **streams = ctx->streams;
> 	av_read_frame(ctx, &packet);
> 	AVStream *stream = streams[packet.stream_index];
> 
> That should work, right?

No, avformat_new_stream() will reallocate that array, so if 
av_read_frame() can allocate new streams (I think AVFMT_NOHEADER formats 
do that) then that may just crash.

You should always use ctx->streams directly.

> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George March 23, 2021, 7:40 p.m. UTC | #7
James Almer (12021-03-23):
> I recall many people have said before that just because it was done before
> doesn't mean it should be done again. That's how bad practices spread.

I will happily concede you this. All the more happily that I will keep
it warm to serve it back when you oppose to one of my creative API
inventions for the sake of consistency ;-Þ

> > 	AVStream **streams = ctx->streams;
> > 	av_read_frame(ctx, &packet);
> > 	AVStream *stream = streams[packet.stream_index];
> No, avformat_new_stream() will reallocate that array, so if av_read_frame()
> can allocate new streams (I think AVFMT_NOHEADER formats do that) then that
> may just crash.

This is exactly why I chose this example.

> You should always use ctx->streams directly.

I am using ctx->streams directly. More accurately, we should always use
ctx->streams *immediately*.

And that's exactly the same with giving access to the internal
structure: they use the fields *immediately*, and everything is fine.

What you are defending is equivalent to defending
avformat_get_stream(ctx, idx) just to prevent users from accessing
ctx->streams directly because they may keep a pointer to an outdated
array. Note that in my proposal, the constraint is clearly documented.
This is not the case, currently, for ctx->streams.

Furthermore, returning a pointer avoids copying the fields that the user
will not use. It is minuscule, of course. But it is still one more
argument for returning the pointer.

We have to choose between having an API proof against users who do not
read the doc and try to guess what they can get away with and having an
API that is efficient and convenient for careful users, we cannot have
both.

I vote we choose the second, because the first is not really possible.

Regards,
James Almer March 23, 2021, 8:04 p.m. UTC | #8
On 3/23/2021 4:40 PM, Nicolas George wrote:
> James Almer (12021-03-23):
>> I recall many people have said before that just because it was done before
>> doesn't mean it should be done again. That's how bad practices spread.
> 
> I will happily concede you this. All the more happily that I will keep
> it warm to serve it back when you oppose to one of my creative API
> inventions for the sake of consistency ;-Þ

If your creative API invention is better, then i have no problem with it 
even if it goes against existing conventions.

Which for that matter reminds me that i changed my mind regarding your 
refcounted proposal, since the alternative of adding an AVRefCount 
struct other structs can then use has proven to be kinda ugly/unwieldy, 
at least in the example implementation i saw. So there you have it.

> 
>>> 	AVStream **streams = ctx->streams;
>>> 	av_read_frame(ctx, &packet);
>>> 	AVStream *stream = streams[packet.stream_index];
>> No, avformat_new_stream() will reallocate that array, so if av_read_frame()
>> can allocate new streams (I think AVFMT_NOHEADER formats do that) then that
>> may just crash.
> 
> This is exactly why I chose this example.

Why did you choose an example that can crash? To show why the warning in 
the documentation would be needed? Because i'm not against documenting 
details about this approach whenever used.

> 
>> You should always use ctx->streams directly.
> 
> I am using ctx->streams directly. More accurately, we should always use
> ctx->streams *immediately*.

You did not use it directly since you accessed the copy pointer you made 
two lines above instead, and you did not use that copy immediately since 
you first called av_read_frame(), which may have invalidated it.

> 
> And that's exactly the same with giving access to the internal
> structure: they use the fields *immediately*, and everything is fine.
> 
> What you are defending is equivalent to defending
> avformat_get_stream(ctx, idx) just to prevent users from accessing
> ctx->streams directly because they may keep a pointer to an outdated
> array. Note that in my proposal, the constraint is clearly documented.
> This is not the case, currently, for ctx->streams.
> 
> Furthermore, returning a pointer avoids copying the fields that the user
> will not use. It is minuscule, of course. But it is still one more
> argument for returning the pointer.
> 
> We have to choose between having an API proof against users who do not
> read the doc and try to guess what they can get away with and having an
> API that is efficient and convenient for careful users, we cannot have
> both.
> 
> I vote we choose the second, because the first is not really possible.

By returning a pointer the user has to free, they will get leaks if they 
don't read the doc. So I'm not making this function lazy-user proof, I'm 
trying to not give them a pointer to an offset within a volatile 
internal array they have no business accessing.

> 
> Regards,
>
Nicolas George March 24, 2021, 3:15 p.m. UTC | #9
James Almer (12021-03-23):
> If your creative API invention is better, then i have no problem with it
> even if it goes against existing conventions.
> 
> Which for that matter reminds me that i changed my mind regarding your
> refcounted proposal, since the alternative of adding an AVRefCount struct
> other structs can then use has proven to be kinda ugly/unwieldy, at least in
> the example implementation i saw. So there you have it.

Thank you very much for letting me know. I appreciate a lot.

I hope I will manage to convince you that we need a convenient and
efficient string API too, even if that means quite a bit of code to
implement it.

> Why did you choose an example that can crash? To show why the warning in the
> documentation would be needed? Because i'm not against documenting details
> about this approach whenever used.

My point is that we accepts in our API constructs that are somewhat
tricky to use, with constraints that are not checked by the compiler and
that the user is responsible for meeting, and that can cause crashes,
for the sake of efficiency and simplicity.

It is a good thing, I would not have it any other way.

> > I am using ctx->streams directly. More accurately, we should always use
> > ctx->streams *immediately*.
                 ^^^^^^^^^^^^^
> 
> You did not use it directly since you accessed the copy pointer you made two
> lines above instead, and you did not use that copy immediately since you
                                                     ^^^^^^^^^^^
> first called av_read_frame(), which may have invalidated it.

As I said, the important word is "immediately", not "directly". But no
matter.

> By returning a pointer the user has to free, they will get leaks if they
> don't read the doc.

Sorry, I had not understood that you were really considering returning a
dynamically allocated structure, I thought you mentioned the solution as
a bad idea.

Remember, we should always have this guiding principle in mind: Do not
use dynamic allocations if we can make it work without.

And in this case, this is not theoretical. A file with frequent
keyframes can have thousands of entries, and some applications will want
to walk the entire array, which is currently very fast. Adding a
function call in the middle will cause some overhead, but not that much.
Adding a dynamic allocation, on the other hand, would make it
sluggishly slow.

> So I'm not making this function lazy-user proof, I'm
> trying to not give them a pointer to an offset within a volatile internal
> array they have no business accessing.

But WHY? What is so bad with this:

	while ((entry = av_index_get_entry(st, idx++)))
	    do_some_math(entry->timestamp, entry->pos);
	do_some_more_math();

? No dynamic allocation, no sizeof(AVIndexEntry) creeping into the ABI,
almost as fast as direct access, almost as simple.

This is by far the simplest and most efficient solution for both you and
our users.

So why reject it?

Regards,
James Almer March 24, 2021, 7:25 p.m. UTC | #10
On 3/24/2021 12:15 PM, Nicolas George wrote:
> James Almer (12021-03-23):
>> If your creative API invention is better, then i have no problem with it
>> even if it goes against existing conventions.
>>
>> Which for that matter reminds me that i changed my mind regarding your
>> refcounted proposal, since the alternative of adding an AVRefCount struct
>> other structs can then use has proven to be kinda ugly/unwieldy, at least in
>> the example implementation i saw. So there you have it.
> 
> Thank you very much for letting me know. I appreciate a lot.
> 
> I hope I will manage to convince you that we need a convenient and
> efficient string API too, even if that means quite a bit of code to
> implement it.

I have no opinion on a strings API.

> 
>> Why did you choose an example that can crash? To show why the warning in the
>> documentation would be needed? Because i'm not against documenting details
>> about this approach whenever used.
> 
> My point is that we accepts in our API constructs that are somewhat
> tricky to use, with constraints that are not checked by the compiler and
> that the user is responsible for meeting, and that can cause crashes,
> for the sake of efficiency and simplicity.
> 
> It is a good thing, I would not have it any other way.
> 
>>> I am using ctx->streams directly. More accurately, we should always use
>>> ctx->streams *immediately*.
>                   ^^^^^^^^^^^^^
>>
>> You did not use it directly since you accessed the copy pointer you made two
>> lines above instead, and you did not use that copy immediately since you
>                                                       ^^^^^^^^^^^
>> first called av_read_frame(), which may have invalidated it.
> 
> As I said, the important word is "immediately", not "directly". But no
> matter.
> 
>> By returning a pointer the user has to free, they will get leaks if they
>> don't read the doc.
> 
> Sorry, I had not understood that you were really considering returning a
> dynamically allocated structure, I thought you mentioned the solution as
> a bad idea.

I don't recall saying that, or at least not in this thread. It's 
definitely not my preferred solution for the reasons you state below, 
which is why i didn't do it at first. But in my first reply i suggested 
it as an alternative to ensure sizeof(AVIndexEntry) would not be 
effectively tied to the ABI, and then sent a v2 of this patch 
implementing it.

> 
> Remember, we should always have this guiding principle in mind: Do not
> use dynamic allocations if we can make it work without.
> 
> And in this case, this is not theoretical. A file with frequent
> keyframes can have thousands of entries, and some applications will want
> to walk the entire array, which is currently very fast. Adding a
> function call in the middle will cause some overhead, but not that much.
> Adding a dynamic allocation, on the other hand, would make it
> sluggishly slow.
> 
>> So I'm not making this function lazy-user proof, I'm
>> trying to not give them a pointer to an offset within a volatile internal
>> array they have no business accessing.
> 
> But WHY? What is so bad with this:
> 
> 	while ((entry = av_index_get_entry(st, idx++)))
> 	    do_some_math(entry->timestamp, entry->pos);
> 	do_some_more_math();
> 
> ? No dynamic allocation, no sizeof(AVIndexEntry) creeping into the ABI,
> almost as fast as direct access, almost as simple.
> 
> This is by far the simplest and most efficient solution for both you and
> our users.
> 
> So why reject it?

Because it's not even a pointer that's guaranteed to be valid or point 
to valid data until the next call to one specific function or set of 
functions (Your example is basically av_dict_get(), where only calls to 
av_dict_set*() on a AVDictionary you own will make previously returned 
AVDictionaryEntry invalid), and instead it's a pointer that may or may 
not be valid after the next call to potentially *any* lavf function, 
because it could be modified by reading the header, reading a packet, 
and maybe more cases, all depending on demuxer, and after which it could 
still be pointing to the desired entry, or to some other entry, or to 
freed memory.

In any case, would a

int av_get_index_entry(AVStream *st, int idx, int64_t *pos,
                        int64_t *timestamp, int *size,
                        int *distance, int *flags);

implementation, which is basically the inverse of av_add_index_entry(), 
be ok instead? No AVIndexEntry usage, only the fields the caller cares 
about can be returned, and no "This pointer will autodestruct in five 
seconds" situation.

Following your example above, it would look like

while (!av_get_index_entry(st, idx++, &pos, &timestamp,
                            NULL, NULL, NULL))
     do_some_math(timestamp, pos);
do_some_more_math();
Nicolas George March 24, 2021, 7:48 p.m. UTC | #11
James Almer (12021-03-24):
> Because it's not even a pointer that's guaranteed to be valid or point to
> valid data until the next call to one specific function or set of functions
> (Your example is basically av_dict_get(), where only calls to av_dict_set*()
> on a AVDictionary you own will make previously returned AVDictionaryEntry
> invalid), and instead it's a pointer that may or may not be valid after the
> next call to potentially *any* lavf function, because it could be modified
> by reading the header, reading a packet, and maybe more cases, all depending
> on demuxer, and after which it could still be pointing to the desired entry,
> or to some other entry, or to freed memory.

What you say is true, but it is absolutely not a problem if the user
access the pointer immediately, before calling any other function, just
like it is supposed to do when derefencing ctx->streams.

> In any case, would a
> 
> int av_get_index_entry(AVStream *st, int idx, int64_t *pos,
>                        int64_t *timestamp, int *size,
>                        int *distance, int *flags);
> 
> implementation, which is basically the inverse of av_add_index_entry(), be
> ok instead? No AVIndexEntry usage, only the fields the caller cares about
> can be returned, and no "This pointer will autodestruct in five seconds"
> situation.

I will not block it as I would block the version with dynamic
allocation, but you will need a little more work to convince me that
this is better than just returning the pointer. You make more work for
yourself and more work for the user, you have to have a good reason for
it.

Regards,
James Almer March 24, 2021, 9:18 p.m. UTC | #12
On 3/24/2021 4:48 PM, Nicolas George wrote:
> James Almer (12021-03-24):
>> Because it's not even a pointer that's guaranteed to be valid or point to
>> valid data until the next call to one specific function or set of functions
>> (Your example is basically av_dict_get(), where only calls to av_dict_set*()
>> on a AVDictionary you own will make previously returned AVDictionaryEntry
>> invalid), and instead it's a pointer that may or may not be valid after the
>> next call to potentially *any* lavf function, because it could be modified
>> by reading the header, reading a packet, and maybe more cases, all depending
>> on demuxer, and after which it could still be pointing to the desired entry,
>> or to some other entry, or to freed memory.
> 
> What you say is true, but it is absolutely not a problem if the user
> access the pointer immediately, before calling any other function, just
> like it is supposed to do when derefencing ctx->streams.
> 
>> In any case, would a
>>
>> int av_get_index_entry(AVStream *st, int idx, int64_t *pos,
>>                         int64_t *timestamp, int *size,
>>                         int *distance, int *flags);
>>
>> implementation, which is basically the inverse of av_add_index_entry(), be
>> ok instead? No AVIndexEntry usage, only the fields the caller cares about
>> can be returned, and no "This pointer will autodestruct in five seconds"
>> situation.
> 
> I will not block it as I would block the version with dynamic
> allocation, but you will need a little more work to convince me that
> this is better than just returning the pointer. You make more work for
> yourself and more work for the user, you have to have a good reason for
> it.

I think it's clear by now that nothing i could say will convince you 
it's better to not return a pointer to an internal array when there are 
safer alternatives, and i already gave my reasons why, none of which 
satisfied you, so i don't see the point in keeping this discussion going.

If some other developer wants to chime in and comment which approach 
they prefer, then that would be ideal. I have no hurry to push this, and 
I'd rather know which direction to go before i write a v3, so it can wait.

> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
Nicolas George March 25, 2021, 11:55 a.m. UTC | #13
James Almer (12021-03-24):
> I think it's clear by now that nothing i could say will convince you it's
> better to not return a pointer to an internal array when there are safer
> alternatives, and i already gave my reasons why, none of which satisfied
> you, so i don't see the point in keeping this discussion going.

I find this comment quite offensive. You did not manage to convince me
because your arguments have not been up to the task. Do not try to push
the fault on me, and I will refrain from accusing you of not taking my
arguments into account. Coming to an agreement is a process, it requires
both parts to refine their arguments progressively.

This is a matter of choosing the least of several drawbacks. So let us
compare the drawbacks and not muddle things further.

For me:

1. having a dynamic allocation is way way worse than
2. having sizeof(AVIndexEntry) in the ABI, which is somewhat worse than
3. having a function with many arguments, which is a tiny bit worse than
4. having a "use this pointer immediately" constraint.

We agree except on 3>4, so let us focus on that.

Option (3) has these practical drawbacks: Many arguments involves more
typing and the risk of messing with the order and getting invalid
values. It also requires redesigning the API if we add fields and
exporting them is useful. And it requires either the overhead of NULL
checks or the caller declaring unneeded variables.

Option (4) has the obvious practical drawback that misusing the API
causes undefined behavior.

The constraint of using a pointer immediately on risk of undefined
behavior is actually a frequent one, in FFmpeg but also in C at large:
gethosbtyname, localtime, etc.

For me, that makes it approximately on par with the risk of messing the
order of the many arguments.

Which leaves more typing, NULL checks overhead or useless variables
(still more typing).

It is tiny, I have no trouble admitting, but it is tiny in favor of one
solution.

If you do not agree with these estimates, please explain exactly where.

> If some other developer wants to chime in and comment which approach they
> prefer, then that would be ideal.

Indeed.

Regards,
Marvin Scholz March 25, 2021, 1:21 p.m. UTC | #14
On 25 Mar 2021, at 12:55, Nicolas George wrote:

> James Almer (12021-03-24):
>> I think it's clear by now that nothing i could say will convince you it's
>> better to not return a pointer to an internal array when there are safer
>> alternatives, and i already gave my reasons why, none of which satisfied
>> you, so i don't see the point in keeping this discussion going.
>
> I find this comment quite offensive. You did not manage to convince me
> because your arguments have not been up to the task. Do not try to push
> the fault on me, and I will refrain from accusing you of not taking my
> arguments into account. Coming to an agreement is a process, it requires
> both parts to refine their arguments progressively.
>
> This is a matter of choosing the least of several drawbacks. So let us
> compare the drawbacks and not muddle things further.
>
> For me:
>
> 1. having a dynamic allocation is way way worse than
> 2. having sizeof(AVIndexEntry) in the ABI, which is somewhat worse than
> 3. having a function with many arguments, which is a tiny bit worse than
> 4. having a "use this pointer immediately" constraint.
>
> We agree except on 3>4, so let us focus on that.
>
> Option (3) has these practical drawbacks: Many arguments involves more
> typing and the risk of messing with the order and getting invalid
> values. It also requires redesigning the API if we add fields and
> exporting them is useful. And it requires either the overhead of NULL
> checks or the caller declaring unneeded variables.
>
> Option (4) has the obvious practical drawback that misusing the API
> causes undefined behavior.
>
> The constraint of using a pointer immediately on risk of undefined
> behavior is actually a frequent one, in FFmpeg but also in C at large:
> gethosbtyname, localtime, etc.
>
> For me, that makes it approximately on par with the risk of messing the
> order of the many arguments.
>
> Which leaves more typing, NULL checks overhead or useless variables
> (still more typing).
>
> It is tiny, I have no trouble admitting, but it is tiny in favor of one
> solution.
>
> If you do not agree with these estimates, please explain exactly where.

I disagree with your ordering too,
for me 4. is clearly worse than 3. here, as the harm that can be
done by mixing up arguments (in this case) is less than use of a
possibly invalid pointer. And mixed up arguments would probably be
noticed easier when testing than reads of possibly invalid memory.

Even when documenting the constraint of immediately using the pointer,
it could easily be overlooked/forgotten.
It does not even has to be me forgetting the constraint, but could be
someone else changing the code in the future, adding some API call
before the pointer is used, unaware of the possible consequences and
that could invalidate the pointer (if I understand James explanation
correctly). This could go unnoticed easily.

So IMO having a function with many arguments seems like a better and
safer tradeoff in this case to me…

>
>> If some other developer wants to chime in and comment which approach they
>> prefer, then that would be ideal.
>
> Indeed.
>
> Regards,
>
> -- 
>   Nicolas George
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
James Almer March 25, 2021, 1:37 p.m. UTC | #15
On 3/25/2021 8:55 AM, Nicolas George wrote:
> James Almer (12021-03-24):
>> I think it's clear by now that nothing i could say will convince you it's
>> better to not return a pointer to an internal array when there are safer
>> alternatives, and i already gave my reasons why, none of which satisfied
>> you, so i don't see the point in keeping this discussion going.
> 
> I find this comment quite offensive. You did not manage to convince me
> because your arguments have not been up to the task. Do not try to push
> the fault on me, and I will refrain from accusing you of not taking my
> arguments into account. Coming to an agreement is a process, it requires
> both parts to refine their arguments progressively.

It was not meant to be offensive. As i said above i have no other 
argument against your approach to this function than what i already 
said, hence why anything i might add will be more or less repeating myself.

> 
> This is a matter of choosing the least of several drawbacks. So let us
> compare the drawbacks and not muddle things further.
> 
> For me:
> 
> 1. having a dynamic allocation is way way worse than
> 2. having sizeof(AVIndexEntry) in the ABI, which is somewhat worse than
> 3. having a function with many arguments, which is a tiny bit worse than
> 4. having a "use this pointer immediately" constraint.
> 
> We agree except on 3>4, so let us focus on that.
> 
> Option (3) has these practical drawbacks: Many arguments involves more
> typing and the risk of messing with the order and getting invalid
> values. It also requires redesigning the API if we add fields and
> exporting them is useful. And it requires either the overhead of NULL
> checks or the caller declaring unneeded variables.

Same situation as av_add_index_entry(). And if you look at the signature 
of the (3) function in my last proposal, i kept the av_index_* namespace 
free precisely in case a new API in the future needs to be added for 
this reason, for both add() and get(). One that could work directly with 
the AVIndexEntry struct after the internal entries array was redesigned, 
perhaps using AVTreeNode, so returned pointers or entries are safer to 
handle.

> 
> Option (4) has the obvious practical drawback that misusing the API
> causes undefined behavior.
> 
> The constraint of using a pointer immediately on risk of undefined
> behavior is actually a frequent one, in FFmpeg but also in C at large:
> gethosbtyname, localtime, etc.
> 
> For me, that makes it approximately on par with the risk of messing the
> order of the many arguments.
> 
> Which leaves more typing, NULL checks overhead or useless variables
> (still more typing).
> 
> It is tiny, I have no trouble admitting, but it is tiny in favor of one
> solution.
> 
> If you do not agree with these estimates, please explain exactly where.

I don't know if you remember how in this one imgutils patch i sent some 
time ago i was against functions with tons of output parameters, because 
i considered it ugly and typical of enterprise software API. That hasn't 
changed. But here, being the exact counterpart of an existing add() 
function put it above the other approach i dislike slightly more of 
returning an internal pointer and not being able to tell the user just 
what may invalidate it.

> 
>> If some other developer wants to chime in and comment which approach they
>> prefer, then that would be ideal.
> 
> Indeed.
> 
> Regards,
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>
zsugabubus March 25, 2021, 8:28 p.m. UTC | #16
On Thu, Mar 25, 2021 at 02:21:43PM +0100, Marvin Scholz wrote:
> On 25 Mar 2021, at 12:55, Nicolas George wrote:
> > James Almer (12021-03-24):
> >> I think it's clear by now that nothing i could say will convince you it's
> >> better to not return a pointer to an internal array when there are safer
> >> alternatives, and i already gave my reasons why, none of which satisfied
> >> you, so i don't see the point in keeping this discussion going.
> >
> > I find this comment quite offensive. You did not manage to convince me
> > because your arguments have not been up to the task. Do not try to push
> > the fault on me, and I will refrain from accusing you of not taking my
> > arguments into account. Coming to an agreement is a process, it requires
> > both parts to refine their arguments progressively.
> >
> > This is a matter of choosing the least of several drawbacks. So let us
> > compare the drawbacks and not muddle things further.
> >
> > For me:
> >
> > 1. having a dynamic allocation is way way worse than
> > 2. having sizeof(AVIndexEntry) in the ABI, which is somewhat worse than
> > 3. having a function with many arguments, which is a tiny bit worse than
> > 4. having a "use this pointer immediately" constraint.
> >
> > We agree except on 3>4, so let us focus on that.
> >
> > Option (3) has these practical drawbacks: Many arguments involves more
> > typing and the risk of messing with the order and getting invalid
> > values. It also requires redesigning the API if we add fields and
> > exporting them is useful. And it requires either the overhead of NULL
> > checks or the caller declaring unneeded variables.
> >
> > Option (4) has the obvious practical drawback that misusing the API
> > causes undefined behavior.
> >
> > The constraint of using a pointer immediately on risk of undefined
> > behavior is actually a frequent one, in FFmpeg but also in C at large:
> > gethosbtyname, localtime, etc.
> >
> > For me, that makes it approximately on par with the risk of messing the
> > order of the many arguments.
> >
> > Which leaves more typing, NULL checks overhead or useless variables
> > (still more typing).
> >
> > It is tiny, I have no trouble admitting, but it is tiny in favor of one
> > solution.
> >
> > If you do not agree with these estimates, please explain exactly where.
> 
> I disagree with your ordering too,
> for me 4. is clearly worse than 3. here, as the harm that can be
> done by mixing up arguments (in this case) is less than use of a
> possibly invalid pointer.

If you mess the order of the arguments in an unlucky way or forgot to
initialize one, that's also undefined behavior--though you may get
warnings.

> And mixed up arguments would probably be noticed easier when testing
> than reads of possibly invalid memory.

Every operation is potentially invalid unless you make sure that you
keep the constraints given by the API. That's the end of the story.

Returning to the previous streams example, you also have to make sure,
you do not call avformat_free_context() somewhere before accessing
AVStream, because that would invalidate that. You can do it easily
without getting any warnings or errors. Why nobody complains about this?
(Because we are not in Rust but in C.)

> Even when documenting the constraint of immediately using the pointer,
> it could easily be overlooked/forgotten.

Just like everything else.

> It does not even has to be me forgetting the constraint, but could be
> someone else changing the code in the future, adding some API call
> before the pointer is used, unaware of the possible consequences and
> that could invalidate the pointer (if I understand James explanation
> correctly). This could go unnoticed easily.

Keep away the cat from the code and sit somebody to the chair who used
to read the documentation and familiar with the code he modifies.

Anyway, if you modify index entries while iterating, then you may miss
some entries, or count one twice, who knows. So it is the violation of
the API anyways, and you will be happy when you receive an invalid
memory access or a fatal av_free() error that will save you days of
debugging.

> So IMO having a function with many arguments seems like a better and
> safer tradeoff in this case to me…

Safe-safe... we are still in C, homeland of UB. Your program will not
get magically more or less safer depending on what a single FFmpeg API
function returns, if you or your colleague (or >me< or >anybody<) never
used the standard library before and/or does not read the documentation.


Let's investigate a different thing that have not been mentioned before:
What if AVIndexEntry changes?

Fields ADDED/User INTERESTED:
  (3): Must use the new av_get_index_entry2(); have to be very careful
       about the order of parameters.
  (4): AVIndexEntry.new_field.

Fields ADDED/User NOT interested:
  (3): Deprecation warning for av_get_index_entry(); will have to use
       function with the new signature, passing NULL, taking care of the
       order of arguments.
  (4): -

Fields REMOVED/User was INTERESTED:
  (3): Potentially deprecation warning for av_get_index_entry?(); have
       to use function with the new signature, taking care of the order
       of arguments.
  (4): Deprecation warning for the field; have to use a different field
       or something. ((If field access would be hidden behind a function
       call deprecation may not be even necessary in all cases.))

Fields REMOVED/User not INTERESTED:
  (3): Potentially deprecation warning for av_get_index_entry?(); have
       to use function with the new signature, taking care of the order
       of arguments, again.
  (4): -

Old (3) methods surely cause API bloat.

Changes in fundamental fields require major changes in all interfaces
since old values may not be easily substituted by anything other. In
every other case (4) seems a better choice.

So IMO const AVIndexEntry *av_get_index_entry(AVStream, unsigned) has
the advantage that:
- Simple: Implementation is one line, no branches, no memcpy, no
  AVERROR()s that you 100% not interested (for this function) and can be
  more easily maintained.
- Superset of the "many arguments" proposal: If user wants some fields
  to live longer, it can copy it... without branches.
- Returned object can be extended with functions. (Computed values.)


At the end, if you take some random pointer and you expect it too stay
valid for some unspecified time, yes, you may face some unexpected
result.

There are objects that stay usable until the corresponding free() call
and there are objects that stay usable until some other condition. You
have to look up the API documentation for the function signature,
reading what it does, does this function really does what I need and
lastly for the correct usage.

For me, (4) seems much simpler and simpler things are always a better
choice in the long run.

--
zsugabubus

P.S.: It probably will not be an issue with AVIndexEntry, but a complex
structure with pointers would be even more hassle using (3), because
output variables would require special memdup()ing and free()ing. Going
down the rabbit hole, how many objects should be created at the end? I
think this approach is not viable in big so why should it be used in
small.
Anton Khirnov April 1, 2021, 10:07 a.m. UTC | #17
Quoting James Almer (2021-03-25 14:37:20)
> On 3/25/2021 8:55 AM, Nicolas George wrote:
> 
> Same situation as av_add_index_entry(). And if you look at the signature 
> of the (3) function in my last proposal, i kept the av_index_* namespace 
> free precisely in case a new API in the future needs to be added for 
> this reason, for both add() and get(). One that could work directly with 
> the AVIndexEntry struct after the internal entries array was redesigned, 
> perhaps using AVTreeNode, so returned pointers or entries are safer to 
> handle.

On the topic of av_add_index_entry() - is there any reason that function
should be public? Seems like it's internal-use only.

> 
> > 
> > Option (4) has the obvious practical drawback that misusing the API
> > causes undefined behavior.
> > 
> > The constraint of using a pointer immediately on risk of undefined
> > behavior is actually a frequent one, in FFmpeg but also in C at large:
> > gethosbtyname, localtime, etc.
> > 
> > For me, that makes it approximately on par with the risk of messing the
> > order of the many arguments.
> > 
> > Which leaves more typing, NULL checks overhead or useless variables
> > (still more typing).
> > 
> > It is tiny, I have no trouble admitting, but it is tiny in favor of one
> > solution.
> > 
> > If you do not agree with these estimates, please explain exactly where.
> 
> I don't know if you remember how in this one imgutils patch i sent some 
> time ago i was against functions with tons of output parameters, because 
> i considered it ugly and typical of enterprise software API. That hasn't 
> changed. But here, being the exact counterpart of an existing add() 
> function put it above the other approach i dislike slightly more of 
> returning an internal pointer and not being able to tell the user just 
> what may invalidate it.
> 
> > 
> >> If some other developer wants to chime in and comment which approach they
> >> prefer, then that would be ideal.

FWIW in this specific case exporting a short-lived pointer seems less
problematic than the other options.

But on the other hand I wonder about exporting AVIndexEntry exactly as
is internally. Are all these fields useful or even meaningful to
external callers?
Perhaps we could make a new struct that would export only those fields
people actually use. And have the new API return a pointer to something
like AVFormatInternal.index_entry_for_the_caller.

Also a naming note - I'd prefer the function names to start with
avformat, so it's clearer where they belong. "index" can mean many
different things.
James Almer April 1, 2021, 1:19 p.m. UTC | #18
On 4/1/2021 7:07 AM, Anton Khirnov wrote:
> Quoting James Almer (2021-03-25 14:37:20)
>> On 3/25/2021 8:55 AM, Nicolas George wrote:
>>
>> Same situation as av_add_index_entry(). And if you look at the signature
>> of the (3) function in my last proposal, i kept the av_index_* namespace
>> free precisely in case a new API in the future needs to be added for
>> this reason, for both add() and get(). One that could work directly with
>> the AVIndexEntry struct after the internal entries array was redesigned,
>> perhaps using AVTreeNode, so returned pointers or entries are safer to
>> handle.
> 
> On the topic of av_add_index_entry() - is there any reason that function
> should be public? Seems like it's internal-use only.

We could deprecate it and make it internal, if it's not used by callers.
It may have been added back when things were devised as being more user 
controlled, which ultimately didn't happen, or just made public without 
giving it much thought.

> 
>>
>>>
>>> Option (4) has the obvious practical drawback that misusing the API
>>> causes undefined behavior.
>>>
>>> The constraint of using a pointer immediately on risk of undefined
>>> behavior is actually a frequent one, in FFmpeg but also in C at large:
>>> gethosbtyname, localtime, etc.
>>>
>>> For me, that makes it approximately on par with the risk of messing the
>>> order of the many arguments.
>>>
>>> Which leaves more typing, NULL checks overhead or useless variables
>>> (still more typing).
>>>
>>> It is tiny, I have no trouble admitting, but it is tiny in favor of one
>>> solution.
>>>
>>> If you do not agree with these estimates, please explain exactly where.
>>
>> I don't know if you remember how in this one imgutils patch i sent some
>> time ago i was against functions with tons of output parameters, because
>> i considered it ugly and typical of enterprise software API. That hasn't
>> changed. But here, being the exact counterpart of an existing add()
>> function put it above the other approach i dislike slightly more of
>> returning an internal pointer and not being able to tell the user just
>> what may invalidate it.
>>
>>>
>>>> If some other developer wants to chime in and comment which approach they
>>>> prefer, then that would be ideal.
> 
> FWIW in this specific case exporting a short-lived pointer seems less
> problematic than the other options.
> 
> But on the other hand I wonder about exporting AVIndexEntry exactly as
> is internally. Are all these fields useful or even meaningful to
> external callers?
> Perhaps we could make a new struct that would export only those fields
> people actually use. And have the new API return a pointer to something
> like AVFormatInternal.index_entry_for_the_caller.

There are five fields, pos, timestamp, flags, size, and min_distance. 
The first three are the most important, with timestamps and flags IMO 
being the only useful ones for users building their own index entry list 
for example to display each seek point to the user in a player's seek bar.
Pos doesn't seem like it would be useful outside of the internal seeking 
methods, size seems to be the least useful of them all, and min_distance 
i can't say, since it probably is mainly useful to also for the internal 
seeking methods.

So instead of making a new struct, or build a second array internally, 
we could maybe just return those two fields?

> 
> Also a naming note - I'd prefer the function names to start with
> avformat, so it's clearer where they belong. "index" can mean many
> different things.
Nicolas George April 1, 2021, 2:29 p.m. UTC | #19
James Almer (12021-04-01):
> Pos doesn't seem like it would be useful outside of the internal seeking
> methods, size seems to be the least useful of them all

pos and size seem very useful to me for an application that wants to
make quick graphs of (time, bitrate).

As a general rule "foobar does not seem useful for users" is usually a
lack of imagination from the developer.

> So instead of making a new struct, or build a second array internally, we
> could maybe just return those two fields?

Why do you want to complicate things? Just return the darn pointer and
trust the users to use it properly to access the fields they need,
without speculating.

Regards,
James Almer April 1, 2021, 2:49 p.m. UTC | #20
On 4/1/2021 11:29 AM, Nicolas George wrote:
> James Almer (12021-04-01):
>> Pos doesn't seem like it would be useful outside of the internal seeking
>> methods, size seems to be the least useful of them all
> 
> pos and size seem very useful to me for an application that wants to
> make quick graphs of (time, bitrate).
> 
> As a general rule "foobar does not seem useful for users" is usually a
> lack of imagination from the developer.

Yet the line has to be drawn somewhere. Otherwise public headers would 
be chock full of structs and/or fields normally kept in internal headers 
because "Someone may find them useful".

Look at adts_parser, av_adts_header_parse() returns only two fields from 
AACADTSHeaderInfo. Could the rest be useful for some users? Probably, 
people can surely come up with all kinds of scenarios where such fields 
could be useful, like you and i did for AVIndexEntry ones, but in this 
case a decision ultimately was made that only samples and frames would 
be made available.

> 
>> So instead of making a new struct, or build a second array internally, we
>> could maybe just return those two fields?
> 
> Why do you want to complicate things? Just return the darn pointer and
> trust the users to use it properly to access the fields they need,
> without speculating.

Please refrain from making such accusations. I want to not return a 
volatile pointer if i can avoid it, because no matter how many warnings 
you add or how much you trust the user, it's an unsafe practice.
But if both you and Anton consider it the less problematic solution, 
then fine, lets go with it.
Nicolas George April 6, 2021, 2:19 p.m. UTC | #21
James Almer (12021-04-01):
> Yet the line has to be drawn somewhere. Otherwise public headers would be
> chock full of structs and/or fields normally kept in internal headers
> because "Someone may find them useful".

You are right, the line has to be drawn somewhere. I posit that the best
place to draw the line is defined by laziness: let us make things
private by default; but if something needs to be public, then let us
make it public in the simplest possible way. We need to think ahead
about what would be a maintenance or compatibility hassle, but apart
from that, we should not spend time deciding which bit needs to be
public and which bit needs to be private. We just chose the simplest
solution.

> Please refrain from making such accusations.

I insist: the direct consequence of what you want is to complicate
things, therefore I consider it is fair to write that in this particular
instance "you want to complicate things".

But more importantly, it is not an accusation: I state it only in the
hope that it will be useful, to you and to the project.

Still in the hope to be useful, I will elaborate. I believe I have
observed a bias in your policy towards the project. Knowing that it
might just be imagination plus confirmation bias from me, I still
believe it is useful to let you know about it. It is easier to spot
biases in others than in oneself.

The bias I believe you have is to underestimate mundane complexity: the
cost of typing a few more keys (or to have to rely on advanced features
of an editor); the cost of reading longer lines of code; the cost of
remembering the order of the arguments of a function; the cost of
remembering in which half of a structure that has been split a
particular field has arrived, etc.

All this is tiny, but it is everywhere, and tiny times many, it starts
to accumulate, and it makes working on the project that less fun.

This is not an accusation, I just suggest that you think about it.

> I want to not return a volatile pointer if i can avoid it, because no
> matter how many warnings you add or how much you trust the user, it's
> an unsafe practice. But if both you and Anton consider it the less
> problematic solution, then fine, lets go with it.

I will pile up on this:

In C every function that accepts a pointer as argument or that returns a
pointer comes with caveats about the validity of the pointer.

Just a few common cases: some functions will free their arguments with
generic free(); some functions will free them with a specific handler;
some functions will keep the pointer and expect it to stay valid; some
functions will demand that you free what they return, again, with
generic or specific handlers. The list goes on and on and on.

There are no exception: when you meet a new function with pointers, you
cannot guess the hypotheses it makes about the pointers (unless it is
very similar to a function you already know, and even then you can have
bad surprises), you have to look at the doc. And if you do not, you will
get segfaults or memory leaks or memory corruption.

And every C programmer knows it. Possibly not in as many words, but what
I wrote just now will be no surprise for any C programmer skilled enough
to use the FFmpeg libraries.

The constraint "only valid until the next function call on the same
structure" is not the most common, nor the least dangerous, but it is
not exceptional at all. And in this case, it is by far both the most
simple and the most efficient solution.

Regards,
diff mbox series

Patch

diff --git a/libavformat/avformat.h b/libavformat/avformat.h
index 765bc3b6f5..ac8b57149e 100644
--- a/libavformat/avformat.h
+++ b/libavformat/avformat.h
@@ -2754,6 +2754,45 @@  int av_find_default_stream_index(AVFormatContext *s);
  */
 int av_index_search_timestamp(AVStream *st, int64_t timestamp, int flags);
 
+/**
+ * Get the index entry count for the given AVStream.
+ *
+ * @param st stream
+ * @return the number of index entries in the stream
+ */
+int av_index_get_entries_count(AVStream *st);
+
+/**
+ * Get the AVIndexEntry corresponding to the given index.
+ *
+ * @param st          stream containing the requested AVIndexEntry
+ * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
+ *                    or if the index entry could not be found, this struct will remain
+ *                    untouched
+ * @param idx         the desired index
+ * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested index could
+           be found. other negative AVERROR codes on failure.
+ */
+int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx);
+
+/**
+ * Get the AVIndexEntry corresponding to the given timestamp.
+ *
+ * @param st          stream containing the requested AVIndexEntry
+ * @param index_entry pointer to an AVIndexEntry where to store the entry. On failure,
+ *                    or if the index entry could not be found, this struct will remain
+ *                    untouched
+ * @param timestamp   timestamp to retrieve the index entry for
+ * @param flags       if AVSEEK_FLAG_BACKWARD then the returned entry will correspond
+ *                    to the timestamp which is <= the requested one, if backward
+ *                    is 0, then it will be >=
+ *                    if AVSEEK_FLAG_ANY seek to any frame, only keyframes otherwise
+ * @return >= 0 on success. AVERROR(ENOENT) if no entry for the requested timestamp could
+ *          be found. other negative AVERROR codes on failure.
+ */
+int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
+                                      int64_t wanted_timestamp, int flags);
+
 /**
  * Add an index entry into a sorted list. Update the entry if the list
  * already contains it.
diff --git a/libavformat/utils.c b/libavformat/utils.c
index f31826f2ea..4e7a8bf23e 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -2129,6 +2129,38 @@  int av_index_search_timestamp(AVStream *st, int64_t wanted_timestamp, int flags)
                                      wanted_timestamp, flags);
 }
 
+int av_index_get_entries_count(AVStream *st)
+{
+    return st->internal->nb_index_entries;
+}
+
+int av_index_get_entry(AVStream *st, AVIndexEntry *index_entry, int idx)
+{
+    if (idx < 0)
+        return AVERROR(EINVAL);
+    if (idx >= st->internal->nb_index_entries)
+        return AVERROR(ENOENT);
+
+    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));
+
+    return 0;
+}
+
+int av_index_get_entry_from_timestamp(AVStream *st, AVIndexEntry *index_entry,
+                                      int64_t wanted_timestamp, int flags)
+{
+    int idx = ff_index_search_timestamp(st->internal->index_entries,
+                                        st->internal->nb_index_entries,
+                                        wanted_timestamp, flags);
+
+    if (idx < 0)
+        return AVERROR(ENOENT);
+
+    memcpy(index_entry, &st->internal->index_entries[idx], sizeof(*index_entry));
+
+    return 0;
+}
+
 static int64_t ff_read_timestamp(AVFormatContext *s, int stream_index, int64_t *ppos, int64_t pos_limit,
                                  int64_t (*read_timestamp)(struct AVFormatContext *, int , int64_t *, int64_t ))
 {