diff mbox

[FFmpeg-devel,05/11] avcodec/h264_parser: Reuse the RBSP buffer

Message ID 20190816030531.4775-5-andreas.rheinhardt@gmail.com
State New
Headers show

Commit Message

Andreas Rheinhardt Aug. 16, 2019, 3:05 a.m. UTC
Up until now, the H.264 parser has allocated a new buffer for every
frame in order to store the unescaped RBSP in case the part of the RBSP
that will be unescaped contains 0x03 escape bytes. This is expensive
and this commit changes this: The buffer is now kept and reused.

For an AVC file with an average framesize of 15304 B (without in-band
parameter sets etc.) and 132044 frames (131072 of which ended up in the
results) this reduced the average time used by the H.264 parser from
87708 decicycles (excluding about 1100-1200 skips in each iteration)
to 16963 decicycles (excluding about 10-15 skips in each iteration).
The test has been iterated 10 times.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/h264_parser.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Comments

Kieran Kunhya Aug. 16, 2019, 4:49 a.m. UTC | #1
On Fri, 16 Aug 2019 at 04:20, Andreas Rheinhardt <
andreas.rheinhardt@gmail.com> wrote:

> Up until now, the H.264 parser has allocated a new buffer for every
> frame in order to store the unescaped RBSP in case the part of the RBSP
> that will be unescaped contains 0x03 escape bytes. This is expensive
> and this commit changes this: The buffer is now kept and reused.
>
> For an AVC file with an average framesize of 15304 B (without in-band
> parameter sets etc.) and 132044 frames (131072 of which ended up in the
> results) this reduced the average time used by the H.264 parser from
> 87708 decicycles (excluding about 1100-1200 skips in each iteration)
> to 16963 decicycles (excluding about 10-15 skips in each iteration).
> The test has been iterated 10 times.
>

If I understand correctly, this patch means if you have a large frame (or
large NAL) you are stuck with the large allocation of memory until the
decoder is closed.
Not sure if you have read the discussion here
https://patchwork.ffmpeg.org/patch/5631/

Kieran
Andreas Rheinhardt Aug. 16, 2019, 5:07 a.m. UTC | #2
Kieran Kunhya:
> On Fri, 16 Aug 2019 at 04:20, Andreas Rheinhardt <
> andreas.rheinhardt@gmail.com> wrote:
> 
>> Up until now, the H.264 parser has allocated a new buffer for every
>> frame in order to store the unescaped RBSP in case the part of the RBSP
>> that will be unescaped contains 0x03 escape bytes. This is expensive
>> and this commit changes this: The buffer is now kept and reused.
>>
>> For an AVC file with an average framesize of 15304 B (without in-band
>> parameter sets etc.) and 132044 frames (131072 of which ended up in the
>> results) this reduced the average time used by the H.264 parser from
>> 87708 decicycles (excluding about 1100-1200 skips in each iteration)
>> to 16963 decicycles (excluding about 10-15 skips in each iteration).
>> The test has been iterated 10 times.
>>
> 
> If I understand correctly, this patch means if you have a large frame (or
> large NAL) you are stuck with the large allocation of memory until the
> decoder is closed.
> Not sure if you have read the discussion here
> https://patchwork.ffmpeg.org/patch/5631/
> 
> Kieran
> 
I am aware of said discussion; and also of your solution [1] to it. It
actually does exactly the same as I propose for the parser: It keeps a
single buffer that does not shrink. Given that this is used for the
decoders (and for cbs_h2645), I didn't deem this to be problematic.
(There is clearly no quadratic memory usage and what Derek warns about
(with huge NALs at specific positions) is a no issue for both.)

- Andreas

[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219100.html
Kieran Kunhya Aug. 16, 2019, 2:06 p.m. UTC | #3
On Fri, 16 Aug 2019 at 06:08, Andreas Rheinhardt <
andreas.rheinhardt@gmail.com> wrote:

> Kieran Kunhya:
> > On Fri, 16 Aug 2019 at 04:20, Andreas Rheinhardt <
> > andreas.rheinhardt@gmail.com> wrote:
> >
> >> Up until now, the H.264 parser has allocated a new buffer for every
> >> frame in order to store the unescaped RBSP in case the part of the RBSP
> >> that will be unescaped contains 0x03 escape bytes. This is expensive
> >> and this commit changes this: The buffer is now kept and reused.
> >>
> >> For an AVC file with an average framesize of 15304 B (without in-band
> >> parameter sets etc.) and 132044 frames (131072 of which ended up in the
> >> results) this reduced the average time used by the H.264 parser from
> >> 87708 decicycles (excluding about 1100-1200 skips in each iteration)
> >> to 16963 decicycles (excluding about 10-15 skips in each iteration).
> >> The test has been iterated 10 times.
> >>
> >
> > If I understand correctly, this patch means if you have a large frame (or
> > large NAL) you are stuck with the large allocation of memory until the
> > decoder is closed.
> > Not sure if you have read the discussion here
> > https://patchwork.ffmpeg.org/patch/5631/
> >
> > Kieran
> >
> I am aware of said discussion; and also of your solution [1] to it. It
> actually does exactly the same as I propose for the parser: It keeps a
> single buffer that does not shrink. Given that this is used for the
> decoders (and for cbs_h2645), I didn't deem this to be problematic.
> (There is clearly no quadratic memory usage and what Derek warns about
> (with huge NALs at specific positions) is a no issue for both.)
>
> - Andreas
>
> [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219100.html


My solution frees the buffer when it's done. With yours it stays around as
a large buffer essentially forever.

Kieran
Andreas Rheinhardt Aug. 16, 2019, 3:22 p.m. UTC | #4
Kieran Kunhya:
> On Fri, 16 Aug 2019 at 06:08, Andreas Rheinhardt <
> andreas.rheinhardt@gmail.com> wrote:
> 
>> Kieran Kunhya:
>>> On Fri, 16 Aug 2019 at 04:20, Andreas Rheinhardt <
>>> andreas.rheinhardt@gmail.com> wrote:
>>>
>>>> Up until now, the H.264 parser has allocated a new buffer for every
>>>> frame in order to store the unescaped RBSP in case the part of the RBSP
>>>> that will be unescaped contains 0x03 escape bytes. This is expensive
>>>> and this commit changes this: The buffer is now kept and reused.
>>>>
>>>> For an AVC file with an average framesize of 15304 B (without in-band
>>>> parameter sets etc.) and 132044 frames (131072 of which ended up in the
>>>> results) this reduced the average time used by the H.264 parser from
>>>> 87708 decicycles (excluding about 1100-1200 skips in each iteration)
>>>> to 16963 decicycles (excluding about 10-15 skips in each iteration).
>>>> The test has been iterated 10 times.
>>>>
>>>
>>> If I understand correctly, this patch means if you have a large frame (or
>>> large NAL) you are stuck with the large allocation of memory until the
>>> decoder is closed.
>>> Not sure if you have read the discussion here
>>> https://patchwork.ffmpeg.org/patch/5631/
>>>
>>> Kieran
>>>
>> I am aware of said discussion; and also of your solution [1] to it. It
>> actually does exactly the same as I propose for the parser: It keeps a
>> single buffer that does not shrink. Given that this is used for the
>> decoders (and for cbs_h2645), I didn't deem this to be problematic.
>> (There is clearly no quadratic memory usage and what Derek warns about
>> (with huge NALs at specific positions) is a no issue for both.)
>>
>> - Andreas
>>
>> [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219100.html
> 
> 
> My solution frees the buffer when it's done. With yours it stays around as
> a large buffer essentially forever.
> 
> Kieran
> 
Your solution frees the buffer in the parser when it's done, but the
buffers for the decoders are kept (and only grow) during decoding. So
this commit merely does for the parser what is already done for the
deocder.
Anyway, it would be easy to add a check to the parser to free the
buffer if it is considered excessively large. I don't know how easy it
would be to add such precautions to the decoder, though. I am also not
sure what should be considered excessively large? 5 MB? 10 MB? Setting
it so high that even the highest profiles can't hit it is impossible,
because for certain profiles ((Progressive) High 10, High 4:2:2, ...)
no limit is imposed at all (apart from that, such a limit would surely
be too high).

- Andreas
Andreas Rheinhardt May 29, 2020, 3:52 p.m. UTC | #5
Andreas Rheinhardt:
> Kieran Kunhya:
>> On Fri, 16 Aug 2019 at 06:08, Andreas Rheinhardt <
>> andreas.rheinhardt@gmail.com> wrote:
>>
>>> Kieran Kunhya:
>>>> On Fri, 16 Aug 2019 at 04:20, Andreas Rheinhardt <
>>>> andreas.rheinhardt@gmail.com> wrote:
>>>>
>>>>> Up until now, the H.264 parser has allocated a new buffer for every
>>>>> frame in order to store the unescaped RBSP in case the part of the RBSP
>>>>> that will be unescaped contains 0x03 escape bytes. This is expensive
>>>>> and this commit changes this: The buffer is now kept and reused.
>>>>>
>>>>> For an AVC file with an average framesize of 15304 B (without in-band
>>>>> parameter sets etc.) and 132044 frames (131072 of which ended up in the
>>>>> results) this reduced the average time used by the H.264 parser from
>>>>> 87708 decicycles (excluding about 1100-1200 skips in each iteration)
>>>>> to 16963 decicycles (excluding about 10-15 skips in each iteration).
>>>>> The test has been iterated 10 times.
>>>>>
>>>>
>>>> If I understand correctly, this patch means if you have a large frame (or
>>>> large NAL) you are stuck with the large allocation of memory until the
>>>> decoder is closed.
>>>> Not sure if you have read the discussion here
>>>> https://patchwork.ffmpeg.org/patch/5631/
>>>>
>>>> Kieran
>>>>
>>> I am aware of said discussion; and also of your solution [1] to it. It
>>> actually does exactly the same as I propose for the parser: It keeps a
>>> single buffer that does not shrink. Given that this is used for the
>>> decoders (and for cbs_h2645), I didn't deem this to be problematic.
>>> (There is clearly no quadratic memory usage and what Derek warns about
>>> (with huge NALs at specific positions) is a no issue for both.)
>>>
>>> - Andreas
>>>
>>> [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219100.html
>>
>>
>> My solution frees the buffer when it's done. With yours it stays around as
>> a large buffer essentially forever.
>>
>> Kieran
>>
> Your solution frees the buffer in the parser when it's done, but the
> buffers for the decoders are kept (and only grow) during decoding. So
> this commit merely does for the parser what is already done for the
> deocder.
> Anyway, it would be easy to add a check to the parser to free the
> buffer if it is considered excessively large. I don't know how easy it
> would be to add such precautions to the decoder, though. I am also not
> sure what should be considered excessively large? 5 MB? 10 MB? Setting
> it so high that even the highest profiles can't hit it is impossible,
> because for certain profiles ((Progressive) High 10, High 4:2:2, ...)
> no limit is imposed at all (apart from that, such a limit would surely
> be too high).
> 
> - Andreas
> 
Ping. What is other's opinion on this matter? Notice that the current
behaviour is suboptimal even if it is decided that the buffer should not
be kept: It allocates 1/16 more than needed (in addition to
AV_INPUT_BUFFER_PADDING_SIZE) that is guaranteed not to be used; and it
uses mallocz for the allocation.

- Andreas
Anton Khirnov June 22, 2020, 9:15 a.m. UTC | #6
Quoting Andreas Rheinhardt (2020-05-29 17:52:40)
> Ping. What is other's opinion on this matter? Notice that the current
> behaviour is suboptimal even if it is decided that the buffer should not
> be kept: It allocates 1/16 more than needed (in addition to
> AV_INPUT_BUFFER_PADDING_SIZE) that is guaranteed not to be used; and it
> uses mallocz for the allocation.

Random idea: free the buffer on every IDR? Not super optimal, but simple
to implement.
diff mbox

Patch

diff --git a/libavcodec/h264_parser.c b/libavcodec/h264_parser.c
index 5f9a9c46ef..c200a2ab8e 100644
--- a/libavcodec/h264_parser.c
+++ b/libavcodec/h264_parser.c
@@ -53,6 +53,7 @@  typedef struct H264ParseContext {
     H264DSPContext h264dsp;
     H264POCContext poc;
     H264SEIContext sei;
+    H2645RBSP rbsp;
     int is_avc;
     int nal_length_size;
     int got_first;
@@ -246,7 +247,7 @@  static inline int parse_nal_units(AVCodecParserContext *s,
                                   const uint8_t * const buf, int buf_size)
 {
     H264ParseContext *p = s->priv_data;
-    H2645RBSP rbsp = { NULL };
+    H2645RBSP *rbsp = &p->rbsp;
     H2645NAL nal = { NULL };
     int buf_index, next_avc;
     unsigned int pps_id;
@@ -267,9 +268,10 @@  static inline int parse_nal_units(AVCodecParserContext *s,
     if (!buf_size)
         return 0;
 
-    av_fast_padded_malloc(&rbsp.rbsp_buffer, &rbsp.rbsp_buffer_alloc_size, buf_size);
-    if (!rbsp.rbsp_buffer)
+    av_fast_padded_malloc(&rbsp->rbsp_buffer, &rbsp->rbsp_buffer_alloc_size, buf_size);
+    if (!rbsp->rbsp_buffer)
         return AVERROR(ENOMEM);
+    rbsp->rbsp_buffer_size = 0;
 
     buf_index     = 0;
     next_avc      = p->is_avc ? 0 : buf_size;
@@ -308,7 +310,7 @@  static inline int parse_nal_units(AVCodecParserContext *s,
             }
             break;
         }
-        consumed = ff_h2645_extract_rbsp(buf + buf_index, src_length, &rbsp, &nal, 1);
+        consumed = ff_h2645_extract_rbsp(buf + buf_index, src_length, rbsp, &nal, 1);
         if (consumed < 0)
             break;
 
@@ -554,18 +556,15 @@  static inline int parse_nal_units(AVCodecParserContext *s,
                 p->last_frame_num = p->poc.frame_num;
             }
 
-            av_freep(&rbsp.rbsp_buffer);
             return 0; /* no need to evaluate the rest */
         }
     }
     if (q264) {
-        av_freep(&rbsp.rbsp_buffer);
         return 0;
     }
     /* didn't find a picture! */
     av_log(avctx, AV_LOG_ERROR, "missing picture in access unit with size %d\n", buf_size);
 fail:
-    av_freep(&rbsp.rbsp_buffer);
     return -1;
 }
 
@@ -690,6 +689,8 @@  static void h264_close(AVCodecParserContext *s)
     ParseContext *pc = &p->pc;
 
     av_freep(&pc->buffer);
+    av_freep(&p->rbsp.rbsp_buffer);
+    p->rbsp.rbsp_buffer_alloc_size = 0;
 
     ff_h264_sei_uninit(&p->sei);
     ff_h264_ps_uninit(&p->ps);