Message ID: 20171019184647.80641-1-nbowe@google.com
State: New
2017-10-19 20:46 GMT+02:00 Nikolas Bowe <nbowe-at-google.com@ffmpeg.org>:
> Found via fuzzing.
> /tmp/poc is a 1 MB mpegts file generated via fuzzing, where 1 packet has many NALUs
> Before this change:
> $ /usr/bin/time -f "\t%M Max Resident Set Size (Kb)" ./ffprobe /tmp/poc 2>&1 | tail -n 1
>         2158192 Max Resident Set Size (Kb)
> After this change:
> $ /usr/bin/time -f "\t%M Max Resident Set Size (Kb)" ./ffprobe /tmp/poc 2>&1 | tail -n 1
>         1046812 Max Resident Set Size (Kb)

This does not look like a fix for a "quadratic" memory consumption or
do I misunderstand?

Does the patch have a measurable speed impact?

Carl Eugen
On Thu, Oct 19, 2017 at 3:39 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> 2017-10-19 20:46 GMT+02:00 Nikolas Bowe <nbowe-at-google.com@ffmpeg.org>:
> > Found via fuzzing.
> > /tmp/poc is a 1 MB mpegts file generated via fuzzing, where 1 packet has many NALUs
> > [...]
>
> This does not look like a fix for a "quadratic" memory consumption or
> do I misunderstand?

Before this patch, for each NALU in the packet, rbsp_buffer would be sized from the start of the NALU to the end of the packet, not the end of the NALU. This would occur for each NALU in the packet. Total memory allocated in all the rbsp_buffers for all the NALUs in the packet would be N + (N+x1) + (N+x2) + ...
This is quadratic in the number of NALUs in the packet.

The fuzzer's proof-of-concept file is a bit extreme: it has over 2000 small NALUs in one packet. An easier way to trigger this would be to put some small NALUs (perhaps SEI) in front of large IDRs. Each small NALU added to the front costs roughly another packet-sized rbsp_buffer for that packet.

> Does the patch have a measurable speed impact?

Is there a standard set of benchmarks I can run?

For typical videos the speed impact is small, due to the NALUs fitting in cache, but for videos with many large NALUs there can be some slowdown. Here is the decode time for some typical videos and some extreme cases, taking the result with the best user+system time of 3 runs. Measured via:

    for i in {1..3}; do /usr/bin/time -f "\t%U User CPU seconds\n\t%S System CPU seconds\n\t%M Max Resident Set Size (Kb)" ./ffmpeg -loglevel quiet -nostats -threads 1 -i $file -f null - ; done

Tears of Steel HD - somewhat typical HD short movie. 728 MB, 8314 kb/s.
no patch:
        113.69 User CPU seconds
        0.60 System CPU seconds
        44784 Max Resident Set Size (Kb)
patch:
        112.52 User CPU seconds
        0.40 System CPU seconds
        44780 Max Resident Set Size (Kb)
~1% faster.

Tears of Steel 4k - somewhat typical high-ish bitrate 4k movie. 73244 kb/s.
no patch:
        682.70 User CPU seconds
        2.99 System CPU seconds
        104420 Max Resident Set Size (Kb)
patch:
        716.06 User CPU seconds
        4.08 System CPU seconds
        103632 Max Resident Set Size (Kb)
5% slower.

random 50 Mbps I-only video I had lying around:
no patch:
        421.33 User CPU seconds
        1.21 System CPU seconds
        28284 Max Resident Set Size (Kb)
patch:
        423.00 User CPU seconds
        1.98 System CPU seconds
        27300 Max Resident Set Size (Kb)
<1% slower.

foo_200M.ts - 10 seconds of /dev/urandom at 4k, encoded at 200 Mbps using 2-pass, 2-second GOP (it actually used P frames even though it is encoding random noise).
no patch:
        11.52 User CPU seconds
        0.19 System CPU seconds
        68668 Max Resident Set Size (Kb)
patch:
        11.92 User CPU seconds
        0.15 System CPU seconds
        68656 Max Resident Set Size (Kb)
3% slower.

large_nals.ts - an extreme case; I tried to come up with a very bad case for slowdown. Every packet has very large VCL NALUs. Generated via:

    ./ffmpeg -f rawvideo -video_size 3840x2160 -pixel_format yuv420p -framerate 30 -i /dev/urandom -t 5 -c:v libx264 -preset ultrafast -crf 0 -g 1 -y large_nals.ts

Each packet in large_nals.ts is close to 20 MB. There are 5 NALUs per packet: 9 (AUD), 7 (SPS), 8 (PPS), 6 (SEI), 5 (IDR slice). This is of course unrealistically large at over 4 Gbps, but it should be a decent worst-case example.
no patch:
        46.76 User CPU seconds
        3.10 System CPU seconds
        199148 Max Resident Set Size (Kb)
patch:
        54.42 User CPU seconds
        1.79 System CPU seconds
        156720 Max Resident Set Size (Kb)
13% slower when there are very large NALUs (also 21% less memory usage).

> Carl Eugen
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
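[Editor's note] Nikolas's quadratic-growth explanation above can be checked with a toy model. This is plain Python, not FFmpeg code; the 500-byte NALU size is made up purely for illustration:

```python
def total_rbsp_old(nal_sizes):
    """Old behaviour: each NALU's rbsp_buffer spans from the start of
    that NALU to the end of the packet."""
    remaining = sum(nal_sizes)  # bytes from the current NALU to packet end
    total = 0
    for size in nal_sizes:
        total += remaining
        remaining -= size
    return total

def total_rbsp_patched(nal_sizes):
    """Patched behaviour: each buffer stops at the next start code,
    i.e. roughly the size of the NALU itself."""
    return sum(nal_sizes)

# ~2000 small NALUs in one packet, as in the fuzzed sample
sizes = [500] * 2000
print(total_rbsp_old(sizes))      # 1000500000 -> grows quadratically with NALU count
print(total_rbsp_patched(sizes))  # 1000000 -> linear in packet size
```

Doubling the NALU count roughly quadruples the old total while the patched total merely doubles, which is the quadratic-versus-linear distinction being discussed.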
2017-10-24 1:43 GMT+02:00 Niki Bowe <nbowe-at-google.com@ffmpeg.org>:
> On Thu, Oct 19, 2017 at 3:39 PM, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> Does the patch have a measurable speed impact?
>
> Is there a standard set of benchmarks I can run?
>
> For typical videos the speed impact is small, due to the NALUs fitting in cache,
> but for videos with many large NALUs there can be some slowdown.

(A 5% overall slowdown would make every patch unacceptable, but I doubt this is the case.)

Use the TIMER macros from libavutil/timer.h; put them around all calls to ff_h2645_extract_rbsp().

Carl Eugen
On Tue, 24 Oct 2017 at 00:56 Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> (A 5% overall slowdown would make every patch unacceptable, but I doubt
> this is the case.)
>
> Use the TIMER macros from libavutil/timer.h; put them around all
> calls to ff_h2645_extract_rbsp().

Related to https://trac.ffmpeg.org/ticket/6789: we see huge memory allocations in this code, so if this patch fixes that, it should be committed irrespective of any speed loss.

Kieran
On Tue, 31 Oct 2017 at 01:42 Kieran Kunhya <kierank@obe.tv> wrote:
> Related to https://trac.ffmpeg.org/ticket/6789: we see huge memory
> allocations in this code, so if this patch fixes that, it should be
> committed irrespective of any speed loss.

I confirm this patch fixes ticket 6789.

Kieran
Hi

On Mon, Oct 23, 2017 at 04:43:55PM -0700, Niki Bowe wrote:
> Before this patch, for each NALU in the packet, rbsp_buffer would be sized
> from the start of the NALU to the end of the packet, not the end of the
> NALU.
> This would occur for each NALU in the packet. Total memory allocated in all
> the rbsp_buffers for all the NALUs in the packet would be N + (N+x1) +
> (N+x2) + ...
> This is quadratic in the number of NALUs in the packet.

A better solution would be to allocate a buffer sized based on the whole packet and then have all the NALs point into that. That way there's no need to know the size of the first NAL during allocation. This should have even lower memory overhead than your code, fewer alloc/free calls, and no speed loss.

[...]
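[Editor's note] Michael's whole-packet buffer suggestion can be sketched as a toy Python model. This is not FFmpeg code: `shared_buffer_plan` is a hypothetical name and the `MAX_MBPAIR_SIZE` value here is a stand-in for the real padding constant:

```python
MAX_MBPAIR_SIZE = 256 * 1024  # stand-in for the over-read padding constant

def shared_buffer_plan(nal_sizes):
    """One allocation per packet; each NAL becomes an (offset, size) view
    into the shared buffer. No NAL's size is needed up front, and the
    over-read padding is paid once per packet instead of once per NAL."""
    buf_size = sum(nal_sizes) + MAX_MBPAIR_SIZE
    views, offset = [], 0
    for size in nal_sizes:
        views.append((offset, size))
        offset += size
    return buf_size, views

size, views = shared_buffer_plan([100, 200, 50])
print(size)   # 262494: 350 packet bytes + one unit of padding
print(views)  # [(0, 100), (100, 200), (300, 50)]
```

The point of the design is visible in the return value: total memory is packet size plus one padding unit, regardless of how many NALs the packet contains.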
On Thu, Oct 19, 2017 at 11:46:47AM -0700, Nikolas Bowe wrote:
> Found via fuzzing.
> /tmp/poc is a 1 MB mpegts file generated via fuzzing, where 1 packet has many NALUs
> Before this change:
> $ /usr/bin/time -f "\t%M Max Resident Set Size (Kb)" ./ffprobe /tmp/poc 2>&1 | tail -n 1
>         2158192 Max Resident Set Size (Kb)
> After this change:
> $ /usr/bin/time -f "\t%M Max Resident Set Size (Kb)" ./ffprobe /tmp/poc 2>&1 | tail -n 1
>         1046812 Max Resident Set Size (Kb)
> ---
>  libavcodec/h2645_parse.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/h2645_parse.c b/libavcodec/h2645_parse.c
> index b0d9ff66f0..e77689f347 100644
> --- a/libavcodec/h2645_parse.c
> +++ b/libavcodec/h2645_parse.c
> @@ -32,7 +32,7 @@
>  int ff_h2645_extract_rbsp(const uint8_t *src, int length,
>                            H2645NAL *nal, int small_padding)
>  {
> -    int i, si, di;
> +    int i, si, di, nsc;
>      uint8_t *dst;
>      int64_t padding = small_padding ? 0 : MAX_MBPAIR_SIZE;
>
> @@ -91,8 +91,17 @@ int ff_h2645_extract_rbsp(const uint8_t *src, int length,
>      } else if (i > length)
>          i = length;
>
> +    // Find next NAL start code, if present, to reduce rbsp_buffer size when
> +    // multiple NALUs.
> +    for (nsc = i; nsc + 2 < length; nsc++) {
> +        if (src[nsc] == 0 && src[nsc + 1] == 0 && src[nsc + 2] == 1)
> +            break;
> +    }
> +    if (nsc + 2 == length)
> +        nsc = length;
> +
>      av_fast_padded_malloc(&nal->rbsp_buffer, &nal->rbsp_buffer_size,
> -                          length + padding);
> +                          nsc + padding);
>      if (!nal->rbsp_buffer)
>          return AVERROR(ENOMEM);

This reduces memory consumption from quadratic to linear, but I think it can still be made to allocate very large amounts of memory. That is, with many small NAL units, MAX_MBPAIR_SIZE would be allocated for each in the worst case. So this does fix the quadratic issue but not the OOM issue. Using the same buffer for all would fix it, unless I am missing something. Using the same buffer avoids the padding needs for all but the last, so it's a lot less memory for many small NAL units.

[...]
On Thu, Oct 19, 2017 at 11:46:47AM -0700, Nikolas Bowe wrote:
> Found via fuzzing.
> /tmp/poc is a 1 MB mpegts file generated via fuzzing, where 1 packet has many NALUs
> [...]
> ---
>  libavcodec/h2645_parse.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)

This patch also fixes 2145/clusterfuzz-testcase-minimized-5866217724182528; that should be added to the commit message.

(Though as said, this fix is not ideal or complete. I would very much prefer if this were fixed by using a single buffer or any other solution that avoids the speed loss.)

Also please tell me in case you choose not to work on this further.

thx

[...]
On Tue, 31 Oct 2017 at 02:26 Michael Niedermayer <michael@niedermayer.cc> wrote:
> This patch also fixes 2145/clusterfuzz-testcase-minimized-5866217724182528;
> that should be added to the commit message.
>
> (Though as said, this fix is not ideal or complete. I would very much
> prefer if this were fixed by using a single buffer or any other
> solution that avoids the speed loss.)
>
> Also please tell me in case you choose not to work on this further.

Hi,

I left the sample in https://trac.ffmpeg.org/ticket/6789 running overnight; it still leaks with this patch, just much more slowly. So there is still a related (but separate?) bug here.

Kieran
> This reduces memory consumption from quadratic to linear, but I think
> it can still be made to allocate very large amounts of memory.
> That is, with many small NAL units, MAX_MBPAIR_SIZE would be allocated
> for each in the worst case.
> So this does fix the quadratic issue but not the OOM issue.
> Using the same buffer for all would fix it, unless I am missing something.
> Using the same buffer avoids the padding needs for all but the last,
> so it's a lot less memory for many small NAL units.

This wouldn't fix the issue, because the nals_allocated cache could still grow with a crafted stream.

Kieran
On 10/31/2017 2:25 AM, Michael Niedermayer wrote:
> (Though as said, this fix is not ideal or complete. I would very much
> prefer if this were fixed by using a single buffer or any other
> solution that avoids the speed loss.)

Using a single buffer would be marginally faster, but it does not solve the underlying problem, which is that the NAL "cache" (nals_allocated) seems to be cumulative, and the size of each buffer in it seems to be the largest observed size of a NAL in that position.

Consider a stream I could craft that contains, in order:

  Packet 1: 1999 tiny NALs, followed by 1 10 MiB NAL.
  Packet 2: 1998 tiny NALs, followed by 1 10 MiB NAL.
  ...
  Packet 1500: 500 tiny NALs, followed by 1 10 MiB NAL.
  ...

And so forth. The result would be that we have 2000 10 MiB buffers allocated in the NAL memory "pool" (nals_allocated == 2000), which will persist until the decoder is deinitialized.

Am I missing something here?

P.S. I see Kieran mailed the same thing as I wrote this.

- Derek
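[Editor's note] Derek's crafted stream can be put into numbers with a toy Python model of the per-position pool. This is not FFmpeg code; the 16-byte tiny-NAL size is an assumption for illustration:

```python
MiB = 1 << 20

def pool_after_crafted_stream(n_packets=2000, big_nal=10 * MiB):
    """Each packet has a shrinking run of tiny NALs followed by one huge
    NAL, so the huge NAL visits every slot of the per-position pool, and
    every slot's cached buffer grows to the largest NAL it has ever held."""
    pool = {}  # slot index -> largest NAL size seen at that position
    for p in range(n_packets):
        tiny_count = n_packets - 1 - p  # 1999, 1998, ..., 0 tiny NALs first
        for slot in range(tiny_count):
            pool[slot] = max(pool.get(slot, 0), 16)  # tiny NAL
        pool[tiny_count] = max(pool.get(tiny_count, 0), big_nal)
    return sum(pool.values())

print(pool_after_crafted_stream() / (1 << 30))  # 19.53125 -> ~19.5 GiB retained
```

Under this model every one of the 2000 slots ends up caching a 10 MiB buffer, so the pool retains about 19.5 GiB until the decoder is torn down, which matches the scenario described above.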
> Am I missing something here?
>
> P.S. I see Kieran mailed the same thing as I wrote this.

Further to Derek's excellent explanation, I think the best way is to go back to the old "in-place" NAL parsing code for H264. I will prepare a patch to do this.

Kieran
Thank you Kieran, Michael, Derek, and Carl.

I ran some benchmarks and the patch I provided really does have a speed impact, so I was considering alternative ways to achieve the same result. After a little thought last week I came to the same conclusion as Michael: a single rbsp_buffer per packet, with the NALs pointing into it, would be ideal, because:

* it allows a single allocation for all the NALs, sized to the size of the packet (plus padding; see the next point);
* I think the padding is only there to allow unchecked bitstream reading, so we only need 1 unit of padding (MAX_MBPAIR_SIZE) if we have a single buffer, rather than 1 per NAL. Also, IIRC it used to be this way a long time ago.

Likewise, skipped_bytes_pos could also be made a single buffer for the packet rather than 1 per NALU. Probably not as big a benefit to be had there, though, besides avoiding some allocations.

I was going to prepare a patch, but it sounds like Kieran is going to. Thank you Kieran. Let me know if there's anything you want me to do, or if you would prefer me to do it for the experience.

As for the NAL memory pool, I think I see what you mean. You are saying h264dec.c reuses the H2645Packet in decode_nal_units, so any growth in nals_allocated from previous packets carries over to later packets, and the NALs themselves also carry over even if the current packet has a small pkt->nb_nals. Moving rbsp_buffer (and maybe skipped_bytes_pos) out of the NALs and into H2645Packet should fix the worst of this.

On Tue, Oct 31, 2017 at 1:28 PM, Kieran Kunhya <kierank@obe.tv> wrote:
> Further to Derek's excellent explanation, I think the best way is to go
> back to the old "in-place" NAL parsing code for H264.
> I will prepare a patch to do this.
>
> Kieran
On Tue, 31 Oct 2017 at 22:36 Niki Bowe <nbowe-at-google.com@ffmpeg.org> wrote:
> I was going to prepare a patch, but it sounds like Kieran is going to.
> Thank you Kieran. Let me know if there's anything you want me to do, or if
> you would prefer me to do it for the experience.
>
> [...]
>
> Moving rbsp_buffer (and maybe skipped_bytes_pos) out of the NALs and into
> H2645Packet should fix the worst of this.

I am not sure this is possible; I will see tomorrow. Just FYI, we are regularly seeing OOMs on some 24/7 live streams as a result of h2645_parse.

Kieran
On Tue, Oct 31, 2017 at 08:24:37PM +0000, Derek Buitenhuis wrote:
> Using a single buffer would be marginally faster, but it does not solve
> the underlying problem, which is that the NAL "cache" (nals_allocated)
> seems to be cumulative, and the size of each buffer in it seems to be
> the largest observed size of a NAL in that position.
> [...]
> The result would be that we have 2000 10 MiB buffers allocated
> in the NAL memory "pool" (nals_allocated == 2000), which will
> persist until the decoder is deinitialized.
>
> Am I missing something here?

The idea would be that there would only be one uint8_t buffer, and the 2000 entries from the pool would point into that. So as a larger NAL shifts through the 2000, the pointers would get distributed differently, but the size would not grow. Any variable-size buffer the H2645NAL needs would be such a "shared" buffer.

I don't know if there's anything that would make this non-trivial to implement.

> P.S. I see Kieran mailed the same thing as I wrote this.
>
> - Derek
On 11/1/2017 1:36 AM, Michael Niedermayer wrote:
> The idea would be that there would only be one uint8_t buffer, and the
> 2000 entries from the pool would point into that.
> So as a larger NAL shifts through the 2000, the pointers would get
> distributed differently, but the size would not grow.
> Any variable-size buffer the H2645NAL needs would be such a "shared"
> buffer.

This fixes the type of case I mentioned, but it doesn't fix, for example, a never-ending stream (e.g. a TV channel): if you hit a bunch of huge NALs in a packet, the single buffer will be resized to fit them all, and so instead of a momentary memory usage spike, it lasts forever (or until you restart the stream / reinit the decoder).

- Derek
> The idea would be that there would only be one uint8_t buffer, and the
> 2000 entries from the pool would point into that.
> So as a larger NAL shifts through the 2000, the pointers would get
> distributed differently, but the size would not grow.
> Any variable-size buffer the H2645NAL needs would be such a "shared"
> buffer.
>
> I don't know if there's anything that would make this non-trivial to
> implement.

I have tried this using the following patch, but it does not work:
https://www.irccloud.com/pastebin/qobTcW9d/

Nothing obviously seems wrong, so I suspect it's not possible to do this whilst reusing the code between decoder and parser. The old code used to write to a per-slice context, not a per-packet one, so this might be related.

Kieran
On 11/2/2017 10:48 PM, Kieran Kunhya wrote:
> I have tried this using the following patch, but it does not work:
> https://www.irccloud.com/pastebin/qobTcW9d/
>
> Nothing obviously seems wrong, so I suspect it's not possible to do this
> whilst reusing the code between decoder and parser.
> The old code used to write to a per-slice context, not a per-packet one,
> so this might be related.

As discussed on IRC, the issue was dangling pointers due to the realloc.

It can be allocated based on the full packet size, outside the function, or on first call, I think.

Aside:

Do we have some preferred way to address the cumulative memory issue I mentioned in my previous mail, or is it a DNC situation? I haven't thought of a good way to solve that - only stuff like going back to in-place NAL parsing, or shrinking the buffer after some time based on some metric (ew...).

- Derek
On Fri, Nov 3, 2017 at 12:38 AM, Derek Buitenhuis <derek.buitenhuis@gmail.com> wrote:
> As discussed on IRC, the issue was dangling pointers due to the realloc.
>
> It can be allocated based on the full packet size, outside the function,
> or on first call, I think.
>
> Do we have some preferred way to address the cumulative memory issue
> I mentioned in my previous mail, or is it a DNC situation?

Personally I don't think the problem of a single buffer getting way too large is a huge problem; at least it should not accumulate and keep growing. It's also not a "new" problem - the old code would've suffered from the same issue as well, since it also cached the rbsp buffer, just only one, not multiple.

Another option would be to rearrange the API to keep it shared but go back to iterator-style parsing of the NALs: parse one, process it, parse the next. That way you can also avoid the issue of a multitude of NAL structs being allocated and sitting there forever. Obviously H264 would be compatible with that, as it's how it was used before. I didn't check HEVC, however.

- Hendrik
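[Editor's note] Hendrik's iterator-style suggestion can be sketched as a toy Python generator. This is not the actual H264 parser; start codes are simplified to the 3-byte form and emulation-prevention bytes are ignored:

```python
def iterate_nals(packet, start_code=b"\x00\x00\x01"):
    """Yield one NAL payload at a time instead of materialising an array
    of NAL structs for the whole packet: parse one, process it, parse the
    next. No per-packet pool of NAL buffers accumulates this way."""
    pos = packet.find(start_code)
    while pos != -1:
        nxt = packet.find(start_code, pos + len(start_code))
        end = nxt if nxt != -1 else len(packet)
        yield packet[pos + len(start_code):end]
        pos = nxt

# Two NALs back to back: an AUD-like payload, then an SPS-like payload
for nal in iterate_nals(b"\x00\x00\x01\x09\xf0\x00\x00\x01\x67\x42"):
    print(nal.hex())  # 09f0, then 6742
```

The design trade-off is the one noted above: the caller drives the loop and holds at most one NAL at a time, at the cost of not being able to look at all NALs of a packet up front.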
diff --git a/libavcodec/h2645_parse.c b/libavcodec/h2645_parse.c
index b0d9ff66f0..e77689f347 100644
--- a/libavcodec/h2645_parse.c
+++ b/libavcodec/h2645_parse.c
@@ -32,7 +32,7 @@
 int ff_h2645_extract_rbsp(const uint8_t *src, int length,
                           H2645NAL *nal, int small_padding)
 {
-    int i, si, di;
+    int i, si, di, nsc;
     uint8_t *dst;
     int64_t padding = small_padding ? 0 : MAX_MBPAIR_SIZE;
 
@@ -91,8 +91,17 @@ int ff_h2645_extract_rbsp(const uint8_t *src, int length,
     } else if (i > length)
         i = length;
 
+    // Find next NAL start code, if present, to reduce rbsp_buffer size when
+    // multiple NALUs.
+    for (nsc = i; nsc + 2 < length; nsc++) {
+        if (src[nsc] == 0 && src[nsc + 1] == 0 && src[nsc + 2] == 1)
+            break;
+    }
+    if (nsc + 2 == length)
+        nsc = length;
+
     av_fast_padded_malloc(&nal->rbsp_buffer, &nal->rbsp_buffer_size,
-                          length + padding);
+                          nsc + padding);
     if (!nal->rbsp_buffer)
         return AVERROR(ENOMEM);
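[Editor's note] The start-code scan the patch adds can be modelled in a few lines of Python. This is a simplification of the C loop above (behaviour in the last one or two bytes is tidied up rather than matched exactly):

```python
def next_start_code(buf, i):
    """Scan forward from i for the next 00 00 01 start code; if none is
    found, return the end of the buffer. The patch uses this position
    (instead of the full packet length) to size rbsp_buffer."""
    nsc = i
    while nsc + 2 < len(buf):
        if buf[nsc] == 0 and buf[nsc + 1] == 0 and buf[nsc + 2] == 1:
            return nsc
        nsc += 1
    return len(buf)

payload = b"\x65\x88\x00\x00\x01\x41\x9a"
print(next_start_code(payload, 0))  # 2: the next NALU begins at offset 2
```

With many NALUs in one packet, each rbsp_buffer now ends at the next NALU's start code rather than at the end of the packet, which is what turns the quadratic total into a linear one.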