[FFmpeg-devel,3/5] startcode: Stop overreading

Message ID	20190601224719.32872-4-andreas.rheinhardt@gmail.com
State	Superseded
Headers	show Return-Path: <ffmpeg-devel-bounces@ffmpeg.org> From: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> To: ffmpeg-devel@ffmpeg.org Date: Sun, 2 Jun 2019 00:47:17 +0200 Message-Id: <20190601224719.32872-4-andreas.rheinhardt@gmail.com> In-Reply-To: <20190601224719.32872-1-andreas.rheinhardt@gmail.com> References: <20190601224719.32872-1-andreas.rheinhardt@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/5] startcode: Stop overreading Precedence: list Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Cc: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Andreas Rheinhardt June 1, 2019, 10:47 p.m. UTC

Up until now ff_startcode_find_candidate_c could overread; it relied on
zero-padding after the buffer in order to function correctly. This has
been changed: No overreads occur any more.
The ultimate goal behind all this is to create a high-performance
function for searching of startcodes that can be applied even in
scenarios where the buffer is not padded.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
---
 libavcodec/startcode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Michael Niedermayer June 3, 2019, 9:45 p.m. UTC | #1

On Sun, Jun 02, 2019 at 12:47:17AM +0200, Andreas Rheinhardt wrote:
> Up until now ff_startcode_find_candidate_c could overread; it relied on
> zero-padding after the buffer in order to function correctly. This has
> been changed: No overreads occur any more.
> The ultimate goal behind all this is to create a high-performance
> function for searching of startcodes that can be applied even in
> scenarios where the buffer is not padded.
> 
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> ---
>  libavcodec/startcode.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
> index 373572365b..b027c191c0 100644
> --- a/libavcodec/startcode.c
> +++ b/libavcodec/startcode.c
> @@ -41,10 +41,7 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
>  
>  #define READ(bitness) AV_RN ## bitness ## A
>  #define MAIN_LOOP(bitness, mask1, mask2) do {                              \
> -        /* we check p < end instead of p + 3 / 7 because it is
> -         * simpler and there must be AV_INPUT_BUFFER_PADDING_SIZE
> -         * bytes at the end. */                                            \

> -        for (; buf < end; buf += bitness / 8)                              \
> +        for (; buf <= end - bitness / 8; buf += bitness / 8)               \

is this faster than subtracting  bitness / 8 from end outside the loop ?
if not then i would suggest to do it outside as that means 1 thing less
the compiler has to optimize out

thx

[...]

Andreas Rheinhardt June 3, 2019, 10:45 p.m. UTC | #2

Michael Niedermayer:
> On Sun, Jun 02, 2019 at 12:47:17AM +0200, Andreas Rheinhardt wrote:
>> Up until now ff_startcode_find_candidate_c could overread; it relied on
>> zero-padding after the buffer in order to function correctly. This has
>> been changed: No overreads occur any more.
>> The ultimate goal behind all this is to create a high-performance
>> function for searching of startcodes that can be applied even in
>> scenarios where the buffer is not padded.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
>> ---
>>  libavcodec/startcode.c | 5 +----
>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
>> index 373572365b..b027c191c0 100644
>> --- a/libavcodec/startcode.c
>> +++ b/libavcodec/startcode.c
>> @@ -41,10 +41,7 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
>>  
>>  #define READ(bitness) AV_RN ## bitness ## A
>>  #define MAIN_LOOP(bitness, mask1, mask2) do {                              \
>> -        /* we check p < end instead of p + 3 / 7 because it is
>> -         * simpler and there must be AV_INPUT_BUFFER_PADDING_SIZE
>> -         * bytes at the end. */                                            \
> 
>> -        for (; buf < end; buf += bitness / 8)                              \
>> +        for (; buf <= end - bitness / 8; buf += bitness / 8)               \
> 
> is this faster than subtracting  bitness / 8 from end outside the loop ?
> if not then i would suggest to do it outside as that means 1 thing less
> the compiler has to optimize out
> 
> thx
In my tests the compiler calculates end - bitness / 8 only once and
puts it in a register of its own. But I can do it manually, of course.
In fact, I just noticed a potential undefined behaviour in this patch
(and only in this patch, the next one actually fixes it), so this
needs to be changed anyway: Nothing guarantees that end - bitness / 8
is actually part of the array any more and so this would be undefined
behaviour (even if no access occurs).

- Andreas

Michael Niedermayer June 4, 2019, 11:16 a.m. UTC | #3

On Mon, Jun 03, 2019 at 10:45:00PM +0000, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Sun, Jun 02, 2019 at 12:47:17AM +0200, Andreas Rheinhardt wrote:
> >> Up until now ff_startcode_find_candidate_c could overread; it relied on
> >> zero-padding after the buffer in order to function correctly. This has
> >> been changed: No overreads occur any more.
> >> The ultimate goal behind all this is to create a high-performance
> >> function for searching of startcodes that can be applied even in
> >> scenarios where the buffer is not padded.
> >>
> >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
> >> ---
> >>  libavcodec/startcode.c | 5 +----
> >>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
> >> index 373572365b..b027c191c0 100644
> >> --- a/libavcodec/startcode.c
> >> +++ b/libavcodec/startcode.c
> >> @@ -41,10 +41,7 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
> >>  
> >>  #define READ(bitness) AV_RN ## bitness ## A
> >>  #define MAIN_LOOP(bitness, mask1, mask2) do {                              \
> >> -        /* we check p < end instead of p + 3 / 7 because it is
> >> -         * simpler and there must be AV_INPUT_BUFFER_PADDING_SIZE
> >> -         * bytes at the end. */                                            \
> > 
> >> -        for (; buf < end; buf += bitness / 8)                              \
> >> +        for (; buf <= end - bitness / 8; buf += bitness / 8)               \
> > 
> > is this faster than subtracting  bitness / 8 from end outside the loop ?
> > if not then i would suggest to do it outside as that means 1 thing less
> > the compiler has to optimize out
> > 
> > thx
> In my tests the compiler calculates end - bitness / 8 only once and
> puts it in a register of its own. But I can do it manually, of course.

A compiler also can change between pointers and indexes, yet you change
that by hand in a previous patch. (so it seems the compiler must have
failed there with a trivial and expected optimization)
I think if one optimizes code at the level of detail that you do here
its better not to depend on the compiler more than needed for optimizing
things on top

Thanks

[...]

Andreas Rheinhardt June 9, 2019, 11 a.m. UTC | #4

Hello,

I have added the necessary numbers for the index version (which you can
find at https://github.com/mkver/FFmpeg/commits/start_6 if you care) to
the commit message of the first patch and have also adapted the
overread check as you suggested (no performance impact whatsoever
from this).

- Andreas

Andreas Rheinhardt (5):
  startcode: Use common macro and switch to pointer arithmetic
  startcode: Switch to aligned reads
  startcode: Stop overreading
  startcode: Don't return false positives
  startcode: Filter out non-startcodes earlier

 libavcodec/h264dsp.h   |   7 +--
 libavcodec/startcode.c | 133 ++++++++++++++++++++++++++++++++++-------
 libavcodec/vc1dsp.h    |   6 +-
 3 files changed, 117 insertions(+), 29 deletions(-)

Andreas Rheinhardt June 17, 2019, 4:23 a.m. UTC | #5

Andreas Rheinhardt:
> Hello,
> 
> I have added the necessary numbers for the index version (which you can
> find at https://github.com/mkver/FFmpeg/commits/start_6 if you care) to
> the commit message of the first patch and have also adapted the
> overread check as you suggested (no performance impact whatsoever
> from this).
> 
> - Andreas
> 
> Andreas Rheinhardt (5):
>   startcode: Use common macro and switch to pointer arithmetic
>   startcode: Switch to aligned reads
>   startcode: Stop overreading
>   startcode: Don't return false positives
>   startcode: Filter out non-startcodes earlier
> 
>  libavcodec/h264dsp.h   |   7 +--
>  libavcodec/startcode.c | 133 ++++++++++++++++++++++++++++++++++-------
>  libavcodec/vc1dsp.h    |   6 +-
>  3 files changed, 117 insertions(+), 29 deletions(-)
> 
Ping.

- Andreas

[FFmpeg-devel,3/5] startcode: Stop overreading

Commit Message

Comments

Patch