diff mbox series

[FFmpeg-devel,1/2] swscale: fix NEON hscale init

Message ID 20200507112535.24582-1-josh@itanimul.li
State Accepted
Commit 718c8f9aa59751bb490e2688acf2b5cb68fd5ad1
Headers show
Series [FFmpeg-devel,1/2] swscale: fix NEON hscale init | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Josh Dekker May 7, 2020, 11:25 a.m. UTC
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used.

Signed-off-by: Josh de Kock <josh@itanimul.li>
---
 libswscale/aarch64/swscale.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Michael Niedermayer May 8, 2020, 11:25 a.m. UTC | #1
On Thu, May 07, 2020 at 12:25:34PM +0100, Josh de Kock wrote:
> The NEON hscale function only supports X8 filter sizes and should only
> be selected when these are being used.
> 
> Signed-off-by: Josh de Kock <josh@itanimul.li>
> ---
>  libswscale/aarch64/swscale.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
> index 54a3beabe8..eecbea88ca 100644
> --- a/libswscale/aarch64/swscale.c
> +++ b/libswscale/aarch64/swscale.c
> @@ -34,7 +34,10 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
>      int cpu_flags = av_get_cpu_flags();
>  
>      if (have_neon(cpu_flags)) {
> -        if (c->srcBpc == 8 && c->dstBpc <= 14) {
> +        if (c->srcBpc == 8 && c->dstBpc <= 14 &&
> +            (c->hLumFilterSize % 8) == 0 &&
> +            (c->hChrFilterSize % 8) == 0)
> +        {
>              c->hyScale = c->hcScale = ff_hscale_8_to_15_neon;
>          }

isnt filterAlign set to 8 when neon is available ?

[...]
Josh Dekker May 15, 2020, 10:27 a.m. UTC | #2
On 08/05/2020 12:25, Michael Niedermayer wrote:
> On Thu, May 07, 2020 at 12:25:34PM +0100, Josh de Kock wrote:
>> The NEON hscale function only supports X8 filter sizes and should only
>> be selected when these are being used.
>>
>> Signed-off-by: Josh de Kock <josh@itanimul.li>
>> ---
>>   libswscale/aarch64/swscale.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
>> index 54a3beabe8..eecbea88ca 100644
>> --- a/libswscale/aarch64/swscale.c
>> +++ b/libswscale/aarch64/swscale.c
>> @@ -34,7 +34,10 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
>>       int cpu_flags = av_get_cpu_flags();
>>   
>>       if (have_neon(cpu_flags)) {
>> -        if (c->srcBpc == 8 && c->dstBpc <= 14) {
>> +        if (c->srcBpc == 8 && c->dstBpc <= 14 &&
>> +            (c->hLumFilterSize % 8) == 0 &&
>> +            (c->hChrFilterSize % 8) == 0)
>> +        {
>>               c->hyScale = c->hcScale = ff_hscale_8_to_15_neon;
>>           }
> 
> isnt filterAlign set to 8 when neon is available ?
> 
> [...]
>

Discussed on IRC. Pushed with set.
Carl Eugen Hoyos May 16, 2020, 10:50 a.m. UTC | #3
Am Fr., 15. Mai 2020 um 12:27 Uhr schrieb Josh de Kock <josh@itanimul.li>:
>
> On 08/05/2020 12:25, Michael Niedermayer wrote:
> > On Thu, May 07, 2020 at 12:25:34PM +0100, Josh de Kock wrote:
> >> The NEON hscale function only supports X8 filter sizes and should only
> >> be selected when these are being used.
> >>
> >> Signed-off-by: Josh de Kock <josh@itanimul.li>
> >> ---
> >>   libswscale/aarch64/swscale.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
> >> index 54a3beabe8..eecbea88ca 100644
> >> --- a/libswscale/aarch64/swscale.c
> >> +++ b/libswscale/aarch64/swscale.c
> >> @@ -34,7 +34,10 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
> >>       int cpu_flags = av_get_cpu_flags();
> >>
> >>       if (have_neon(cpu_flags)) {
> >> -        if (c->srcBpc == 8 && c->dstBpc <= 14) {
> >> +        if (c->srcBpc == 8 && c->dstBpc <= 14 &&
> >> +            (c->hLumFilterSize % 8) == 0 &&
> >> +            (c->hChrFilterSize % 8) == 0)
> >> +        {
> >>               c->hyScale = c->hcScale = ff_hscale_8_to_15_neon;
> >>           }
> >
> > isnt filterAlign set to 8 when neon is available ?
>
> Discussed on IRC. Pushed with set.

Could you give a very short explanation on why the comment
was not relevant?

Thank you, Carl Eugen
Josh Dekker May 16, 2020, 8:36 p.m. UTC | #4
On 16/05/2020 11:50, Carl Eugen Hoyos wrote:
> Am Fr., 15. Mai 2020 um 12:27 Uhr schrieb Josh de Kock <josh@itanimul.li>:
>>
>> On 08/05/2020 12:25, Michael Niedermayer wrote:
>>> On Thu, May 07, 2020 at 12:25:34PM +0100, Josh de Kock wrote:
>>>> The NEON hscale function only supports X8 filter sizes and should only
>>>> be selected when these are being used.
>>>>
>>>> Signed-off-by: Josh de Kock <josh@itanimul.li>
>>>> ---
>>>>    libswscale/aarch64/swscale.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
>>>> index 54a3beabe8..eecbea88ca 100644
>>>> --- a/libswscale/aarch64/swscale.c
>>>> +++ b/libswscale/aarch64/swscale.c
>>>> @@ -34,7 +34,10 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
>>>>        int cpu_flags = av_get_cpu_flags();
>>>>
>>>>        if (have_neon(cpu_flags)) {
>>>> -        if (c->srcBpc == 8 && c->dstBpc <= 14) {
>>>> +        if (c->srcBpc == 8 && c->dstBpc <= 14 &&
>>>> +            (c->hLumFilterSize % 8) == 0 &&
>>>> +            (c->hChrFilterSize % 8) == 0)
>>>> +        {
>>>>                c->hyScale = c->hcScale = ff_hscale_8_to_15_neon;
>>>>            }
>>>
>>> isnt filterAlign set to 8 when neon is available ?
>>
>> Discussed on IRC. Pushed with set.
> 
> Could you give a very short explanation on why the comment
> was not relevant?
> 

At the moment filterAlign is set to 8 but in the future when extra NEON 
assembly for specific sizes is added they will need to have checks here too.

The immediate use-case for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).

Michael then said the extra check was fine here. The checkasm test 
breaks on altivec (I was unable to test this) but I'm discussing with 
Martin if there's a better way to do this, these checks are likely to be 
reverted with a different solution implemented.
diff mbox series

Patch

diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index 54a3beabe8..eecbea88ca 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -34,7 +34,10 @@  av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
     int cpu_flags = av_get_cpu_flags();
 
     if (have_neon(cpu_flags)) {
-        if (c->srcBpc == 8 && c->dstBpc <= 14) {
+        if (c->srcBpc == 8 && c->dstBpc <= 14 &&
+            (c->hLumFilterSize % 8) == 0 &&
+            (c->hChrFilterSize % 8) == 0)
+        {
             c->hyScale = c->hcScale = ff_hscale_8_to_15_neon;
         }
         if (c->dstBpc == 8) {