diff mbox series

[FFmpeg-devel,1/2] libavutil/cpu: Adds fast gather detection.

Message ID 20210716134453.1126957-1-alankelly@google.com
State New
Headers show
Series [FFmpeg-devel,1/2] libavutil/cpu: Adds fast gather detection. | expand

Checks

Context Check Description
andriy/x86_make success Make finished
andriy/x86_make_fate success Make fate finished
andriy/PPC64_make success Make finished
andriy/PPC64_make_fate success Make fate finished

Commit Message

Alan Kelly July 16, 2021, 1:44 p.m. UTC
Broadwell and later and Zen3 and later have fast gather instructions.
---
 Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
 email thread.
 libavutil/cpu.h     |  1 +
 libavutil/x86/cpu.c | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

James Almer July 16, 2021, 2:02 p.m. UTC | #1
On 7/16/2021 10:44 AM, Alan Kelly wrote:
> Broadwell and later and Zen3 and later have fast gather instructions.
> ---
>   Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
>   email thread.

I was very explicit about this not being ok. We're not disabling all ymm 
usage for Haswell just for one or two swscale functions using gathers.

Lets go with Lynne's latest suggestion and not change the flags at all 
and use gathers on Haswell, same as other arches, by looking at the 
AVX2_FAST flag.

>   libavutil/cpu.h     |  1 +
>   libavutil/x86/cpu.c | 11 ++++++++++-
>   2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> index c069076439..ec3073d021 100644
> --- a/libavutil/cpu.h
> +++ b/libavutil/cpu.h
> @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
>    *  av_set_cpu_flags_mask(), then this function will behave as if AVX is not
>    *  present.
>    */
> +
>   size_t av_cpu_max_align(void);
>   
>   #endif /* AVUTIL_CPU_H */
> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> index bcd41a50a2..158e2170c4 100644
> --- a/libavutil/x86/cpu.c
> +++ b/libavutil/x86/cpu.c
> @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
>       if (max_std_level >= 7) {
>           cpuid(7, eax, ebx, ecx, edx);
>   #if HAVE_AVX2
> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
>               rval |= AV_CPU_FLAG_AVX2;
> +
> +            cpuid(1, eax, ebx, ecx, std_caps);
> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> +            // Haswell and earlier has slow gather
> +            if(family == 6 && model < 70)
> +                rval |= AV_CPU_FLAG_AVXSLOW;
> +        }
> +
>   #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
>           if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
>               if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
>
Alan Kelly July 16, 2021, 2:46 p.m. UTC | #2
On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial@gmail.com> wrote:

> On 7/16/2021 10:44 AM, Alan Kelly wrote:
> > Broadwell and later and Zen3 and later have fast gather instructions.
> > ---
> >   Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
> >   email thread.
>
> I was very explicit about this not being ok. We're not disabling all ymm
> usage for Haswell just for one or two swscale functions using gathers.
>
> Lets go with Lynne's latest suggestion and not change the flags at all
> and use gathers on Haswell, same as other arches, by looking at the
> AVX2_FAST flag.
>
> >   libavutil/cpu.h     |  1 +
> >   libavutil/x86/cpu.c | 11 ++++++++++-
> >   2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> > index c069076439..ec3073d021 100644
> > --- a/libavutil/cpu.h
> > +++ b/libavutil/cpu.h
> > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
> >    *  av_set_cpu_flags_mask(), then this function will behave as if AVX
> is not
> >    *  present.
> >    */
> > +
> >   size_t av_cpu_max_align(void);
> >
> >   #endif /* AVUTIL_CPU_H */
> > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> > index bcd41a50a2..158e2170c4 100644
> > --- a/libavutil/x86/cpu.c
> > +++ b/libavutil/x86/cpu.c
> > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
> >       if (max_std_level >= 7) {
> >           cpuid(7, eax, ebx, ecx, edx);
> >   #if HAVE_AVX2
> > -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> > +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
> >               rval |= AV_CPU_FLAG_AVX2;
> > +
> > +            cpuid(1, eax, ebx, ecx, std_caps);
> > +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> > +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> > +            // Haswell and earlier has slow gather
> > +            if(family == 6 && model < 70)
> > +                rval |= AV_CPU_FLAG_AVXSLOW;
> > +        }
> > +
> >   #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> >           if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> >               if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) ==
> 0xd0030000)
> >
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>

OK, apologies for the misunderstanding. In that case part 1 of this patch
is not required. Part two remains valid with the function protected by
EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or
is it OK as is?
James Almer July 16, 2021, 3:08 p.m. UTC | #3
On 7/16/2021 11:46 AM, Alan Kelly wrote:
> On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial@gmail.com> wrote:
> 
>> On 7/16/2021 10:44 AM, Alan Kelly wrote:
>>> Broadwell and later and Zen3 and later have fast gather instructions.
>>> ---
>>>    Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
>>>    email thread.
>>
>> I was very explicit about this not being ok. We're not disabling all ymm
>> usage for Haswell just for one or two swscale functions using gathers.
>>
>> Lets go with Lynne's latest suggestion and not change the flags at all
>> and use gathers on Haswell, same as other arches, by looking at the
>> AVX2_FAST flag.
>>
>>>    libavutil/cpu.h     |  1 +
>>>    libavutil/x86/cpu.c | 11 ++++++++++-
>>>    2 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
>>> index c069076439..ec3073d021 100644
>>> --- a/libavutil/cpu.h
>>> +++ b/libavutil/cpu.h
>>> @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
>>>     *  av_set_cpu_flags_mask(), then this function will behave as if AVX
>> is not
>>>     *  present.
>>>     */
>>> +
>>>    size_t av_cpu_max_align(void);
>>>
>>>    #endif /* AVUTIL_CPU_H */
>>> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
>>> index bcd41a50a2..158e2170c4 100644
>>> --- a/libavutil/x86/cpu.c
>>> +++ b/libavutil/x86/cpu.c
>>> @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
>>>        if (max_std_level >= 7) {
>>>            cpuid(7, eax, ebx, ecx, edx);
>>>    #if HAVE_AVX2
>>> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
>>> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
>>>                rval |= AV_CPU_FLAG_AVX2;
>>> +
>>> +            cpuid(1, eax, ebx, ecx, std_caps);
>>> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
>>> +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
>>> +            // Haswell and earlier has slow gather
>>> +            if(family == 6 && model < 70)
>>> +                rval |= AV_CPU_FLAG_AVXSLOW;
>>> +        }
>>> +
>>>    #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
>>>            if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
>>>                if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) ==
>> 0xd0030000)
>>>
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>>
> 
> OK, apologies for the misunderstanding. In that case part 1 of this patch
> is not required. Part two remains valid with the function protected by
> EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or
> is it OK as is?

It's ok as is. Thanks.
diff mbox series

Patch

diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index c069076439..ec3073d021 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -113,6 +113,7 @@  void av_force_cpu_count(int count);
  *  av_set_cpu_flags_mask(), then this function will behave as if AVX is not
  *  present.
  */
+
 size_t av_cpu_max_align(void);
 
 #endif /* AVUTIL_CPU_H */
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index bcd41a50a2..158e2170c4 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -146,8 +146,17 @@  int ff_get_cpu_flags_x86(void)
     if (max_std_level >= 7) {
         cpuid(7, eax, ebx, ecx, edx);
 #if HAVE_AVX2
-        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
+        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
             rval |= AV_CPU_FLAG_AVX2;
+
+            cpuid(1, eax, ebx, ecx, std_caps);
+            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
+            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
+            // Haswell and earlier has slow gather
+            if(family == 6 && model < 70)
+                rval |= AV_CPU_FLAG_AVXSLOW;
+        }
+
 #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
         if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
             if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)