Message ID | 20210716134453.1126957-1-alankelly@google.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/2] libavutil/cpu: Adds fast gather detection. | expand |
Context | Check | Description |
---|---|---|
andriy/x86_make | success | Make finished |
andriy/x86_make_fate | success | Make fate finished |
andriy/PPC64_make | success | Make finished |
andriy/PPC64_make_fate | success | Make fate finished |
On 7/16/2021 10:44 AM, Alan Kelly wrote: > Broadwell and later and Zen3 and later have fast gather instructions. > --- > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the > email thread. I was very explicit about this not being ok. We're not disabling all ymm usage for Haswell just for one or two swscale functions using gathers. Lets go with Lynne's latest suggestion and not change the flags at all and use gathers on Haswell, same as other arches, by looking at the AVX2_FAST flag. > libavutil/cpu.h | 1 + > libavutil/x86/cpu.c | 11 ++++++++++- > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/libavutil/cpu.h b/libavutil/cpu.h > index c069076439..ec3073d021 100644 > --- a/libavutil/cpu.h > +++ b/libavutil/cpu.h > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count); > * av_set_cpu_flags_mask(), then this function will behave as if AVX is not > * present. > */ > + > size_t av_cpu_max_align(void); > > #endif /* AVUTIL_CPU_H */ > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c > index bcd41a50a2..158e2170c4 100644 > --- a/libavutil/x86/cpu.c > +++ b/libavutil/x86/cpu.c > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void) > if (max_std_level >= 7) { > cpuid(7, eax, ebx, ecx, edx); > #if HAVE_AVX2 > - if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) > + if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){ > rval |= AV_CPU_FLAG_AVX2; > + > + cpuid(1, eax, ebx, ecx, std_caps); > + family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff); > + model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0); > + // Haswell and earlier has slow gather > + if(family == 6 && model < 70) > + rval |= AV_CPU_FLAG_AVXSLOW; > + } > + > #if HAVE_AVX512 /* F, CD, BW, DQ, VL */ > if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */ > if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000) >
On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial@gmail.com> wrote: > On 7/16/2021 10:44 AM, Alan Kelly wrote: > > Broadwell and later and Zen3 and later have fast gather instructions. > > --- > > Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the > > email thread. > > I was very explicit about this not being ok. We're not disabling all ymm > usage for Haswell just for one or two swscale functions using gathers. > > Lets go with Lynne's latest suggestion and not change the flags at all > and use gathers on Haswell, same as other arches, by looking at the > AVX2_FAST flag. > > > libavutil/cpu.h | 1 + > > libavutil/x86/cpu.c | 11 ++++++++++- > > 2 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/libavutil/cpu.h b/libavutil/cpu.h > > index c069076439..ec3073d021 100644 > > --- a/libavutil/cpu.h > > +++ b/libavutil/cpu.h > > @@ -113,6 +113,7 @@ void av_force_cpu_count(int count); > > * av_set_cpu_flags_mask(), then this function will behave as if AVX > is not > > * present. > > */ > > + > > size_t av_cpu_max_align(void); > > > > #endif /* AVUTIL_CPU_H */ > > diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c > > index bcd41a50a2..158e2170c4 100644 > > --- a/libavutil/x86/cpu.c > > +++ b/libavutil/x86/cpu.c > > @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void) > > if (max_std_level >= 7) { > > cpuid(7, eax, ebx, ecx, edx); > > #if HAVE_AVX2 > > - if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) > > + if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){ > > rval |= AV_CPU_FLAG_AVX2; > > + > > + cpuid(1, eax, ebx, ecx, std_caps); > > + family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff); > > + model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0); > > + // Haswell and earlier has slow gather > > + if(family == 6 && model < 70) > > + rval |= AV_CPU_FLAG_AVXSLOW; > > + } > > + > > #if HAVE_AVX512 /* F, CD, BW, DQ, VL */ > > if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */ > > if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == > 0xd0030000) > > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". > OK, apologies for the misunderstanding. In that case part 1 of this patch is not required. Part two remains valid with the function protected by EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or is it OK as is?
On 7/16/2021 11:46 AM, Alan Kelly wrote: > On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial@gmail.com> wrote: > >> On 7/16/2021 10:44 AM, Alan Kelly wrote: >>> Broadwell and later and Zen3 and later have fast gather instructions. >>> --- >>> Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the >>> email thread. >> >> I was very explicit about this not being ok. We're not disabling all ymm >> usage for Haswell just for one or two swscale functions using gathers. >> >> Lets go with Lynne's latest suggestion and not change the flags at all >> and use gathers on Haswell, same as other arches, by looking at the >> AVX2_FAST flag. >> >>> libavutil/cpu.h | 1 + >>> libavutil/x86/cpu.c | 11 ++++++++++- >>> 2 files changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/libavutil/cpu.h b/libavutil/cpu.h >>> index c069076439..ec3073d021 100644 >>> --- a/libavutil/cpu.h >>> +++ b/libavutil/cpu.h >>> @@ -113,6 +113,7 @@ void av_force_cpu_count(int count); >>> * av_set_cpu_flags_mask(), then this function will behave as if AVX >> is not >>> * present. >>> */ >>> + >>> size_t av_cpu_max_align(void); >>> >>> #endif /* AVUTIL_CPU_H */ >>> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c >>> index bcd41a50a2..158e2170c4 100644 >>> --- a/libavutil/x86/cpu.c >>> +++ b/libavutil/x86/cpu.c >>> @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void) >>> if (max_std_level >= 7) { >>> cpuid(7, eax, ebx, ecx, edx); >>> #if HAVE_AVX2 >>> - if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) >>> + if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){ >>> rval |= AV_CPU_FLAG_AVX2; >>> + >>> + cpuid(1, eax, ebx, ecx, std_caps); >>> + family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff); >>> + model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0); >>> + // Haswell and earlier has slow gather >>> + if(family == 6 && model < 70) >>> + rval |= AV_CPU_FLAG_AVXSLOW; >>> + } >>> + >>> #if HAVE_AVX512 /* F, CD, BW, DQ, VL */ >>> if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */ >>> if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == >> 0xd0030000) >>> >> >> _______________________________________________ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >> > > OK, apologies for the misunderstanding. In that case part 1 of this patch > is not required. Part two remains valid with the function protected by > EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or > is it OK as is? It's ok as is. Thanks.
diff --git a/libavutil/cpu.h b/libavutil/cpu.h index c069076439..ec3073d021 100644 --- a/libavutil/cpu.h +++ b/libavutil/cpu.h @@ -113,6 +113,7 @@ void av_force_cpu_count(int count); * av_set_cpu_flags_mask(), then this function will behave as if AVX is not * present. */ + size_t av_cpu_max_align(void); #endif /* AVUTIL_CPU_H */ diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c index bcd41a50a2..158e2170c4 100644 --- a/libavutil/x86/cpu.c +++ b/libavutil/x86/cpu.c @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void) if (max_std_level >= 7) { cpuid(7, eax, ebx, ecx, edx); #if HAVE_AVX2 - if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) + if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){ rval |= AV_CPU_FLAG_AVX2; + + cpuid(1, eax, ebx, ecx, std_caps); + family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff); + model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0); + // Haswell and earlier has slow gather + if(family == 6 && model < 70) + rval |= AV_CPU_FLAG_AVXSLOW; + } + #if HAVE_AVX512 /* F, CD, BW, DQ, VL */ if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */ if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)