Message ID | 20171109115837.32618-9-jdarnley@obe.tv |
---|---|
State | New |
2017-11-10 22:13 GMT+01:00 James Darnley <jdarnley@obe.tv>:

> On 2017-11-10 14:32, James Darnley wrote:
> > I mentioned previously that using ZMM registers will cause the CPU to
> > reduce its frequency.
> >
> > Gramner said on IRC that a user should spend 20-30% of time in
> > AVX-512/ZMM code for it to be a net gain in speed.
> > From ffmpeg-devel IRC on 2017-10-26
> >> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-October/004622.html
> >> [18:49:26 CEST] <Gramner> J_Darnley: be aware that using zmm registers
> >> induces significant frequency drops which reduces performance of everything
> >> else, so if you want to use 512-bit vectors you better go all in on it to
> >> make up for it. you probably want to spend at least 20-30% of overall
> >> runtime in avx-512 code
> >> [18:50:00 CEST] <Gramner> the alternative is to stay in 256-bit mode
> >> and just leverage new instructions and opmasks
> >
> > This means any cycles you might save by using longer registers, fewer
> > instructions, better instructions, whatever, will be lost because the
> > frequency drops meaning it takes longer to execute overall.
>
> Some details about this can be found in one of Intel's documents:
>
> Intel® 64 and IA-32 Architectures Optimization Reference Manual
> Order Number: 248966-038
> October 2017
>
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> Specifically section 15.26 "SKYLAKE SERVER POWER MANAGEMENT"
>
> Earlier on the ffmpeg-devel IRC channel I posted a link to Cloudflare's
> blog in which they discuss the effects of running just a few (my words)
> AVX-512/ZMM instructions.
>
> https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
>
> In the worst cases on some of the new processors the frequency drop can
> be 1GHz. In Cloudflare's case just spending about 2.5% of time in a
> cryptography function using AVX-512 was causing a 10% drop in their
> overall performance (requests served per second).
>
> After seeing this and the discussion on IRC I won't commit any of the
> function patches. The functions are not very impressive and are likely
> to make everything else slower.
>
> The IRC log should appear at the link below.
>
> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004651.html
>

Thanks for the detailed explanations.

Martin
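
To make the break-even argument above concrete, here is a rough back-of-the-envelope model in C. It is only a sketch under stated assumptions, not something taken from the thread or from Intel's manual: it assumes a fraction f of the baseline runtime is in code that AVX-512 speeds up by a constant factor in cycles, and that the lower clock slows the whole process uniformly. The function names and the example inputs (a 2x speedup, a clock at 0.9 of nominal) are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not a measurement):
 *   f          - fraction of baseline runtime spent in the vectorised code
 *   speedup    - cycle-count speedup of that code when using AVX-512
 *   freq_ratio - reduced clock divided by the original clock (<= 1.0)
 * Returns the new runtime relative to a baseline runtime of 1.0.
 */
static double relative_runtime(double f, double speedup, double freq_ratio)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(speedup > 0.0);
    assert(freq_ratio > 0.0 && freq_ratio <= 1.0);

    /* Cycles shrink only in the vectorised part; every cycle gets longer. */
    return ((1.0 - f) + f / speedup) / freq_ratio;
}

/*
 * Fraction of runtime that must be in AVX-512 code before the model shows
 * any net gain, from solving relative_runtime(f, ...) < 1.0 for f:
 *   f > (1 - freq_ratio) / (1 - 1 / speedup)
 */
static double break_even_fraction(double speedup, double freq_ratio)
{
    assert(speedup > 1.0);
    assert(freq_ratio > 0.0 && freq_ratio <= 1.0);

    return (1.0 - freq_ratio) / (1.0 - 1.0 / speedup);
}
```

With the illustrative inputs above, break_even_fraction(2.0, 0.9) comes out at roughly 0.2, and relative_runtime() is above 1.0 for any smaller fraction, which is the shape of the "go all in or stay at 256 bits" advice. Real processors do not behave this simply, so treat the model as a way to reason about the trade-off rather than a prediction.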
On 2017-11-10 22:13, James Darnley wrote:
> The IRC log should appear at the link below.
>> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004651.html

Of course when I try to predict what number an email will get based on the
past few it ends up being out of order. The ffmpeg-devel log I was referring
to is here:

> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004652.html
diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index 965f2bea3c..5068af27f8 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -103,6 +103,11 @@ INIT_YMM avx2
 v210_planar_pack_10
 %endif
 
+%if HAVE_AVX512_EXTERNAL
+INIT_YMM avx512
+v210_planar_pack_10
+%endif
+
 %macro v210_planar_pack_8 0
 
 ; v210_planar_pack_8(const uint8_t *y, const uint8_t *u, const uint8_t *v, uint8_t *dst, ptrdiff_t width)
diff --git a/libavcodec/x86/v210enc_init.c b/libavcodec/x86/v210enc_init.c
index e997b4b67a..e8aac373a0 100644
--- a/libavcodec/x86/v210enc_init.c
+++ b/libavcodec/x86/v210enc_init.c
@@ -32,6 +32,9 @@ void ff_v210_planar_pack_10_ssse3(const uint16_t *y, const uint16_t *u,
 void ff_v210_planar_pack_10_avx2(const uint16_t *y, const uint16_t *u,
                                  const uint16_t *v, uint8_t *dst,
                                  ptrdiff_t width);
+void ff_v210_planar_pack_10_avx512(const uint16_t *y, const uint16_t *u,
+                                   const uint16_t *v, uint8_t *dst,
+                                   ptrdiff_t width);
 
 av_cold void ff_v210enc_init_x86(V210EncContext *s)
 {
@@ -51,4 +54,8 @@ av_cold void ff_v210enc_init_x86(V210EncContext *s)
         s->sample_factor_10 = 2;
         s->pack_line_10 = ff_v210_planar_pack_10_avx2;
     }
+
+    if (EXTERNAL_AVX512(cpu_flags)) {
+        s->pack_line_10 = ff_v210_planar_pack_10_avx512;
+    }
 }
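
For readers unfamiliar with the init pattern the patch extends, the following is a minimal standalone sketch of it in C. It is not FFmpeg code: the flag bits, the PackContext struct, the stub functions and the placeholder default of 1 for sample_factor_10 are all hypothetical stand-ins so the example compiles on its own. What it does mirror from the diff is the ordering: checks run from least to most capable instruction set, so the last matching branch wins, and the AVX-512 branch only replaces the function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch (FFmpeg uses AV_CPU_FLAG_*). */
enum {
    CPU_SSSE3  = 1 << 0,
    CPU_AVX2   = 1 << 1,
    CPU_AVX512 = 1 << 2
};

typedef void (*pack_fn)(void);

/* Empty stubs standing in for the C and assembly implementations. */
static void pack_c(void)      {}
static void pack_ssse3(void)  {}
static void pack_avx2(void)   {}
static void pack_avx512(void) {}

/* Hypothetical, reduced stand-in for V210EncContext. */
typedef struct PackContext {
    pack_fn pack_line_10;
    int     sample_factor_10;
} PackContext;

static void pack_init(PackContext *s, int cpu_flags)
{
    assert(s != NULL);

    /* Portable fallback; the value 1 is a placeholder for this sketch. */
    s->pack_line_10     = pack_c;
    s->sample_factor_10 = 1;

    /* Later checks override earlier ones, so the best match wins. */
    if (cpu_flags & CPU_SSSE3)
        s->pack_line_10 = pack_ssse3;

    if (cpu_flags & CPU_AVX2) {
        s->sample_factor_10 = 2;
        s->pack_line_10     = pack_avx2;
    }

    /* As in the patch, only the pointer changes here, so the sample
     * factor set by the AVX2 branch is kept. */
    if (cpu_flags & CPU_AVX512)
        s->pack_line_10 = pack_avx512;
}
```

Calling pack_init() with all three flags set leaves pack_line_10 pointing at the AVX-512 stub with the sample factor carried over from the AVX2 branch. Because the patch uses INIT_YMM, the new variant still works on 256-bit registers, which fits the "stay in 256-bit mode" alternative mentioned earlier in the thread.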