From patchwork Thu Jun 9 23:55:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 36131
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Jun 2022 01:55:15 +0200
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20220609235523.458689-33-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 33/41] avcodec/x86/h264_qpel: Disable overridden functions on x64
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt

x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). This commit therefore disables several
MMXEXT functions (that are overridden by SSE2 functions)
at compile-time for x64.

Notice that some 10-bit SSE2 functions are overridden by
sse2_cache64 functions in the same code block. This is
suboptimal and the functions that are overridden should
either be removed or the sse2_cache64 functions be put
behind suitable checks. This commit does neither.

Signed-off-by: Andreas Rheinhardt
---
I would love to get input on what to do with these sse2_cache64 functions.
If no one says anything, I will send a patch that retains the current
behaviour and removes the functions overridden by the sse2_cache64 functions.

 libavcodec/x86/h264_qpel.c        | 44 +++++++++++++++++++++----------
 libavcodec/x86/h264_qpel_8bit.asm |  4 +++
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/h264_qpel.c b/libavcodec/x86/h264_qpel.c
index fd1070247b..cb5f8a126c 100644
--- a/libavcodec/x86/h264_qpel.c
+++ b/libavcodec/x86/h264_qpel.c
@@ -236,7 +236,11 @@ static av_always_inline void ff_ ## OPNAME ## h264_qpel16_hv_lowpass_ ## MMX(uin
 #define ff_put_h264_qpel8or16_hv2_lowpass_sse2 ff_put_h264_qpel8or16_hv2_lowpass_mmxext
 #define ff_avg_h264_qpel8or16_hv2_lowpass_sse2 ff_avg_h264_qpel8or16_hv2_lowpass_mmxext
 
-#define H264_MC(OPNAME, SIZE, MMX, ALIGN) \
+#define H264_MC_C_H(OPNAME, SIZE, MMX, ALIGN) \
+H264_MC_C(OPNAME, SIZE, MMX, ALIGN)\
+H264_MC_H(OPNAME, SIZE, MMX, ALIGN)\
+
+#define H264_MC_C_V_H_HV(OPNAME, SIZE, MMX, ALIGN) \
 H264_MC_C(OPNAME, SIZE, MMX, ALIGN)\
 H264_MC_V(OPNAME, SIZE, MMX, ALIGN)\
 H264_MC_H(OPNAME, SIZE, MMX, ALIGN)\
@@ -372,13 +376,9 @@ static void OPNAME ## h264_qpel ## SIZE ## _mc32_ ## MMX(uint8_t *dst, const uin
     ff_ ## OPNAME ## pixels ## SIZE ## _l2_shift5_mmxext(dst, halfV+3, halfHV, stride, SIZE, SIZE);\
 }\
 
-#define H264_MC_4816(MMX)\
-H264_MC(put_, 4, MMX, 8)\
-H264_MC(put_, 8, MMX, 8)\
-H264_MC(put_, 16,MMX, 8)\
-H264_MC(avg_, 4, MMX, 8)\
-H264_MC(avg_, 8, MMX, 8)\
-H264_MC(avg_, 16,MMX, 8)\
+#define H264_MC(QPEL, SIZE, MMX, ALIGN)\
+QPEL(put_, SIZE, MMX, ALIGN) \
+QPEL(avg_, SIZE, MMX, ALIGN) \
 
 #define H264_MC_816(QPEL, XMM)\
 QPEL(put_, 8, XMM, 16)\
@@ -397,7 +397,14 @@ QPEL_H264_H_XMM(avg_,AVG_MMXEXT_OP, ssse3)
 QPEL_H264_HV_XMM(put_, PUT_OP, ssse3)
 QPEL_H264_HV_XMM(avg_,AVG_MMXEXT_OP, ssse3)
 
-H264_MC_4816(mmxext)
+H264_MC(H264_MC_C_V_H_HV, 4, mmxext, 8)
+#if ARCH_X86_32
+H264_MC(H264_MC_C_V_H_HV, 8, mmxext, 8)
+H264_MC(H264_MC_C_V_H_HV, 16, mmxext, 8)
+#else
+H264_MC(H264_MC_C_H, 8, mmxext, 8)
+H264_MC(H264_MC_C_H, 16, mmxext, 8)
+#endif
 H264_MC_816(H264_MC_V, sse2)
 H264_MC_816(H264_MC_HV, sse2)
 H264_MC_816(H264_MC_H, ssse3)
@@ -499,12 +506,16 @@ QPEL16(mmxext)
 
 #endif /* HAVE_X86ASM */
 
-#define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) \
+#define SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX) \
     do { \
     c->PFX ## _pixels_tab[IDX][ 0] = PREFIX ## PFX ## SIZE ## _mc00_ ## CPU; \
     c->PFX ## _pixels_tab[IDX][ 1] = PREFIX ## PFX ## SIZE ## _mc10_ ## CPU; \
     c->PFX ## _pixels_tab[IDX][ 2] = PREFIX ## PFX ## SIZE ## _mc20_ ## CPU; \
     c->PFX ## _pixels_tab[IDX][ 3] = PREFIX ## PFX ## SIZE ## _mc30_ ## CPU; \
+    } while (0)
+#define SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) \
+    do { \
+    SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX); \
     c->PFX ## _pixels_tab[IDX][ 4] = PREFIX ## PFX ## SIZE ## _mc01_ ## CPU; \
     c->PFX ## _pixels_tab[IDX][ 5] = PREFIX ## PFX ## SIZE ## _mc11_ ## CPU; \
     c->PFX ## _pixels_tab[IDX][ 6] = PREFIX ## PFX ## SIZE ## _mc21_ ## CPU; \
@@ -543,11 +554,16 @@ av_cold void ff_h264qpel_init_x86(H264QpelContext *c, int bit_depth)
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         if (!high_bit_depth) {
-            SET_QPEL_FUNCS(put_h264_qpel, 0, 16, mmxext, );
-            SET_QPEL_FUNCS(put_h264_qpel, 1, 8, mmxext, );
+#if ARCH_X86_32
+#define SET_MMXEXT_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) SET_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX)
+#else
+#define SET_MMXEXT_QPEL_FUNCS(PFX, IDX, SIZE, CPU, PREFIX) SET_QPEL_FUNCS0123(PFX, IDX, SIZE, CPU, PREFIX)
+#endif
+            SET_MMXEXT_QPEL_FUNCS(put_h264_qpel, 0, 16, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(put_h264_qpel, 1, 8, mmxext, );
             SET_QPEL_FUNCS(put_h264_qpel, 2, 4, mmxext, );
-            SET_QPEL_FUNCS(avg_h264_qpel, 0, 16, mmxext, );
-            SET_QPEL_FUNCS(avg_h264_qpel, 1, 8, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(avg_h264_qpel, 0, 16, mmxext, );
+            SET_MMXEXT_QPEL_FUNCS(avg_h264_qpel, 1, 8, mmxext, );
             SET_QPEL_FUNCS(avg_h264_qpel, 2, 4, mmxext, );
         } else if (bit_depth == 10) {
 #if ARCH_X86_32
diff --git a/libavcodec/x86/h264_qpel_8bit.asm b/libavcodec/x86/h264_qpel_8bit.asm
index 03c7d88f8c..72e98248d8 100644
--- a/libavcodec/x86/h264_qpel_8bit.asm
+++ b/libavcodec/x86/h264_qpel_8bit.asm
@@ -461,9 +461,11 @@ cglobal %1_h264_qpel8or16_v_lowpass_op, 5,5,8 ; dst, src, dstStride, srcStride,
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 QPEL8OR16_V_LOWPASS_OP put
 QPEL8OR16_V_LOWPASS_OP avg
+%endif
 
 INIT_XMM sse2
 QPEL8OR16_V_LOWPASS_OP put
@@ -581,8 +583,10 @@ cglobal %1_h264_qpel8or16_hv1_lowpass_op, 4,4,8 ; src, tmp, srcStride, size
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 QPEL8OR16_HV1_LOWPASS_OP put
+%endif
 
 INIT_XMM sse2
 QPEL8OR16_HV1_LOWPASS_OP put
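
For readers unfamiliar with the override pattern the commit message refers to, here is a minimal, self-contained sketch. It is not FFmpeg code: every name in it (DEMO_ARCH_X86_32, CPU_FLAG_*, demo_qpel_init, the mcXX_* stubs) is hypothetical and chosen only for illustration. Slot 0 stands for a position that stays with MMXEXT, slot 1 for a position that a later SSE2 assignment always replaces when SSE2 is guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU feature bits; not FFmpeg's AV_CPU_FLAG_* values. */
#define CPU_FLAG_MMXEXT 0x1
#define CPU_FLAG_SSE2   0x2

/* Set to 1 to mimic a 32-bit x86 build, where SSE2 is not guaranteed. */
#ifndef DEMO_ARCH_X86_32
#define DEMO_ARCH_X86_32 0
#endif

enum { IMPL_C = 0, IMPL_MMXEXT = 1, IMPL_SSE2 = 2 };

typedef int (*qpel_fn)(void);

/* Stub "implementations" that only report which variant was selected. */
static int mc00_c(void)      { return IMPL_C; }
static int mc01_c(void)      { return IMPL_C; }
static int mc00_mmxext(void) { return IMPL_MMXEXT; }
#if DEMO_ARCH_X86_32
/* Only built where it can actually end up in the table. */
static int mc01_mmxext(void) { return IMPL_MMXEXT; }
#endif
static int mc01_sse2(void)   { return IMPL_SSE2; }

/* Fill the table from least to most capable; later assignments override
 * earlier ones, which is why the slot-1 MMXEXT entry is dead code whenever
 * SSE2 is always present. */
static void demo_qpel_init(qpel_fn tab[2], int cpu_flags)
{
    assert(tab != NULL);
    tab[0] = mc00_c;
    tab[1] = mc01_c;
    if (cpu_flags & CPU_FLAG_MMXEXT) {
        tab[0] = mc00_mmxext;
#if DEMO_ARCH_X86_32
        tab[1] = mc01_mmxext;
#endif
    }
    if (cpu_flags & CPU_FLAG_SSE2)
        tab[1] = mc01_sse2;
}
```

With DEMO_ARCH_X86_32 left at 0, calling demo_qpel_init(tab, CPU_FLAG_MMXEXT | CPU_FLAG_SSE2) leaves tab[0] pointing at the MMXEXT stub and tab[1] at the SSE2 stub, and the slot-1 MMXEXT stub is never compiled. The trade-off is the one the commit message mentions: if SSE2 is explicitly disabled on such a build, slot 1 falls back to the C version rather than MMXEXT.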