From patchwork Thu Jun 9 23:55:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 36156
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Jun 2022 01:55:02 +0200
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20220609235523.458689-20-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 20/41] avcodec/x86/me_cmp: Disable overridden functions on x64
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt

x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). This commit therefore disables such me_cmp functions
at compile-time for x64.

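To illustrate why such functions are effectively dead code on x64, here is
a minimal standalone sketch of the init pattern. It is not FFmpeg code: the
flag values, the cmp_* stand-ins and select_cmp() are hypothetical names
chosen for illustration. The assumption is only that the init function runs
its CPU-feature blocks from oldest to newest instruction set, so the last
matching block decides which function pointer is kept.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits for illustration; not the AV_CPU_FLAG_* values. */
enum { FLAG_MMX = 1, FLAG_MMXEXT = 2, FLAG_SSE2 = 4 };

typedef const char *(*cmp_fn)(void);

/* Stand-ins for per-ISA implementations; each reports its own name. */
static const char *cmp_mmx(void)    { return "mmx"; }
static const char *cmp_mmxext(void) { return "mmxext"; }
static const char *cmp_sse2(void)   { return "sse2"; }

/* Mirrors the shape of an init function: blocks run from oldest to newest
 * instruction set, and each later block overwrites the earlier choice. */
static cmp_fn select_cmp(int cpu_flags)
{
    cmp_fn fn = NULL;

    if (cpu_flags & FLAG_MMX)
        fn = cmp_mmx;
    if (cpu_flags & FLAG_MMXEXT)
        fn = cmp_mmxext;
    if (cpu_flags & FLAG_SSE2)
        fn = cmp_sse2;
    return fn;
}

/* Returns 1 if the implementation selected for cpu_flags is called name. */
static int selected_is(int cpu_flags, const char *name)
{
    cmp_fn fn = select_cmp(cpu_flags);
    return fn && strcmp(fn(), name) == 0;
}
```

With all three bits set, as on any x64 CPU, select_cmp() ends up with
cmp_sse2; the MMX and MMXEXT stand-ins are only reachable when the SSE2 bit
is masked out, which corresponds to the "explicitly disables SSE2" case
mentioned above.
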
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/me_cmp.asm    |  6 ++++
 libavcodec/x86/me_cmp_init.c | 61 +++++++++++++++++++++---------------
 2 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
index ad06d485ab..05e521cb08 100644
--- a/libavcodec/x86/me_cmp.asm
+++ b/libavcodec/x86/me_cmp.asm
@@ -261,11 +261,15 @@ hadamard8_16_wrapper 0, 14
 %endif
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmx
 HADAMARD8_DIFF
+%endif
 
+%if ARCH_X86_32 || HAVE_ALIGNED_STACK == 0
 INIT_MMX mmxext
 HADAMARD8_DIFF
+%endif
 
 INIT_XMM sse2
 %if ARCH_X86_64
@@ -385,10 +389,12 @@ cglobal sum_abs_dctelem, 1, 1, %1, block
     RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmx
 SUM_ABS_DCTELEM 0, 4
 INIT_MMX mmxext
 SUM_ABS_DCTELEM 0, 4
+%endif
 INIT_XMM sse2
 SUM_ABS_DCTELEM 7, 2
 INIT_XMM ssse3
diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
index 9af911bb88..6144bb9496 100644
--- a/libavcodec/x86/me_cmp_init.c
+++ b/libavcodec/x86/me_cmp_init.c
@@ -126,6 +126,7 @@ static int nsse8_mmx(MpegEncContext *c, uint8_t *pix1, uint8_t *pix2,
 
 #if HAVE_INLINE_ASM
 
+#if ARCH_X86_32
 static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
                             ptrdiff_t stride, int h)
 {
@@ -270,6 +271,7 @@ static int vsad16_mmx(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2,
     return tmp & 0x7FFF;
 }
 #undef SUM
+#endif
 
 DECLARE_ASM_CONST(8, uint64_t, round_tab)[3] = {
     0x0000000000000000ULL,
@@ -478,20 +480,6 @@ static int sad8_y2_ ## suf(MpegEncContext *v, uint8_t *blk2, \
     return sum_ ## suf();                                               \
 }                                                                       \
                                                                         \
-static int sad8_xy2_ ## suf(MpegEncContext *v, uint8_t *blk2,           \
-                            uint8_t *blk1, ptrdiff_t stride, int h)     \
-{                                                                       \
-    av_assert2(h == 8);                                                 \
-    __asm__ volatile (                                                  \
-        "pxor %%mm7, %%mm7     \n\t"                                    \
-        "pxor %%mm6, %%mm6     \n\t"                                    \
-        ::);                                                            \
-                                                                        \
-    sad8_4_ ## suf(blk1, blk2, stride, 8);                              \
-                                                                        \
-    return sum_ ## suf();                                               \
-}                                                                       \
-                                                                        \
 static int sad16_ ## suf(MpegEncContext *v, uint8_t *blk2,              \
                          uint8_t *blk1, ptrdiff_t stride, int h)        \
 {                                                                       \
@@ -535,7 +523,8 @@ static int sad16_y2_ ## suf(MpegEncContext *v, uint8_t *blk2, \
                                                                         \
     return sum_ ## suf();                                               \
 }                                                                       \
-                                                                        \
+
+#define PIX_SADXY(suf)                                                  \
 static int sad16_xy2_ ## suf(MpegEncContext *v, uint8_t *blk2,          \
                              uint8_t *blk1, ptrdiff_t stride, int h)    \
 {                                                                       \
@@ -549,8 +538,25 @@ static int sad16_xy2_ ## suf(MpegEncContext *v, uint8_t *blk2, \
                                                                         \
     return sum_ ## suf();                                               \
 }                                                                       \
+                                                                        \
+static int sad8_xy2_ ## suf(MpegEncContext *v, uint8_t *blk2,           \
+                            uint8_t *blk1, ptrdiff_t stride, int h)     \
+{                                                                       \
+    av_assert2(h == 8);                                                 \
+    __asm__ volatile (                                                  \
+        "pxor %%mm7, %%mm7     \n\t"                                    \
+        "pxor %%mm6, %%mm6     \n\t"                                    \
+        ::);                                                            \
+                                                                        \
+    sad8_4_ ## suf(blk1, blk2, stride, 8);                              \
+                                                                        \
+    return sum_ ## suf();                                               \
+}                                                                       \
 
+#if ARCH_X86_32
 PIX_SAD(mmx)
+#endif
+PIX_SADXY(mmx)
 
 #endif /* HAVE_INLINE_ASM */
 
@@ -560,32 +566,35 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
 
 #if HAVE_INLINE_ASM
     if (INLINE_MMX(cpu_flags)) {
+#if ARCH_X86_32
+        c->sad[0] = sad16_mmx;
+        c->sad[1] = sad8_mmx;
+
+        if (!(avctx->flags & AV_CODEC_FLAG_BITEXACT)) {
+            c->vsad[0] = vsad16_mmx;
+        }
+        c->vsad[4] = vsad_intra16_mmx;
+
         c->pix_abs[0][0] = sad16_mmx;
         c->pix_abs[0][1] = sad16_x2_mmx;
         c->pix_abs[0][2] = sad16_y2_mmx;
-        c->pix_abs[0][3] = sad16_xy2_mmx;
         c->pix_abs[1][0] = sad8_mmx;
         c->pix_abs[1][1] = sad8_x2_mmx;
         c->pix_abs[1][2] = sad8_y2_mmx;
+#endif
+        c->pix_abs[0][3] = sad16_xy2_mmx;
         c->pix_abs[1][3] = sad8_xy2_mmx;
-
-        c->sad[0] = sad16_mmx;
-        c->sad[1] = sad8_mmx;
-
-        c->vsad[4] = vsad_intra16_mmx;
-
-        if (!(avctx->flags & AV_CODEC_FLAG_BITEXACT)) {
-            c->vsad[0] = vsad16_mmx;
-        }
     }
 
 #endif /* HAVE_INLINE_ASM */
 
     if (EXTERNAL_MMX(cpu_flags)) {
+#if ARCH_X86_32
         c->hadamard8_diff[0] = ff_hadamard8_diff16_mmx;
         c->hadamard8_diff[1] = ff_hadamard8_diff_mmx;
         c->sum_abs_dctelem   = ff_sum_abs_dctelem_mmx;
         c->sse[0]            = ff_sse16_mmx;
+#endif
         c->sse[1]            = ff_sse8_mmx;
 #if HAVE_X86ASM
         c->nsse[0]           = nsse16_mmx;
@@ -594,9 +603,11 @@ av_cold void ff_me_cmp_init_x86(MECmpContext *c, AVCodecContext *avctx)
     }
 
     if (EXTERNAL_MMXEXT(cpu_flags)) {
+#if ARCH_X86_32 || !HAVE_ALIGNED_STACK
         c->hadamard8_diff[0] = ff_hadamard8_diff16_mmxext;
         c->hadamard8_diff[1] = ff_hadamard8_diff_mmxext;
         c->sum_abs_dctelem   = ff_sum_abs_dctelem_mmxext;
+#endif
 
         c->sad[0] = ff_sad16_mmxext;
         c->sad[1] = ff_sad8_mmxext;