From patchwork Thu Jun 9 23:55:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andreas Rheinhardt
X-Patchwork-Id: 36153
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Jun 2022 01:55:04 +0200
X-Mailer: git-send-email 2.34.1
X-Microsoft-Original-Message-ID: <20220609235523.458689-22-andreas.rheinhardt@outlook.com>
Subject: [FFmpeg-devel] [PATCH 22/41] avcodec/x86/h264dsp_init: Disable
 overridden functions on x64
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt

x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). This commit therefore disables such h264dsp functions
at compile-time for x64.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/h264_deblock.asm | 24 +++-----------
 libavcodec/x86/h264_idct.asm    | 57 +++++++--------------------------
 libavcodec/x86/h264_weight.asm  |  8 +++++
 libavcodec/x86/h264dsp_init.c   | 21 ++++++++----
 4 files changed, 38 insertions(+), 72 deletions(-)

diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index a2e745cd8e..9e671af45c 100644
--- a/libavcodec/x86/h264_deblock.asm
+++ b/libavcodec/x86/h264_deblock.asm
@@ -867,7 +867,6 @@ DEBLOCK_LUMA_INTRA v
 %if ARCH_X86_64 == 0
 INIT_MMX mmxext
 DEBLOCK_LUMA_INTRA v8
-%endif
 
 INIT_MMX mmxext
 
@@ -911,17 +910,8 @@ cglobal deblock_v_chroma_8, 5,6
 ;                          int8_t *tc0)
 ;-----------------------------------------------------------------------------
 cglobal deblock_h_chroma_8, 5,7
-%if ARCH_X86_64
-    ; This could use the red zone on 64 bit unix to avoid the stack pointer
-    ; readjustment, but valgrind assumes the red zone is clobbered on
-    ; function calls and returns.
-    sub rsp, 16
-    %define buf0 [rsp]
-    %define buf1 [rsp+8]
-%else
     %define buf0 r0m
     %define buf1 r2m
-%endif
     CHROMA_H_START
     TRANSPOSE4x8_LOAD bw, wd, dq, PASS8ROWS(t5, r0, r1, t6)
     movq buf0, m0
@@ -934,9 +924,6 @@ cglobal deblock_h_chroma_8, 5,7
     movq m0, buf0
     movq m3, buf1
     TRANSPOSE8x4B_STORE PASS8ROWS(t5, r0, r1, t6)
-%if ARCH_X86_64
-    add rsp, 16
-%endif
     RET
 
 ALIGN 16
@@ -953,13 +940,8 @@ ff_chroma_inter_body_mmxext:
 
 cglobal deblock_h_chroma422_8, 5, 6
     SUB rsp, (1+ARCH_X86_64*2)*mmsize
-    %if ARCH_X86_64
-        %define buf0 [rsp+16]
-        %define buf1 [rsp+8]
-    %else
-        %define buf0 r0m
-        %define buf1 r2m
-    %endif
+    %define buf0 r0m
+    %define buf1 r2m
 
     movd m6, [r4]
     punpcklbw m6, m6
@@ -1059,6 +1041,8 @@ ff_chroma_intra_body_mmxext:
     paddb m2, m6
     ret
 
+%endif ; ARCH_X86_64 == 0
+
 %macro LOAD_8_ROWS 8
     movd m0, %1
     movd m1, %2
diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm
index c54f9f1a68..17c7af388c 100644
--- a/libavcodec/x86/h264_idct.asm
+++ b/libavcodec/x86/h264_idct.asm
@@ -87,12 +87,14 @@ SECTION .text
     STORE_DIFFx2 m2, m3, m4, m5, m7, 6, %1, %3
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmx
 ; void ff_h264_idct_add_8_mmx(uint8_t *dst, int16_t *block, int stride)
 cglobal h264_idct_add_8, 3, 3, 0
     movsxdifnidn r2, r2d
     IDCT4_ADD r0, r1, r2
     RET
+%endif
 
 %macro IDCT8_1D 2
     psraw m0, m1, 1
@@ -207,6 +209,7 @@ cglobal h264_idct_add_8, 3, 3, 0
     STORE_DIFFx2 m1, m2, m5, m6, m7, 6, %1, %3
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmx
 ; void ff_h264_idct8_add_8_mmx(uint8_t *dst, int16_t *block, int stride)
 cglobal h264_idct8_add_8, 3, 4, 0
@@ -223,6 +226,7 @@ cglobal h264_idct8_add_8, 3, 4, 0
 
     ADD rsp, pad
     RET
+%endif
 
 ; %1=uint8_t *dst, %2=int16_t *block, %3=int stride
 %macro IDCT8_ADD_SSE 4
@@ -315,16 +319,7 @@ cglobal h264_idct8_add_8, 3, 4, 10
 %endmacro
 
 INIT_MMX mmxext
-; void ff_h264_idct_dc_add_8_mmxext(uint8_t *dst, int16_t *block, int stride)
 %if ARCH_X86_64
-cglobal h264_idct_dc_add_8, 3, 4, 0
-    movsxd r2, r2d
-    movsx r3, word [r1]
-    mov dword [r1], 0
-    DC_ADD_MMXEXT_INIT r3, r2
-    DC_ADD_MMXEXT_OP movh, r0, r2, r3
-    RET
-
 ; void ff_h264_idct8_dc_add_8_mmxext(uint8_t *dst, int16_t *block, int stride)
 cglobal h264_idct8_dc_add_8, 3, 4, 0
     movsxd r2, r2d
@@ -358,6 +353,7 @@ cglobal h264_idct8_dc_add_8, 2, 3, 0
 %endif
 
 INIT_MMX mmx
+%if ARCH_X86_32
 ; void ff_h264_idct_add16_8_mmx(uint8_t *dst, const int *block_offset,
 ;                               int16_t *block, int stride,
 ;                               const uint8_t nnzc[6 * 8])
@@ -438,16 +434,12 @@ cglobal h264_idct_add16_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride
     jz .no_dc
     mov word [r2], 0
     DC_ADD_MMXEXT_INIT r6, r3
-%if ARCH_X86_64 == 0
 %define dst2q r1
 %define dst2d r1d
-%endif
     mov dst2d, dword [r1+r5*4]
     lea dst2q, [r0+dst2q]
     DC_ADD_MMXEXT_OP movh, dst2q, r3, r6
-%if ARCH_X86_64 == 0
     mov r1, r1m
-%endif
     inc r5
     add r2, 32
     cmp r5, 16
@@ -519,16 +511,12 @@ cglobal h264_idct_add16intra_8, 5, 8 + npicregs, 0, dst1, block_offset, block, s
     jz .skipblock
     mov word [r2], 0
     DC_ADD_MMXEXT_INIT r6, r3
-%if ARCH_X86_64 == 0
 %define dst2q r1
 %define dst2d r1d
-%endif
     mov dst2d, dword [r1+r5*4]
     add dst2q, r0
     DC_ADD_MMXEXT_OP movh, dst2q, r3, r6
-%if ARCH_X86_64 == 0
     mov r1, r1m
-%endif
 .skipblock:
     inc r5
     add r2, 32
@@ -560,18 +548,14 @@ cglobal h264_idct8_add4_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride
     jz .no_dc
     mov word [r2], 0
     DC_ADD_MMXEXT_INIT r6, r3
-%if ARCH_X86_64 == 0
 %define dst2q r1
 %define dst2d r1d
-%endif
     mov dst2d, dword [r1+r5*4]
     lea dst2q, [r0+dst2q]
     DC_ADD_MMXEXT_OP mova, dst2q, r3, r6
     lea dst2q, [dst2q+r3*4]
     DC_ADD_MMXEXT_OP mova, dst2q, r3, r6
-%if ARCH_X86_64 == 0
     mov r1, r1m
-%endif
     add r5, 4
     add r2, 128
     cmp r5, 16
@@ -597,6 +581,7 @@ cglobal h264_idct8_add4_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride
 
     ADD rsp, pad
     RET
+%endif
 
 INIT_XMM sse2
 ; void ff_h264_idct8_add4_8_sse2(uint8_t *dst, const int *block_offset,
@@ -678,6 +663,7 @@ h264_idct_add8_mmx_plane:
     jnz .nextblock
     rep ret
 
+%if ARCH_X86_32
 ; void ff_h264_idct_add8_8_mmx(uint8_t **dest, const int *block_offset,
 ;                              int16_t *block, int stride,
 ;                              const uint8_t nnzc[6 * 8])
@@ -687,20 +673,14 @@ cglobal h264_idct_add8_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride,
     add r2, 512
 %ifdef PIC
     lea picregq, [scan8_mem]
-%endif
-%if ARCH_X86_64
-    mov dst2q, r0
 %endif
     call h264_idct_add8_mmx_plane
     mov r5, 32
     add r2, 384
-%if ARCH_X86_64
-    add dst2q, gprsize
-%else
     add r0mp, gprsize
-%endif
     call h264_idct_add8_mmx_plane
     RET ; TODO: check rep ret after a function call
+%endif
 
 cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
 ; dst1, block_offset, block, stride, nnzc, cntr, coeff, dst2, picreg
@@ -734,6 +714,7 @@ cglobal h264_idct_add8_422_8, 5, 8 + npicregs, 0, dst1, block_offset, block, str
 
     RET ; TODO: check rep ret after a function call
 
+%if ARCH_X86_32
 h264_idct_add8_mmxext_plane:
     movsxdifnidn r3, r3d
 .nextblock:
@@ -741,14 +722,9 @@ h264_idct_add8_mmxext_plane:
     movzx r6, byte [r4+r6]
     test r6, r6
     jz .try_dc
-%if ARCH_X86_64
-    mov r0d, dword [r1+r5*4]
-    add r0, [dst2q]
-%else
     mov r0, r1m ; XXX r1m here is actually r0m of the calling func
     mov r0, [r0]
     add r0, dword [r1+r5*4]
-%endif
     IDCT4_ADD r0, r2, r3
     inc r5
     add r2, 32
@@ -761,14 +737,9 @@ h264_idct_add8_mmxext_plane:
     jz .skipblock
     mov word [r2], 0
     DC_ADD_MMXEXT_INIT r6, r3
-%if ARCH_X86_64
-    mov r0d, dword [r1+r5*4]
-    add r0, [dst2q]
-%else
     mov r0, r1m ; XXX r1m here is actually r0m of the calling func
     mov r0, [r0]
     add r0, dword [r1+r5*4]
-%endif
     DC_ADD_MMXEXT_OP movh, r0, r3, r6
 .skipblock:
     inc r5
@@ -785,22 +756,16 @@ cglobal h264_idct_add8_8, 5, 8 + npicregs, 0, dst1, block_offset, block, stride,
     movsxdifnidn r3, r3d
     mov r5, 16
     add r2, 512
-%if ARCH_X86_64
-    mov dst2q, r0
-%endif
 %ifdef PIC
     lea picregq, [scan8_mem]
 %endif
     call h264_idct_add8_mmxext_plane
     mov r5, 32
     add r2, 384
-%if ARCH_X86_64
-    add dst2q, gprsize
-%else
     add r0mp, gprsize
-%endif
     call h264_idct_add8_mmxext_plane
     RET ; TODO: check rep ret after a function call
+%endif
 
 ; r0 = uint8_t *dst, r2 = int16_t *block, r3 = int stride, r6=clobbered
 h264_idct_dc_add8_mmxext:
@@ -1139,8 +1104,10 @@ cglobal h264_luma_dc_dequant_idct, 3, 4, %1
     RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmx
 IDCT_DC_DEQUANT 0
+%endif
 INIT_MMX sse2
 IDCT_DC_DEQUANT 7
 
diff --git a/libavcodec/x86/h264_weight.asm b/libavcodec/x86/h264_weight.asm
index 0975d74fcf..086616e633 100644
--- a/libavcodec/x86/h264_weight.asm
+++ b/libavcodec/x86/h264_weight.asm
@@ -70,6 +70,7 @@ SECTION .text
     packuswb m0, m1
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 cglobal h264_weight_16, 6, 6, 0
     WEIGHT_SETUP
@@ -82,6 +83,7 @@ cglobal h264_weight_16, 6, 6, 0
     dec r2d
     jnz .nextrow
     REP_RET
+%endif
 
 %macro WEIGHT_FUNC_MM 2
 cglobal h264_weight_%1, 6, 6, %2
@@ -95,8 +97,10 @@ cglobal h264_weight_%1, 6, 6, %2
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 WEIGHT_FUNC_MM 8, 0
+%endif
 INIT_XMM sse2
 WEIGHT_FUNC_MM 16, 8
 
@@ -198,6 +202,7 @@ WEIGHT_FUNC_HALF_MM 8, 8
     packuswb m0, m1
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 cglobal h264_biweight_16, 7, 8, 0
     BIWEIGHT_SETUP
@@ -216,6 +221,7 @@ cglobal h264_biweight_16, 7, 8, 0
     dec r3d
     jnz .nextrow
     REP_RET
+%endif
 
 %macro BIWEIGHT_FUNC_MM 2
 cglobal h264_biweight_%1, 7, 8, %2
@@ -233,8 +239,10 @@ cglobal h264_biweight_%1, 7, 8, %2
     REP_RET
 %endmacro
 
+%if ARCH_X86_32
 INIT_MMX mmxext
 BIWEIGHT_FUNC_MM 8, 0
+%endif
 INIT_XMM sse2
 BIWEIGHT_FUNC_MM 16, 8
 
diff --git a/libavcodec/x86/h264dsp_init.c b/libavcodec/x86/h264dsp_init.c
index c9a96c7dca..9ef6c6bb53 100644
--- a/libavcodec/x86/h264dsp_init.c
+++ b/libavcodec/x86/h264dsp_init.c
@@ -236,6 +236,10 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
 
     if (bit_depth == 8) {
         if (EXTERNAL_MMX(cpu_flags)) {
+#if ARCH_X86_32
+            if (cpu_flags & AV_CPU_FLAG_CMOV)
+                c->h264_luma_dc_dequant_idct = ff_h264_luma_dc_dequant_idct_mmx;
+
             c->h264_idct_dc_add =
             c->h264_idct_add = ff_h264_idct_add_8_mmx;
             c->h264_idct8_dc_add =
@@ -243,18 +247,21 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
 
             c->h264_idct_add16 = ff_h264_idct_add16_8_mmx;
             c->h264_idct8_add4 = ff_h264_idct8_add4_8_mmx;
+
+            c->h264_idct_add16intra = ff_h264_idct_add16intra_8_mmx;
+#endif
             if (chroma_format_idc <= 1) {
+#if ARCH_X86_32
                 c->h264_idct_add8 = ff_h264_idct_add8_8_mmx;
+#endif
             } else {
                 c->h264_idct_add8 = ff_h264_idct_add8_422_8_mmx;
             }
-            c->h264_idct_add16intra = ff_h264_idct_add16intra_8_mmx;
-            if (cpu_flags & AV_CPU_FLAG_CMOV)
-                c->h264_luma_dc_dequant_idct = ff_h264_luma_dc_dequant_idct_mmx;
         }
         if (EXTERNAL_MMXEXT(cpu_flags)) {
-            c->h264_idct_dc_add = ff_h264_idct_dc_add_8_mmxext;
             c->h264_idct8_dc_add = ff_h264_idct8_dc_add_8_mmxext;
+#if ARCH_X86_32 && HAVE_MMXEXT_EXTERNAL
+            c->h264_idct_dc_add = ff_h264_idct_dc_add_8_mmxext;
             c->h264_idct_add16 = ff_h264_idct_add16_8_mmxext;
             c->h264_idct8_add4 = ff_h264_idct8_add4_8_mmxext;
             if (chroma_format_idc <= 1)
@@ -270,18 +277,18 @@ av_cold void ff_h264dsp_init_x86(H264DSPContext *c, const int bit_depth,
                 c->h264_h_loop_filter_chroma = ff_deblock_h_chroma422_8_mmxext;
                 c->h264_h_loop_filter_chroma_intra = ff_deblock_h_chroma422_intra_8_mmxext;
             }
-#if ARCH_X86_32 && HAVE_MMXEXT_EXTERNAL
             c->h264_v_loop_filter_luma = deblock_v_luma_8_mmxext;
             c->h264_h_loop_filter_luma = ff_deblock_h_luma_8_mmxext;
             c->h264_v_loop_filter_luma_intra = deblock_v_luma_intra_8_mmxext;
             c->h264_h_loop_filter_luma_intra = ff_deblock_h_luma_intra_8_mmxext;
-#endif /* ARCH_X86_32 && HAVE_MMXEXT_EXTERNAL */
+
             c->weight_h264_pixels_tab[0] = ff_h264_weight_16_mmxext;
             c->weight_h264_pixels_tab[1] = ff_h264_weight_8_mmxext;
-            c->weight_h264_pixels_tab[2] = ff_h264_weight_4_mmxext;
 
             c->biweight_h264_pixels_tab[0] = ff_h264_biweight_16_mmxext;
             c->biweight_h264_pixels_tab[1] = ff_h264_biweight_8_mmxext;
+#endif /* ARCH_X86_32 && HAVE_MMXEXT_EXTERNAL */
+            c->weight_h264_pixels_tab[2] = ff_h264_weight_4_mmxext;
             c->biweight_h264_pixels_tab[2] = ff_h264_biweight_4_mmxext;
         }
         if (EXTERNAL_SSE2(cpu_flags)) {