From patchwork Sun Feb 25 08:27:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 46515
From: "J.
 Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 25 Feb 2024 09:27:55 +0100
Message-ID: <20240225082755.355295-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.43.2
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/hevc: fix luma 12b overflow
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: vZwGOOJE1SQ3

Weak filter can overflow in delta0 calculation before >> 4 in int16.

Signed-off-by: J. Dekker
---

 I do not know x86 simd at all, so this is just an attempt to fix the
 implementation rather than write extremely performant code. Suggestions
 welcome.

 libavcodec/x86/hevc_deblock.asm | 47 +++++++++++++++++++++++++++++++++
 libavcodec/x86/hevcdsp_init.c   |  8 ------
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/libavcodec/x86/hevc_deblock.asm b/libavcodec/x86/hevc_deblock.asm
index 85ee4800bb..ce9221ebc7 100644
--- a/libavcodec/x86/hevc_deblock.asm
+++ b/libavcodec/x86/hevc_deblock.asm
@@ -541,6 +541,7 @@ ALIGN 16
     add             betaq, r13
     shr             betaq, 3; ((beta + (beta >> 1)) >> 3))
 
+%if %1 < 12
     mova            m13, [pw_8]
     psubw           m12, m4, m3 ; q0 - p0
     psllw           m10, m12, 3; 8 * (q0 - p0)
@@ -553,7 +554,49 @@ ALIGN 16
     paddw           m12, m13; + 8
     psraw           m12, 4; >> 4 , delta0
     PABSW           m13, m12; abs(delta0)
+%else
+    psubw           m12, m4, m3 ; q0 - p0
+    pmovsxwd        m13, m12 ; m13 low
+    movhlps         m12, m12
+    pmovsxwd        m12, m12 ; m12 high
+
+    ; m8 low, m10 high
+    pslld           m8, m13, 3; 8 * (q0 - p0)
+    pslld           m10, m12, 3
+
+    paddd           m8, m13 ; 9 * (q0 - p0)
+    paddd           m10, m12
+
+    psubw           m12, m5, m2 ; q1 - p1
+    pmovsxwd        m13, m12 ; m13 low
+    movhlps         m12, m12
+    pmovsxwd        m12, m12 ; m12 high
+    psubd           m8, m13 ; 9 * (q0 - p0) - ( q1 - p1 )
+    psubd           m10, m12
+
+    pslld           m13, m13, 1; 2 * ( q1 - p1 )
+    pslld           m12, m12, 1
+
+    psubd           m8, m13; 9 * (q0 - p0) - 3 * ( q1 - p1 )
+    psubd           m10, m12
+
+    mova            m13, [pw_8]
+    pmovsxwd        m13, m13
+
+    paddd           m8, m13 ; + 8
+    paddd           m10, m13
+
+    psrad           m8, 4; >> 4 , delta0
+    psrad           m10, 4
+
+    packssdw        m12, m8
+    packssdw        m10, m10
+
+    psrldq          m12, 8
+    punpcklqdq      m12, m10
+    PABSW           m13, m12; abs(delta0)
+%endif
 
 
     psllw           m10, m9, 2; 8 * tc
     paddw           m10, m9; 10 * tc
@@ -746,6 +789,7 @@ cglobal hevc_v_loop_filter_luma_10, 4, 14, 16, pix, stride, beta, tc, pix0, src3
 .bypassluma:
     RET
 
+%if cpuflag(avx)
 cglobal hevc_v_loop_filter_luma_12, 4, 14, 16, pix, stride, beta, tc, pix0, src3stride
     sub             pixq, 8
     lea             pix0q, [3 * strideq]
@@ -757,6 +801,7 @@ cglobal hevc_v_loop_filter_luma_12, 4, 14, 16, pix, stride, beta, tc, pix0, src3
     TRANSPOSE8x8W_STORE PASS8ROWS(src3strideq, pixq, r1, pix0q), [pw_pixel_max_12]
 .bypassluma:
     RET
+%endif
 
 ;-----------------------------------------------------------------------------
 ; void ff_hevc_h_loop_filter_luma(uint8_t *_pix, ptrdiff_t _stride, int beta,
@@ -829,6 +874,7 @@ cglobal hevc_h_loop_filter_luma_10, 4, 14, 16, pix, stride, beta, tc, pix0, src3
 .bypassluma:
     RET
 
+%if cpuflag(avx)
 cglobal hevc_h_loop_filter_luma_12, 4, 14, 16, pix, stride, beta, tc, pix0, src3stride
     lea             src3strideq, [3 * strideq]
     mov             pix0q, pixq
@@ -859,6 +905,7 @@ cglobal hevc_h_loop_filter_luma_12, 4, 14, 16, pix, stride, beta, tc, pix0, src3
     movdqu          [pixq + 2 * strideq], m6; q2
 .bypassluma:
     RET
+%endif
 
 %endmacro
 
diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
index f5bc342cd5..e3fcb7b591 100644
--- a/libavcodec/x86/hevcdsp_init.c
+++ b/libavcodec/x86/hevcdsp_init.c
@@ -1205,10 +1205,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_12_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_12_sse2;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_sse2;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_sse2;
-            }
             SAO_BAND_INIT(12, sse2);
             SAO_EDGE_INIT(12, sse2);
 
@@ -1216,10 +1212,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             c->idct_dc[2] = ff_hevc_idct_16x16_dc_12_sse2;
             c->idct_dc[3] = ff_hevc_idct_32x32_dc_12_sse2;
         }
-        if (EXTERNAL_SSSE3(cpu_flags) && ARCH_X86_64) {
-            c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_ssse3;
-            c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_ssse3;
-        }
         if (EXTERNAL_SSE4(cpu_flags) && ARCH_X86_64) {
             EPEL_LINKS(c->put_hevc_epel, 0, 0, pel_pixels, 12, sse4);
             EPEL_LINKS(c->put_hevc_epel, 0, 1, epel_h, 12, sse4);