From patchwork Wed Jan 24 12:06:34 2024
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 45791
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 24 Jan 2024 13:06:34 +0100
Message-ID: <20240124120634.84237-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.43.0
Subject: [FFmpeg-devel] [PATCH v2] checkasm/hevc_deblock: add luma and chroma full

Signed-off-by: J. Dekker
---
 tests/checkasm/hevc_deblock.c | 225 +++++++++++++++++++++++++++++-----
 1 file changed, 195 insertions(+), 30 deletions(-)

 - added luma 10/12 bit
 - supporting full (*_c) luma & chroma functions
 - dynamically generating all test data

Appears to work for me: tested on x86, where the generated data hits each
filtering decision correctly. x86 doesn't have asm for the full functions
though, so this still needs checking on a platform which has them (the
difference is minor, so I'm not sure why it wouldn't work).

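For reference, the three luma cases the test generates data for (strong,
weak, skip) correspond to the HEVC luma filter decision for one 4-line
segment of an edge. The following is a minimal standalone sketch of that
decision, not code from this patch: classify_segment() and its helpers are
hypothetical names, the array layout (P3..P0 then Q0..Q3) is an assumption
made for illustration, and beta/tc are assumed to be already scaled for the
bit depth.

```c
#include <stdlib.h>

enum deblock_kind { DEBLOCK_SKIP, DEBLOCK_WEAK, DEBLOCK_STRONG };

/* One line across the edge (assumed layout):
 * s[0..3] = P3, P2, P1, P0 and s[4..7] = Q0, Q1, Q2, Q3. */
static int second_diff_p(const int *s) { return abs(s[1] - 2 * s[2] + s[3]); }
static int second_diff_q(const int *s) { return abs(s[6] - 2 * s[5] + s[4]); }

/* Per-line conditions that must all hold for the strong filter. */
static int line_allows_strong(const int *s, int d, int beta, int tc25)
{
    return abs(s[0] - s[3]) + abs(s[7] - s[4]) < (beta >> 3) &&
           abs(s[3] - s[4]) < tc25 &&
           (d << 1) < (beta >> 2);
}

/* Decide the filter type from the first and last line of a segment. */
static enum deblock_kind classify_segment(const int *line0, const int *line3,
                                          int beta, int tc)
{
    int d0   = second_diff_p(line0) + second_diff_q(line0);
    int d3   = second_diff_p(line3) + second_diff_q(line3);
    int tc25 = (tc * 5 + 1) >> 1;

    if (d0 + d3 >= beta)
        return DEBLOCK_SKIP;   /* too much activity, edge left untouched */
    if (line_allows_strong(line0, d0, beta, tc25) &&
        line_allows_strong(line3, d3, beta, tc25))
        return DEBLOCK_STRONG;
    return DEBLOCK_WEAK;
}
```

Under these assumptions, setting beta to 0 always yields the skip case, a
small P0/Q0 step on otherwise flat lines yields the strong case, and a step
of at least tc25 falls back to the weak case.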
diff --git a/tests/checkasm/hevc_deblock.c b/tests/checkasm/hevc_deblock.c
index 66fc8d5646..dfe7fc8e97 100644
--- a/tests/checkasm/hevc_deblock.c
+++ b/tests/checkasm/hevc_deblock.c
@@ -19,6 +19,7 @@
 #include <string.h>
 
 #include "libavutil/intreadwrite.h"
+#include "libavutil/macros.h"
 #include "libavutil/mem_internal.h"
 
 #include "libavcodec/avcodec.h"
@@ -29,10 +30,11 @@
 static const uint32_t pixel_mask[3] = { 0xffffffff, 0x03ff03ff, 0x0fff0fff };
 
 #define SIZEOF_PIXEL ((bit_depth + 7) / 8)
-#define BUF_STRIDE (8 * 2)
-#define BUF_LINES (8)
-#define BUF_OFFSET (BUF_STRIDE * BUF_LINES)
-#define BUF_SIZE (BUF_STRIDE * BUF_LINES + BUF_OFFSET * 2)
+#define BUF_STRIDE (16 * 2)
+#define BUF_LINES (16)
+// large buffer sizes based on high bit depth
+#define BUF_OFFSET (2 * BUF_STRIDE * BUF_LINES)
+#define BUF_SIZE (2 * BUF_STRIDE * BUF_LINES + BUF_OFFSET * 2)
 
 #define randomize_buffers(buf0, buf1, size)                 \
     do {                                                    \
@@ -45,57 +47,220 @@ static const uint32_t pixel_mask[3] = { 0xffffffff, 0x03ff03ff, 0x0fff0fff };
         }                                                   \
     } while (0)
 
-static void check_deblock_chroma(HEVCDSPContext *h, int bit_depth)
+static void check_deblock_chroma(HEVCDSPContext *h, int bit_depth, int c)
 {
-    int32_t tc[2] = { 0, 0 };
+    // see tctable[] in hevc_filter.c, we check full range
+    int32_t tc[2] = { rnd() % 25, rnd() % 25 };
     // no_p, no_q can only be { 0,0 } for the simpler assembly (non *_c
     // variant) functions, see deblocking_filter_CTB() in hevc_filter.c
-    uint8_t no_p[2] = { 0, 0 };
-    uint8_t no_q[2] = { 0, 0 };
+    uint8_t no_p[2] = { rnd() & c, rnd() & c };
+    uint8_t no_q[2] = { rnd() & c, rnd() & c };
     LOCAL_ALIGNED_32(uint8_t, buf0, [BUF_SIZE]);
     LOCAL_ALIGNED_32(uint8_t, buf1, [BUF_SIZE]);
 
     declare_func(void, uint8_t *pix, ptrdiff_t stride, int32_t *tc, uint8_t *no_p, uint8_t *no_q);
 
-    if (check_func(h->hevc_h_loop_filter_chroma, "hevc_h_loop_filter_chroma%d", bit_depth)) {
-        for (int i = 0; i < 4; i++) {
-            randomize_buffers(buf0, buf1, BUF_SIZE);
-            // see betatable[] in hevc_filter.c
-            tc[0] = (rnd() & 63) + (rnd() & 1);
-            tc[1] = (rnd() & 63) + (rnd() & 1);
+    if (check_func(c ? h->hevc_h_loop_filter_chroma_c :
+                   h->hevc_h_loop_filter_chroma, "hevc_h_loop_filter_chroma%d%s", bit_depth, c ? "_full" : "")) {
+        randomize_buffers(buf0, buf1, BUF_SIZE);
 
-            call_ref(buf0 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
-            call_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+        call_ref(buf0 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+        call_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+        if (memcmp(buf0, buf1, BUF_SIZE))
+            fail();
+        bench_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+    }
+
+    if (check_func(c ? h->hevc_v_loop_filter_chroma_c :
+                   h->hevc_v_loop_filter_chroma, "hevc_v_loop_filter_chroma%d%s", bit_depth, c ? "_full" : "")) {
+        randomize_buffers(buf0, buf1, BUF_SIZE);
+
+        call_ref(buf0 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+        call_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+        if (memcmp(buf0, buf1, BUF_SIZE))
+            fail();
+        bench_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+    }
+}
+
+#define P3 buf[-4 * xstride]
+#define P2 buf[-3 * xstride]
+#define P1 buf[-2 * xstride]
+#define P0 buf[-1 * xstride]
+#define Q0 buf[0 * xstride]
+#define Q1 buf[1 * xstride]
+#define Q2 buf[2 * xstride]
+#define Q3 buf[3 * xstride]
+
+#define EQU(x, y) do { \
+    uint16_t z = (uint16_t)((y) & ((1 << (bit_depth)) - 1)); \
+    if (SIZEOF_PIXEL == 1) { \
+        *(uint8_t*)(&x) = (uint8_t)z; \
+    } else if (SIZEOF_PIXEL == 2) { \
+        *(uint16_t*)(&x) = z; \
+    } \
+} while(0)
+
+#define RNDDIFF(val, diff) av_clip(((SIZEOF_PIXEL == 1) ? \
+    *(uint8_t*)(&val) : *(uint16_t*)(&val)) - (diff), 0, \
+    (1 << (bit_depth)) - 1) + rnd() % FFMAX(2 * (diff), 1)
+
+#define TC25(x) ((tc[x] * 5 + 1) >> 1)
+
+static void randomize_luma_buffers(int type, int *beta, int32_t tc[2], uint8_t *buf, ptrdiff_t xstride, ptrdiff_t ystride, int bit_depth)
+{
+    int i, j, b3, tc25, tc25diff, b3diff;
+    // minimum useful value is 1, full range 0-24
+    tc[0] = ((rnd() % 25) + 1) << (bit_depth - 8);
+    tc[1] = ((rnd() % 25) + 1) << (bit_depth - 8);
+    // minimum useful value for 8bit is 8 since >> 8
+    *beta = ((rnd() % 57) + 8) << (bit_depth - 8);
+
+    switch (type) {
+    case 0: // strong
+        for (j = 0; j < 2; j++) {
+            tc25 = TC25(j);
+            tc25diff = FFMAX(tc25 - 1, 0);
+            // 4 lines per tc
+            for (i = 0; i < 4; i++) {
+                b3 = *beta >> 3;
+
+                EQU(P0, rnd() % (1 << bit_depth));
+                EQU(Q0, RNDDIFF(P0, tc25diff));
+
+                // p3 - p0 up to beta3 budget
+                b3diff = rnd() % b3;
+                EQU(P3, RNDDIFF(P0, b3diff));
+                // q3 - q0, reduced budget
+                b3diff = rnd() % FFMAX(b3 - b3diff, 1);
+                EQU(Q3, RNDDIFF(Q0, b3diff));
+
+                // same concept, budget across 4 pixels
+                b3 -= b3diff = rnd() % FFMAX(b3, 1);
+                EQU(P2, RNDDIFF(P0, b3diff));
+                b3 -= b3diff = rnd() % FFMAX(b3, 1);
+                EQU(Q2, RNDDIFF(Q0, b3diff));
+
+                // extra reduced budget for weighted pixels
+                b3 -= b3diff = rnd() % FFMAX(b3 - (1 << (bit_depth - 8)), 1);
+                EQU(P1, RNDDIFF(P0, b3diff));
+                b3 -= b3diff = rnd() % FFMAX(b3 - (1 << (bit_depth - 8)), 1);
+                EQU(Q1, RNDDIFF(Q0, b3diff));
+
+                buf += ystride;
+            }
+        }
+        break;
+    case 1: // weak
+        for (j = 0; j < 2; j++) {
+            tc25 = TC25(j);
+            tc25diff = FFMAX(tc25 - 1, 0);
+            // 4 lines per tc
+            for (i = 0; i < 4; i++) {
+                // Weak filtering is significantly simpler to activate as
+                // we only need to satisfy d0 + d3 < beta, which
+                // can be simplified to d0 + d0 < beta. Using the above
+                // derivations but substituting b3 for b1 and ensuring
+                // that P0/Q0 are at least 1/2 tc25diff apart (tending
+                // towards 1/2 range).
+                b3 = *beta >> 1;
+
+                EQU(P0, rnd() % (1 << bit_depth));
+                EQU(Q0, RNDDIFF(P0, tc25diff >> 1) +
+                    (tc25diff >> 1) * ((P0 < (1 << (bit_depth - 1))) ? 1 : -1));
+
+                // p3 - p0 up to beta3 budget
+                b3diff = rnd() % b3;
+                EQU(P3, RNDDIFF(P0, b3diff));
+                // q3 - q0, reduced budget
+                b3diff = rnd() % FFMAX(b3 - b3diff, 1);
+                EQU(Q3, RNDDIFF(Q0, b3diff));
+
+                // same concept, budget across 4 pixels
+                b3 -= b3diff = rnd() % FFMAX(b3, 1);
+                EQU(P2, RNDDIFF(P0, b3diff));
+                b3 -= b3diff = rnd() % FFMAX(b3, 1);
+                EQU(Q2, RNDDIFF(Q0, b3diff));
+
+                // extra reduced budget for weighted pixels
+                b3 -= b3diff = rnd() % FFMAX(b3 - (1 << (bit_depth - 8)), 1);
+                EQU(P1, RNDDIFF(P0, b3diff));
+                b3 -= b3diff = rnd() % FFMAX(b3 - (1 << (bit_depth - 8)), 1);
+                EQU(Q1, RNDDIFF(Q0, b3diff));
+
+                buf += ystride;
+            }
+        }
+        break;
+    case 2: // none
+        *beta = 0; // ensure skip
+        for (i = 0; i < 8; i++) {
+            // we can just fill with completely random data, nothing should be touched.
+            EQU(P3, rnd()); EQU(P2, rnd()); EQU(P1, rnd()); EQU(P0, rnd());
+            EQU(Q0, rnd()); EQU(Q1, rnd()); EQU(Q2, rnd()); EQU(Q3, rnd());
+            buf += ystride;
+        }
+        break;
+    }
+}
+
+static void check_deblock_luma(HEVCDSPContext *h, int bit_depth, int c)
+{
+    const char *type;
+    const char *types[3] = { "strong", "weak", "skip" };
+    int beta;
+    int32_t tc[2] = {0};
+    uint8_t no_p[2] = { rnd() & c, rnd() & c };
+    uint8_t no_q[2] = { rnd() & c, rnd() & c };
+    LOCAL_ALIGNED_32(uint8_t, buf0, [BUF_SIZE]);
+    LOCAL_ALIGNED_32(uint8_t, buf1, [BUF_SIZE]);
+
+    declare_func(void, uint8_t *pix, ptrdiff_t stride, int beta, int32_t *tc, uint8_t *no_p, uint8_t *no_q);
+
+    for (int j = 0; j < 3; j++) {
+        type = types[j];
+        if (check_func(c ? h->hevc_h_loop_filter_luma_c :
+                       h->hevc_h_loop_filter_luma, "hevc_h_loop_filter_luma%d_%s%s", bit_depth, type, c ? "_full" : "")) {
+            randomize_luma_buffers(j, &beta, tc, buf0 + BUF_OFFSET, 16 * SIZEOF_PIXEL, SIZEOF_PIXEL, bit_depth);
+            memcpy(buf1, buf0, BUF_SIZE);
+
+            call_ref(buf0 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
+            call_new(buf1 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
             if (memcmp(buf0, buf1, BUF_SIZE))
                 fail();
+            bench_new(buf1 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
         }
-        bench_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
-    }
 
-    if (check_func(h->hevc_v_loop_filter_chroma, "hevc_v_loop_filter_chroma%d", bit_depth)) {
-        for (int i = 0; i < 4; i++) {
-            randomize_buffers(buf0, buf1, BUF_SIZE);
-            // see betatable[] in hevc_filter.c
-            tc[0] = (rnd() & 63) + (rnd() & 1);
-            tc[1] = (rnd() & 63) + (rnd() & 1);
+        if (check_func(c ? h->hevc_v_loop_filter_luma_c :
+                       h->hevc_v_loop_filter_luma, "hevc_v_loop_filter_luma%d_%s%s", bit_depth, type, c ? "_full" : "")) {
+            randomize_luma_buffers(j, &beta, tc, buf0 + BUF_OFFSET, SIZEOF_PIXEL, 16 * SIZEOF_PIXEL, bit_depth);
+            memcpy(buf1, buf0, BUF_SIZE);
 
-            call_ref(buf0 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
-            call_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
+            call_ref(buf0 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
+            call_new(buf1 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
             if (memcmp(buf0, buf1, BUF_SIZE))
                 fail();
+            bench_new(buf1 + BUF_OFFSET, 16 * SIZEOF_PIXEL, beta, tc, no_p, no_q);
         }
-        bench_new(buf1 + BUF_OFFSET, BUF_STRIDE, tc, no_p, no_q);
     }
 }
 
 void checkasm_check_hevc_deblock(void)
 {
+    HEVCDSPContext h;
     int bit_depth;
-
     for (bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
-        HEVCDSPContext h;
         ff_hevc_dsp_init(&h, bit_depth);
-        check_deblock_chroma(&h, bit_depth);
+        check_deblock_chroma(&h, bit_depth, 0);
+        // check _c variants (non-simplified asm which allows skipping p/q)
+        check_deblock_chroma(&h, bit_depth, 1);
     }
     report("chroma");
+    for (bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
+        ff_hevc_dsp_init(&h, bit_depth);
+        check_deblock_luma(&h, bit_depth, 0);
+        // as above
+        check_deblock_luma(&h, bit_depth, 1);
+    }
+    report("luma");
 }