From patchwork Fri Sep 20 03:43:41 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51675
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Fri, 20 Sep 2024 11:43:41 +0800
X-OQ-MSGID: <20240920034341.42694-1-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH v2] avcodec/vvc: Don't use large array on stack
List-Id: FFmpeg development discussions and patches

From: Zhao Zhili

tmp_array in dmvr_hv takes 33024 bytes on the stack, which can be
dangerous. Move the buffer into VVCLocalContext as dmvr_tmp and pass it
to the dmvr functions through a new tmp argument.
---
 libavcodec/vvc/ctu.h             |  1 +
 libavcodec/vvc/dsp.h             |  2 +-
 libavcodec/vvc/inter.c           |  2 +-
 libavcodec/vvc/inter_template.c  | 12 +++++++-----
 libavcodec/x86/vvc/vvcdsp_init.c |  8 ++++----
 tests/checkasm/vvc_mc.c          | 17 +++++++++++++----
 6 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/libavcodec/vvc/ctu.h b/libavcodec/vvc/ctu.h
index eab4612561..eb3e51c7e5 100644
--- a/libavcodec/vvc/ctu.h
+++ b/libavcodec/vvc/ctu.h
@@ -385,6 +385,7 @@ typedef struct VVCLocalContext {
     DECLARE_ALIGNED(32, uint8_t, alf_buffer_luma)[(MAX_CTU_SIZE + 2 * ALF_PADDING_SIZE) * EDGE_EMU_BUFFER_STRIDE * 2];
     DECLARE_ALIGNED(32, uint8_t, alf_buffer_chroma)[(MAX_CTU_SIZE + 2 * ALF_PADDING_SIZE) * EDGE_EMU_BUFFER_STRIDE * 2];
     DECLARE_ALIGNED(32, int32_t, alf_gradient_tmp)[ALF_GRADIENT_SIZE * ALF_GRADIENT_SIZE * ALF_NUM_DIR];
+    DECLARE_ALIGNED(32, int16_t, dmvr_tmp)[(MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE];
 
     struct {
         int sbt_num_fourths_tb0;                ///< SbtNumFourthsTb0
diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 635ebcafed..3594dfc5f5 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -99,7 +99,7 @@ typedef struct VVCInterDSPContext {
 
     int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
-        intptr_t mx, intptr_t my, int width);
+        intptr_t mx, intptr_t my, int width, int16_t *tmp);
 } VVCInterDSPContext;
 
 struct VVCLocalContext;
diff --git a/libavcodec/vvc/inter.c b/libavcodec/vvc/inter.c
index 64a9dd1e46..48b633d580 100644
--- a/libavcodec/vvc/inter.c
+++ b/libavcodec/vvc/inter.c
@@ -806,7 +806,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
         const int wrap_enabled = fc->ps.pps->r->pps_ref_wraparound_enabled_flag;
 
         MC_EMULATED_EDGE_BILINEAR(lc->edge_emu_buffer, &src, &src_stride, ox, oy);
-        fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w);
+        fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w, lc->dmvr_tmp);
     }
 
     min_sad = fc->vvcdsp.inter.sad(tmp[L0], tmp[L1], dx, dy, block_w, block_h);
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index c073a73e76..fad1ba801f 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -474,7 +474,8 @@ static void FUNC(apply_bdof)(uint8_t *_dst, const ptrdiff_t _dst_stride, const i
 
 //8.5.3.2.2 Luma sample bilinear interpolation process
 static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
-    const int height, const intptr_t mx, const intptr_t my, const int width)
+    const int height, const intptr_t mx, const intptr_t my, const int width,
+    int16_t *tmp)
 {
 #if BIT_DEPTH != 10
     const pixel *src = (const pixel *)_src;
@@ -502,7 +503,8 @@ static void FUNC(dmvr)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_s
 
 //8.5.3.2.2 Luma sample bilinear interpolation process
 static void FUNC(dmvr_h)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
-    const int height, const intptr_t mx, const intptr_t my, const int width)
+    const int height, const intptr_t mx, const intptr_t my, const int width,
+    int16_t *tmp)
 {
     const pixel *src = (const pixel*)_src;
     const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
@@ -520,7 +522,8 @@ static void FUNC(dmvr_h)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src
 
 //8.5.3.2.2 Luma sample bilinear interpolation process
 static void FUNC(dmvr_v)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
-    const int height, const intptr_t mx, const intptr_t my, const int width)
+    const int height, const intptr_t mx, const intptr_t my, const int width,
+    int16_t *tmp)
 {
     const pixel *src = (pixel*)_src;
     const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
@@ -539,9 +542,8 @@ static void FUNC(dmvr_v)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src
 
 //8.5.3.2.2 Luma sample bilinear interpolation process
 static void FUNC(dmvr_hv)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
-    const int height, const intptr_t mx, const intptr_t my, const int width)
+    const int height, const intptr_t mx, const intptr_t my, const int width, int16_t *tmp_array)
 {
-    int16_t tmp_array[(MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE];
     int16_t *tmp = tmp_array;
     const pixel *src = (const pixel*)_src;
     const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index f3e2e3a27b..7ff3e2bdff 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -90,13 +90,13 @@ AVG_PROTOTYPES(12, avx2)
 
 #define DMVR_PROTOTYPES(bd, opt) \
 void ff_vvc_dmvr_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
-     int height, intptr_t mx, intptr_t my, int width); \
+     int height, intptr_t mx, intptr_t my, int width, int16_t *unused); \
 void ff_vvc_dmvr_h_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
-     int height, intptr_t mx, intptr_t my, int width); \
+     int height, intptr_t mx, intptr_t my, int width, int16_t *unused); \
 void ff_vvc_dmvr_v_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
-     int height, intptr_t mx, intptr_t my, int width); \
+     int height, intptr_t mx, intptr_t my, int width, int16_t *unused); \
 void ff_vvc_dmvr_hv_##bd##_##opt(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, \
-     int height, intptr_t mx, intptr_t my, int width); \
+     int height, intptr_t mx, intptr_t my, int width, int16_t *unused); \
 
 DMVR_PROTOTYPES( 8, avx2)
 DMVR_PROTOTYPES(10, avx2)
diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 754cf19065..591557dfa6 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -333,6 +333,7 @@ static void check_avg(void)
 }
 
 #define SR_RANGE 2
+#define DMVR_TMP_BUF_SIZE ((MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE * sizeof(int16_t))
 static void check_dmvr(void)
 {
     LOCAL_ALIGNED_32(uint16_t, dst0, [DST_BUF_SIZE]);
@@ -340,14 +341,20 @@ static void check_dmvr(void)
     LOCAL_ALIGNED_32(uint8_t, src0, [SRC_BUF_SIZE]);
     LOCAL_ALIGNED_32(uint8_t, src1, [SRC_BUF_SIZE]);
     const int dst_stride = MAX_PB_SIZE * sizeof(int16_t);
+    int16_t *tmp0 = av_mallocz(DMVR_TMP_BUF_SIZE);
+    int16_t *tmp1 = av_mallocz(DMVR_TMP_BUF_SIZE);
 
     VVCDSPContext c;
     declare_func(void, int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
-        intptr_t mx, intptr_t my, int width);
+        intptr_t mx, intptr_t my, int width, int16_t *tmp);
+
+    if (!tmp0 || !tmp1)
+        fail();
 
     for (int bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
         ff_vvc_dsp_init(&c, bit_depth);
         randomize_pixels(src0, src1, SRC_BUF_SIZE);
+        randomize_buffers(tmp0, tmp1, DMVR_TMP_BUF_SIZE / 2, UINT32_MAX);
         for (int i = 0; i < 2; i++) {
             for (int j = 0; j < 2; j++) {
                 for (int h = 8; h <= 16; h *= 2) {
@@ -371,8 +378,8 @@ static void check_dmvr(void)
                         if (check_func(c.inter.dmvr[j][i], "%s_%d_%dx%d", type, bit_depth, pred_w, pred_h)) {
                             memset(dst0, 0, DST_BUF_SIZE);
                             memset(dst1, 0, DST_BUF_SIZE);
-                            call_ref(dst0, src0 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
-                            call_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
+                            call_ref(dst0, src0 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w, tmp0);
+                            call_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w, tmp1);
                             for (int k = 0; k < pred_h; k++) {
                                 if (memcmp(dst0 + k * dst_stride, dst1 + k * dst_stride, pred_w * sizeof(int16_t))) {
                                     fail();
@@ -380,13 +387,15 @@ static void check_dmvr(void)
                                 }
                             }
 
-                            bench_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w);
+                            bench_new(dst1, src1 + SRC_OFFSET, PIXEL_STRIDE, pred_h, mx, my, pred_w, tmp1);
                         }
                     }
                 }
             }
         }
     }
+    av_free(tmp0);
+    av_free(tmp1);
     report("dmvr");
 }

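
Reviewer note: the 33024-byte figure in the commit message and the caller-provided scratch pattern used by the patch can be illustrated with a small standalone sketch. It is not FFmpeg code. The values MAX_PB_SIZE = 128 and BILINEAR_EXTRA = 1 are assumptions chosen because they reproduce the size quoted in the commit message, and alloc_scratch() and two_pass_sum() are hypothetical names that stand in for the allocation site and for a two-pass filter such as dmvr_hv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values: the patch uses these names but does not show their definitions. */
#define MAX_PB_SIZE    128
#define BILINEAR_EXTRA 1

#define DMVR_TMP_ELEMS ((MAX_PB_SIZE + BILINEAR_EXTRA) * MAX_PB_SIZE)
#define DMVR_TMP_BYTES (DMVR_TMP_ELEMS * sizeof(int16_t))

/* 129 rows * 128 columns * 2 bytes = 33024 bytes, the size quoted in the commit message. */
static_assert(DMVR_TMP_BYTES == 33024, "scratch size should match the commit message");

/* Hypothetical helper: allocate the scratch buffer once, off the stack. */
static int16_t *alloc_scratch(void)
{
    return calloc(DMVR_TMP_ELEMS, sizeof(int16_t));
}

/*
 * Hypothetical two-pass filter. The horizontal pass writes height + BILINEAR_EXTRA
 * rows into the caller's scratch buffer, and the vertical pass reads them back.
 * The filter taps are plain sums, chosen only to keep the example short.
 */
static void two_pass_sum(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride,
                         int height, int width, int16_t *tmp)
{
    /* Horizontal pass: one extra row is needed by the vertical pass below. */
    for (int y = 0; y < height + BILINEAR_EXTRA; y++)
        for (int x = 0; x < width; x++)
            tmp[y * MAX_PB_SIZE + x] = src[y * src_stride + x] + src[y * src_stride + x + 1];

    /* Vertical pass: combine each intermediate row with the row below it. */
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * MAX_PB_SIZE + x] = tmp[y * MAX_PB_SIZE + x] + tmp[(y + 1) * MAX_PB_SIZE + x];
}
```

In the patch itself the decoder plays the caller's role by passing lc->dmvr_tmp, and the checkasm test allocates tmp0 and tmp1 with av_mallocz. The design choice is the same in both places: the function that needs the large intermediate buffer no longer owns it, so its stack frame stays small.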